data-driven decisionmaking

Why we won't publish individual teachers' value-added scores

Tomorrow’s planned release of 12,000 New York City teacher ratings raises questions for the courts, parents, principals, bureaucrats, teachers — and one other party: news organizations. The journalists who requested the release of the data in the first place now must decide what to do with it all.

At GothamSchools, we joined other reporters in requesting to see the Teacher Data Reports back in 2010. But you will not see the database here, tomorrow or ever, as long as it is attached to individual teachers’ names.

The fact is that we feel a strong responsibility to report on the quality of the work the 80,000 New York City public school teachers do every day. This is a core part of our job and our mission.

But before we publish any piece of information, we always have to ask a question. Does the information we have do a fair job of describing the subject we want to write about? If it doesn’t, is there any additional information — context, anecdotes, quantitative data — that we can provide to paint a fuller picture?

In the case of the Teacher Data Reports, “value-added” assessments of teachers’ effectiveness that were produced in 2009 and 2010 for reading and math teachers in grades 3 to 8, the answer to both those questions was no.

We determined that the data were flawed, that the public might easily be misled by the ratings, and that no amount of context could justify attaching teachers’ names to the statistics. When the city released the reports, we decided, we would write about them, and maybe even release Excel files with names wiped out. But we would not enable our readers to generate lists of the city’s “best” and “worst” teachers or to search for individual teachers at all.

It’s true that the ratings the city is releasing might turn out to be powerful measures of a teacher’s success at helping students learn. The problem lies in that word: might.

Value-added measures do, by many accounts, appear to do a job that no measure of a teacher’s quality has done before: They estimate the amount of learning by students for which a teacher, and no one else, is responsible, and they do this with impressive reliability. That is, a teacher judged to be more effective one year by value-added is likely to be judged effective again the next year, and the year after that.
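For readers who want to see the mechanics, here is a rough sketch, in Python, of the basic idea behind a value-added estimate: predict each student’s score from last year’s score, then credit (or blame) the teacher for the average leftover. The numbers, the single control variable, and the simple averaging are all invented for illustration; the city’s actual model is far more elaborate.

```python
# A minimal sketch (not the city's actual model) of the basic value-added idea:
# predict each student's score from last year's score, then average each
# teacher's students' residuals. All numbers below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_teachers, students_per_class = 100, 25
true_effect = rng.normal(0, 3, n_teachers)            # hypothetical "true" teacher effects
teacher = np.repeat(np.arange(n_teachers), students_per_class)

prior = rng.normal(650, 30, teacher.size)             # last year's scale scores
score = 0.8 * prior + true_effect[teacher] + rng.normal(0, 20, teacher.size)

# Regress this year's score on last year's (one control; real models use many)
slope, intercept = np.polyfit(prior, score, 1)
residual = score - (slope * prior + intercept)

# A teacher's value-added estimate is the mean residual of his or her students
value_added = np.array([residual[teacher == t].mean() for t in range(n_teachers)])
print(np.corrcoef(true_effect, value_added)[0, 1])    # how well the estimate tracks the "true" effect
```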

But this is hardly true for every teacher. Many teachers will be mislabeled; no one disputes this. Value-added scores may be more reliable than existing alternatives, but they are still far from perfectly reliable. It’s entirely possible, for instance, that a teacher judged less effective one year will be judged very effective the next, and vice versa.

As we reported two years ago, when the NYU economist Sean Corcoran looked at New York City’s value-added data, he found that 31 percent of English teachers who ranked in the bottom quintile of teachers in 2007 had jumped to one of the top two quintiles by 2008. About 23 percent of math teachers made the same jump.
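To see how measurement noise alone can produce that kind of churn, consider a rough simulation. The split between true quality and noise below is invented, not estimated from the city’s data, but it shows that even when a teacher’s underlying quality never changes, a sizable share of bottom-quintile teachers land in the top two quintiles the following year.

```python
# A rough sketch of how measurement noise alone can produce quintile jumps.
# The signal-to-noise split is invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_quality = rng.normal(0, 1, n)

# Two years of estimates: the same underlying quality plus independent noise
year1 = true_quality + rng.normal(0, 1, n)
year2 = true_quality + rng.normal(0, 1, n)

q1 = np.digitize(year1, np.quantile(year1, [0.2, 0.4, 0.6, 0.8]))  # quintiles 0..4
q2 = np.digitize(year2, np.quantile(year2, [0.2, 0.4, 0.6, 0.8]))

bottom = q1 == 0
jumped = (q2[bottom] >= 3).mean()
print(f"{jumped:.0%} of bottom-quintile teachers land in the top two quintiles next year")
```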

The fluctuation is acknowledged by even the strongest supporters of using value-added measures to evaluate teachers. One of the creators of the city’s original value-added model, the Columbia economist Jonah Rockoff, compares value-added scores to baseball players’ batting averages. One of his reasons: In each case, the year-to-year fluctuations of an individual’s score are about the same.

“If someone hit, you know, .280 last year, that doesn’t guarantee they’re going to hit .280 next year,” Rockoff said today. “However, if you hit .210 last year and I hit .300, there’s a very high likelihood I’m going to hit more than you next year, too. Whereas if you hit .280 and I hit .278, we’re basically the same.”

Another concern is that many researchers still aren’t convinced that value-added scores measure the right sort of teacher impact. The problem lies in the flaws of the measures on which value-added scores depend — standardized state test scores.

Tests are supposed to measure what a student has learned about a subject, but they can also reflect other things, like how well her teacher prepared her for the test, or how well she mastered the narrow band of the subject the test assessed.

The test-prep concern is magnified by findings that a single teacher can generate two different value-added scores if evaluators use two different student tests to calculate them. The Gates Foundation’s Measures of Effective Teaching study computed value-added scores for teachers based on both state tests and more conceptual tests, and it found substantial differences between the two, according to an analysis by the economist Jesse Rothstein of the University of California at Berkeley.

“If it’s right that some teachers are good at raising the state test scores and other teachers are good at raising other test scores, then we have to decide which tests we care about,” Rothstein said today. “If we’re not sure that this is the test that captures what good teaching is, then we might be getting our estimates of teaching quality very wrong.”

Flags about exactly what high value-added ratings reward are also raised by studies that ask how the ratings match up with measures of what teachers actually say and do in the classroom. Heather Hill, a professor at Harvard’s Graduate School of Education, rated math teachers’ teaching quality based on an observation rubric called the Mathematical Quality of Instruction, which looks at factors like whether the teacher made mathematical errors and the quality of her explanations. Then Hill compared the math teaching ratings to value-added measures.

Two individual cases stood out: One teacher had made a slew of math errors in her teaching, and the other had failed to connect a class activity to math concepts. But both teachers’ value-added scores put them at the top of their cohort.

There is some reason to think that value-added measures reflect more than test prep. Rockoff points out that while different tests can produce different value-added scores for the same teacher, the two measures are still correlated. Using different tests, he said, is akin to looking at slugging percentage rather than batting average. “I’m sure those two things are positively correlated, but probably not one for one,” he said.

More persuasively, a recent study by Rockoff and two colleagues concluded that value-added measures can actually predict long-term life outcomes, including higher cumulative lifetime income, a reduced chance of teen pregnancy, and living in a high-quality neighborhood as an adult. The study examined a very large urban school district, which it did not name, that bears several similarities to New York City.

That study targeted another concern about value-added measures: that teachers score consistently well year after year not because of something they are doing, but because they consistently teach students with certain advantages.

Rothstein has used value-added models to conclude that fifth-grade teachers have strong effects on their students’ performance in third grade — something they could not possibly have influenced, unless value-added scores reflect not just teachers’ influence but also advantages their students bring with them.

Rockoff and his colleagues evaluated that possibility by testing a simple implication. If teachers with high value-added scores do well only because they get the “better” students in their grade, then their classrooms’ gains should come at the expense of other classrooms in the same grade, which would be left with weaker students. In that case, when researchers looked at test score growth for the grade as a whole, the gains in the high-scoring classrooms would be canceled out by losses elsewhere. But the gains were not canceled out, suggesting that effective teachers did more than simply draw unusually good students.
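A toy simulation makes the logic concrete. It is not the study’s actual design, and every number in it is made up, but it shows why grade-wide averages can distinguish a teacher who merely gets the strongest students from one who genuinely raises scores.

```python
# A toy version of the logic described above, not the study's actual design.
# Scenario A: the "high value-added" teacher simply gets the strongest students
# in the grade. Scenario B: students are assigned at random and the teacher
# genuinely raises scores. In A the grade-wide average is unchanged; in B it rises.
import numpy as np

rng = np.random.default_rng(2)
growth = rng.normal(0, 1, 1000)            # baseline test score growth for a grade of 1,000 students

# Scenario A: sorting only -- the 250 highest-growth students go to one teacher
ranked = np.sort(growth)
class_a, rest_a = ranked[-250:], ranked[:-250]

# Scenario B: random assignment, and the teacher adds 0.5 to her students' growth
idx = rng.permutation(1000)
class_b, rest_b = growth[idx[:250]] + 0.5, growth[idx[250:]]

for label, cls, rest in [("sorting only", class_a, rest_a), ("real effect", class_b, rest_b)]:
    whole_grade = np.concatenate([cls, rest])
    print(f"{label}: classroom mean {cls.mean():.2f}, grade-wide mean {whole_grade.mean():.2f}")
```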

None of this means that we won’t write about what the data dump includes or that we might not publish an adapted database that strips out information linking the city’s data to individual teachers. With more than 90 columns in the Excel sheet the city has developed — and more than 17,000 rows, representing the number of reports issued over their two-year lifespan — the release might well enable us to examine the city’s value-added experiment in new ways.

Value-added measures certainly aren’t going away. City officials stopped producing Teacher Data Reports only because they knew the State Education Department was preparing its own. Those measures, expected to come out in 2013, will make up 25 percent of evaluations for teachers of math and English in tested grades.

money matters

Report: Trump education budget would create a Race to the Top for school choice

PHOTO: Official White House Photo by Shealah Craighead
President Donald Trump and U.S. Secretary of Education Betsy DeVos participate in a tour of Saint Andrews Catholic in Orlando, Florida.

The Trump administration appears to be going ahead with a $1 billion effort to push districts to allow school choice, according to a report in the Washington Post.

The newspaper obtained what appears to be an advance version of the administration’s education budget, set for release May 23. The budget documents reflect more than $10 billion in cuts, many of which were included in the budget proposal that came out in March, according to the Post’s report. They include cuts to after-school programs for poor students, teacher training, and more:

… a $15 million program that provides child care for low-income parents in college; a $27 million arts education program; two programs targeting Alaska Native and Native Hawaiian students, totaling $65 million; two international education and foreign language programs, $72 million; a $12 million program for gifted students; and $12 million for Special Olympics education programs.

Other programs would not be eliminated entirely, but would be cut significantly. Those include grants to states for career and technical education, which would lose $168 million, down 15 percent compared to current funding; adult basic literacy instruction, which would lose $96 million (down 16 percent); and Promise Neighborhoods, an Obama-era initiative meant to build networks of support for children in needy communities, which would lose $13 million (down 18 percent).

The documents also shed some light on how the administration plans to encourage school choice. The March proposal said the administration would spend $1 billion to encourage districts to switch to “student-based budgeting,” or letting funds flow to students rather than schools.

The approach is considered essential for school choice to thrive. Yet the mechanics of the Trump administration making it happen are far from obvious, as we reported in March:

There’s a hitch in the budget proposal: Federal law spells out exactly how Title I funds must be distributed, through funding formulas that send money to schools with many poor students.

“I do not see a legal way to spend a billion dollars on an incentive for weighted student funding through Title I,” said Nora Gordon, an associate professor of public policy at Georgetown University. “I think that would have to be a new competitive program.”

There are good reasons for the Trump administration not to rush into creating a program in which states compete for new federal funds, though. … Creating a new program would open the administration to criticism of overreach — which the Obama administration faced when it used the Race to the Top competition to get states to adopt its priorities.

It’s unclear from the Post’s report how the Trump administration is handling Gordon’s concerns. But the Post reports that the administration wants to use a competitive grant program — which it’s calling Furthering Options for Children to Unlock Success, or FOCUS — to redistribute $1 billion in Title I funds for poor students. That means the administration decided that an Obama-style incentive program is worth the potential risks.

The administration’s budget request would have to be fulfilled by Congress, so whether any of the cuts or new programs come to pass is anyone’s guess. Things are not proceeding normally in Washington, D.C., right now.

By the numbers

After reshaping itself to combat declining interest, Teach For America reports a rise in applications

PHOTO: Kayleigh Skinner
Memphis corps members of Teach For America participate in a leadership summit last August.

Teach For America says its applications jumped significantly this year, reversing a three-year trend of declining interest in the program.

The organization’s CEO said in a blog post this week that nearly 49,000 people applied for the 2017 program, which places college graduates in low-income schools across the country after summer training — up from just 37,000 applicants last year.

“After three years of declining recruitment, our application numbers spiked this year, and we’re in a good position to meet our goals for corps size, maintaining the same high bar for admission that we always have,” Elisa Villanueva Beard wrote. The post was reported by Politico on Wednesday.

The news comes after significant shake-ups at the organization. One of TFA’s leaders left in late 2015, and the organization slashed its national staff by 15 percent last year. As applications fell over the last several years, it downsized in places like New York City and Memphis, decentralized its operations, and shifted its focus to attracting a more diverse corps with deeper ties to the locations where the program places new teachers. 

This year’s application numbers are still down from 2013, when 57,000 people applied for a position. But Villanueva Beard said the changes were working, and that “slightly more than half of 2017 applicants identify as a person of color.”