By the numbers

91 percent of city teachers rated effective or higher in first round of new evaluations

PHOTO: Jackie Schechter

Updated, 2:15 p.m. — Far fewer New York City teachers received the highest possible rating on last year’s evaluations than teachers in the rest of the state, according to numbers released Tuesday by state officials.

Of the 62,184 city teachers evaluated in the 2013-14 school year, 9.2 percent earned a “highly effective” rating, though nearly 60 percent of teachers outside the city earned that distinction. Still, 91 percent of city teachers earned one of the top two ratings, with 82 percent of city teachers rated “effective,” 7 percent rated “developing,” and 1.2 percent rated “ineffective,” the lowest possible rating.

Outside of New York City, the numbers skewed toward the higher ratings: Only 2 percent of teachers in the rest of the state were rated developing and less than 1 percent were rated ineffective. Those numbers indicated that the city’s evaluation system was more reliable than the ones negotiated in other districts, state education officials said, not that fewer top teachers were working in New York City schools.

“Our teachers are not getting feedback about their relative strengths and weaknesses,” said Assistant Commissioner Julia Rafal-Baer, referring to the large numbers of teachers in other districts rated highly effective.

The data offers a first glimpse of how the new evaluations — the result of a years-long legislative fight and a contentious dispute between city and union officials — played out in New York City, which debuted the new method in the 2013-14 school year. This is the second year of data for the rest of the state.

The high number of teachers earning the highest ratings sets the stage for another push to change the evaluations, something that Gov. Andrew Cuomo has said he wants to see this year. It’s also likely to be fodder for advocates like Campbell Brown and Mona Davids, who are suing the state over laws they say make it too difficult to remove ineffective teachers.

PHOTO: NYSED

“The ratings show there’s much more work to do to strengthen the evaluation system,” Regents Chancellor Merryl Tisch said in a statement. “We look forward to working with the Governor, Legislature, NYSUT, and other education stakeholders to strengthen the evaluation law in the coming legislative session to make it a more effective tool for professional development.”

New York City’s evaluation system is different from those in place in the rest of the state in large part because it was imposed by State Education Commissioner John King after the city and the teachers union were unable to negotiate a plan. Rafal-Baer noted that the plan’s differences from other districts resulted in “significant principal control” and the option to allow teachers to use video in their observations.

Under the state’s new system, determined by a law passed in 2010, teacher ratings in New York come from three components. Sixty percent of the evaluation comes mostly from classroom observations, 20 percent comes from student learning measures determined by the state, and 20 percent is based on student learning measures determined by the district.
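The 60/20/20 split described above amounts to a weighted average. The sketch below is illustrative only: the component names and the 0-to-100 subscore scale are assumptions made for the example, not details from the state's law, and the cutoffs that turn a composite score into one of the four ratings are not given here, so none are modeled.

```python
# Weights as described in the article (percent of the overall evaluation).
# The dictionary keys are hypothetical labels chosen for this example.
WEIGHTS = {
    "observations": 60,    # mostly classroom observations
    "state_measures": 20,  # student learning measures determined by the state
    "local_measures": 20,  # student learning measures determined by the district
}


def composite_score(subscores):
    """Combine component subscores, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(subscores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum of the three components, divided by the total weight (100).
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS) / 100


# Made-up subscores for a hypothetical teacher.
example = {"observations": 80, "state_measures": 50, "local_measures": 70}
print(composite_score(example))
```

Because observations carry three times the weight of either student learning measure, a change in the observation subscore moves the composite three times as far as the same change in one of the other two components.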

The city’s scoring system set a higher bar for what scores were needed to receive effective and highly effective ratings. For instance, more students needed to hit learning goals for their teachers to be highly rated under the city’s plan.

“A well-developed evaluation system – with four, much more nuanced ratings, instead of only two – helps us identify and provide specific support to struggling teachers, as well as identify those who do not belong in the classroom,” Chancellor Carmen Fariña said in a statement.

The system’s four possible ratings were meant to better distinguish teacher quality, and its supporters said that the evaluations would help resolve the disconnect between teachers’ almost uniformly high ratings and the low number of students who graduate high school prepared for college-level work.

The evaluations’ proponents also said they would help districts root out the lowest-performing teachers by allowing districts to use ratings to fire or deny job protections. In recent years, supporters have begun to say that the new evaluations are better used to spur teacher improvement.

But as the state simultaneously transitioned to Common Core-aligned tests that factored into some teacher evaluations, lawmakers created a “safety net” to ensure teachers could not lose their jobs or be denied tenure for low student scores. (Gov. Andrew Cuomo still hasn’t signed that legislation.)

Statewide, the distribution of ratings is similar to data from the 2012-13 school year, which the state did not release until this August. In total, 94 percent of teachers and 92 percent of principals earned one of the top two ratings that year.

fight another day

In union defeat, lawmakers end session without revamping teacher evaluation law

After a hard-fought battle by the state teachers union, New York lawmakers went home for the summer without overhauling a controversial teacher evaluation law that ties state test scores to educator ratings.

The bill pushed by the unions would have left decisions about whether to use state test scores in teacher evaluations up to local union negotiations. While the bill cleared the Assembly, it was bottled up by the Senate’s leadership, which demanded charter school concessions in return that Assembly Democrats wouldn’t agree to.

The effort to decouple test scores from teacher evaluations was one of several that fizzled out at the end of a lackluster session characterized by lawmaker gridlock.

“Sen. Flanagan, his caucus and five Democrats chose to betray the state’s teachers,” said New York State United Teachers President Andy Pallotta in a statement. “Make no mistake, New York teachers, parents and public school students will remember which senators voted against their public schools when we head to the polls this September and again in November.”

There is some possibility that lawmakers could return to finish a few unresolved issues this summer, but Pallotta told Chalkbeat he is not holding out hope for that outcome.

The lack of action is a defeat for the state teachers union, which has fought hard for the bill since the beginning of the session. Union officials have staged musical rallies, bought balloons, rented a truck with a message urging lawmakers to pass the bill, and capped off the last day of session handing out ice cream for the cause.

However, the legislative loss gives the union something to rally around during this fall’s elections. Also, other education advocacy organizations are content to engage in a longer process to revamp evaluations.

“Inaction isn’t always the worst outcome,” said Julie Marlette, Director of Governmental Relations for the New York State School Boards Association. “Now we can continue to work with both legislative and regulatory figures to hopefully craft an update to evaluations that is thoughtful and comprehensive and includes all the stakeholders.”

The news also means that New York’s teacher evaluation saga, which has been raging for eight years, will spill over into at least next year. Policymakers have been battling about state teacher evaluations since 2010, when New York adopted a system that started using state test scores to rate teachers in order to win federal “Race to the Top” money.

Teacher evaluations were altered again in 2015 when Gov. Andrew Cuomo called for a more stringent evaluation system, saying evaluations as they existed were “baloney.” The new system was met with resistance from the teachers unions and parents across the state. Nearly one in five families boycotted state tests in response to evaluation changes and a handful of other education policies.

The state’s Board of Regents acted quickly, passing a moratorium on the use of grades three to eight math and English tests in teacher evaluations. But the original 2015 law remains on the books. What the unions targeted during this session was a central plank of that law, one that could require as much as half of an educator’s evaluation to be based on test scores.

With the moratorium set to expire in 2019, the fight over teacher evaluations will likely become more pressing next year. It may also allow the state education department to play a greater role in shaping the final product. State education department officials had begun to lay out a longer roadmap for redesigning teacher evaluations that involved surveys and workgroups, but the legislative battle threatened to short-circuit their process.

Now officials at the state education department say they will restart their work and point out that they could extend the moratorium to provide extra time if needed.

“We will resume the work we started earlier this year to engage teachers, principals and others as we seek input in moving toward developing a new educator evaluation system,” said state education department spokeswoman Emily DeSantis.

For some education advocates, slowing down the process sounds like a good idea.

“Our reaction on the NYSUT Assembly teacher evaluation bill is that you could do worse but that you could also do better and that we should take time to try,” said Bob Lowry, deputy director of the New York State Council of School Superintendents.

What seems to be a setback for the union now may be a galvanizing force during elections this fall. Republican lawmakers will likely struggle to keep control of the state Senate, and NYSUT is promising to use this inaction against them. That could be particularly consequential on Long Island, which is a hotbed of the testing opt-out movement.

It’s unclear whether the failure to act will prove problematic for Cuomo, who is also seeking re-election. Cuomo, who pushed for the 2015 law the unions despise, is facing competition from the left in gubernatorial challenger Cynthia Nixon.

But at least so far, it seems like the union is reserving the blame for Senate Republicans and not for the governor.

Cuomo is “making it clear that he has heard the outcry,” said Pallotta. “I blame Senator Flanagan, I blame his conference and I blame 5 [Senate] Democrats.”

a high-stakes evaluation

The Gates Foundation bet big on teacher evaluation. The report it commissioned explains how those efforts fell short.

PHOTO: Brandon Dill/The Commercial Appeal
Sixth-grade teacher James Johnson leads his students in a gameshow-style lesson on energy at Chickasaw Middle School in 2014 in Shelby County. The district was one of three that received a grant from the Gates Foundation to overhaul teacher evaluation.

Barack Obama’s 2012 State of the Union address reflected the heady moment in education. “We know a good teacher can increase the lifetime income of a classroom by over $250,000,” he said. “A great teacher can offer an escape from poverty to the child who dreams beyond his circumstance.”

Bad teachers were the problem; good teachers were the solution. It was a simplified binary, but the idea and the research it drew on had spurred policy changes across the country, including a spate of laws establishing new evaluation systems designed to reward top teachers and help weed out low performers.

Behind that effort was the Bill and Melinda Gates Foundation, which backed research and advocacy that ultimately shaped these changes.

It also funded the efforts themselves, specifically in several large school districts and charter networks open to changing how teachers were hired, trained, evaluated, and paid. Now, new research commissioned by the Gates Foundation finds scant evidence that those changes accomplished what they were meant to: improve teacher quality or boost student learning.  

The 500-plus page report by the RAND Corporation, released Thursday, details the political and technical challenges of putting complex new systems in place and the steep cost — $575 million — of doing so.

The post-mortem will likely serve as validation to the foundation’s critics, who have long complained about Gates’ heavy influence on education policy and what they call its top-down approach.

The report also comes as the foundation has shifted its priorities away from teacher evaluation and toward other issues, including improving curriculum.

“We have taken these lessons to heart, and they are reflected in the work that we’re doing moving forward,” the Gates Foundation’s Allan Golston said in a statement.

The initiative did not lead to clear gains in student learning.

At the three districts and four California-based charter school networks that took part in the Gates initiative — Pittsburgh; Shelby County (Memphis), Tennessee; Hillsborough County, Florida; and the Alliance-College Ready, Aspire, Green Dot, and Partnerships to Uplift Communities networks — results were spotty. The trends over time didn’t look much better than those at similar schools in the same state.

Several years into the initiative, there was evidence that it was helping high school reading in Pittsburgh and at the charter networks, but hurting elementary and middle school math in Memphis and among the charters. In most cases there were no clear effects, good or bad. There was also no consistent pattern of results over time.

A complicating factor here is that the comparison schools may also have been changing their teacher evaluations, as the study spanned from 2010 to 2015, when many states passed laws putting in place tougher evaluations and weakening tenure.

There were also lots of other changes going on in the districts and states — like the adoption of Common Core standards, changes in state tests, the expansion of school choice — making it hard to isolate cause and effect. Studies in Chicago, Cincinnati, and Washington, D.C., have found that evaluation changes had more positive effects.

Matt Kraft, a professor at Brown who has extensively studied teacher evaluation efforts, said the disappointing results in the latest research couldn’t simply be chalked up to a messy rollout.

These “districts were very well poised to have high-quality implementation,” he said. “That speaks to the actual package of reforms being limited in its potential.”

Principals were generally positive about the changes, but teachers had more complicated views.

From Pittsburgh to Tampa, Florida, the vast majority of principals agreed at least somewhat that “in the long run, students will benefit from the teacher-evaluation system.”

Source: RAND Corporation

Teachers in district schools were far less confident.

When the initiative started, a majority of teachers in all three districts tended to agree with the sentiment. But several years later, support had dipped substantially. This may have reflected dissatisfaction with the previous system — the researchers note that “many veteran [Pittsburgh] teachers we interviewed reported that their principals had never observed them” — and growing disillusionment with the new one.

Majorities of teachers in all locations reported that they had received useful feedback from their classroom observations and changed their habits as a result.

At the same time, teachers in the three districts were highly skeptical that the evaluation system was fair — or that it made sense to attach high-stakes consequences to the results.

The initiative didn’t help ensure that poor students of color had more access to effective teachers.

Part of the impetus for evaluation reform was the idea, backed by some research, that black and Hispanic students from low-income families were more likely to have lower-quality teachers.  

But the initiative didn’t seem to make a difference. In Hillsborough County, inequity expanded. (Surprisingly, before the changes began, the study found that low-income kids of color actually had similar or slightly more effective teachers than other students in Pittsburgh, Hillsborough County, and Shelby County.)

Districts put in place modest bonuses to get top teachers to switch schools, but the evaluation system itself may have been a deterrent.

“Central-office staff in [Hillsborough County] reported that teachers were reluctant to transfer to high-need schools despite the cash incentive and extra support because they believed that obtaining a good VAM score would be difficult at a high-need school,” the report says.

Evaluation was costly — both in terms of time and money.

The total direct cost of all aspects of the program, across several years in the three districts and four charter networks, was $575 million.

That amounts to between 1.5 and 6.5 percent of district or network budgets, or a few hundred dollars per student per year. Over a third of that money came from the Gates Foundation.
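As a rough check on the "over a third" figure, the arithmetic can be worked out from the $575 million total and the 37 percent share given in the correction at the end of this story. The variable names are illustrative, and the dollar amounts are back-of-the-envelope estimates derived from those two numbers, not figures quoted from the report.

```python
TOTAL_COST_MILLIONS = 575   # total direct cost reported in the article
GATES_SHARE_PERCENT = 37    # foundation's share, per the article's correction

# Estimated split of the total between the foundation and all other sources.
gates_millions = TOTAL_COST_MILLIONS * GATES_SHARE_PERCENT / 100
other_millions = TOTAL_COST_MILLIONS - gates_millions

print(f"Gates Foundation: about ${gates_millions:.0f} million")
print(f"Other sources: about ${other_millions:.0f} million")
```

On those numbers the foundation's contribution comes to roughly $213 million, leaving about $362 million to be covered by other sources.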

The study also quantifies the strain of the new evaluations on school leaders’ and teachers’ time as costing upwards of $200 per student, nearly doubling the price tag in some districts.

Teachers tended to get high marks on the evaluation system.

Before the new evaluation systems were put in place, the vast majority of teachers got high ratings. That hasn’t changed much, according to this study, which is consistent with national research.

In Pittsburgh, in the initial two years, when evaluations had low stakes, a substantial number of teachers got low marks. That drew objections from the union.

“According to central-office staff, the district adjusted the proposed performance ranges (i.e., lowered the ranges so fewer teachers would be at risk of receiving a low rating) at least once during the negotiations to accommodate union concerns,” the report says.

Morgaen Donaldson, a professor at the University of Connecticut, said the initial buy-in followed by pushback isn’t surprising, pointing to her own research in New Haven.

To some, aspects of the initiative “might be worth endorsing at an abstract level,” she said. “But then when the rubber hit the road … people started to resist.”

More effective teachers weren’t more likely to stay teaching, but less effective teachers were more likely to leave.

The basic theory of action behind evaluation changes is to get more effective teachers into the classroom and keep them there, while getting less effective ones out or helping them improve.

The Gates research found that the new initiatives didn’t get top teachers to stick around any longer. But there was some evidence that the changes made lower-rated teachers more likely to leave. In the places where data was available, less than 1 percent of teachers were formally dismissed.

After the grants ran out, districts scrapped some of the changes but kept a few others.

One key test of success for any foundation initiative is whether it is politically and financially sustainable after the external funds run out. Here, the results are mixed.

Both Pittsburgh and Hillsborough have ended high-profile aspects of their programs: the merit pay system and the use of peer evaluators, respectively.

But other aspects of the initiative have been maintained, according to the study, including the use of classroom observation rubrics, evaluations that use multiple metrics, and certain career-ladder opportunities.

Donaldson said she was surprised that the peer evaluators didn’t go over well in Hillsborough. Teachers unions have long promoted peer-based evaluation, but district officials said that a few evaluators who were rude or hostile soured many teachers on the concept.

“It just underscores that any reform relies on people — no matter how well it’s structured, no matter how well it’s designed,” she said.

Correction: A previous version of this story stated that about half of the money for the initiative came from the Gates Foundation; in fact, the foundation’s share was 37 percent or about a third of the total.