Ratings Reduction

Fariña floats possible evaluations fix that would require UFT reversal

The city schools chief has floated an idea to simplify the complex new teacher evaluation system, but carrying it out would require a major concession from the teachers union.

When the union and city were negotiating teacher evaluations in recent years, one of the many sticking points was how many factors teachers should be rated on when their classes are observed.

The union wanted teachers to be scored on all 22 components of a teacher-effectiveness rubric, while the city pushed for just seven of the rubric components. Ultimately, the state intervened last year and insisted on 22 components.

Now, well into the city’s first year under the new system, many principals report feeling swamped by all their rating duties, and some teachers wonder how fairly they will be rated on all those measures.

Enter new Schools Chancellor Carmen Fariña, a former principal committed to lifting unnecessary burdens from school leaders. At a private meeting with administrators in January, she raised the idea of greatly reducing the number of rubric components that principals and other evaluators have to rate teachers on, according to several people at the meeting.

To get state approval for that change for next school year, the United Federation of Teachers would need to sign off on it. That would mark a significant reversal for the union — but perhaps a palatable trade-off as it seeks billions in back pay and raises in the ongoing contract negotiations with the city.

Meanwhile, the change would undoubtedly cheer school leaders who have struggled to observe each of their teachers multiple times this year and rate them on the nearly two dozen components as required by the new system.

“I’m way behind,” said William Frackelton, principal of Soundview Academy for Culture and Scholarship in the Bronx, who supports the 22-component rubric in theory. “But in practice, how manageable is it? It’s a beast.”

At a meeting in late January with district superintendents and school-support network leaders, Fariña spoke about the need to support overburdened principals, according to several attendees. She suggested one way to do that would be to pare down the 22 instructional components that principals must observe and rate.

“She said, ‘That’s too many, we need to get it down,’” said Alan Dichter, a network leader. He added that he took Fariña’s comment as an “intention,” not a firm commitment.

The component question has not gone away since that meeting. At a conference for new principals Saturday, a veteran principal leading a workshop on evaluations said there could be fewer components in the future, but that the city is still discussing the matter with the teachers union, according to a principal who attended the workshop.

“Something good is cooking,” said the attendee, who requested anonymity because she had not been authorized to discuss the private training.

The state education commissioner imposed the new evaluation system last summer after a long city-union tussle over the details. Under it, 60 percent of teachers’ ratings come from subjective measures, including observations by administrators.

To rate teachers’ performance, principals or other evaluators must use a rubric known as the Danielson Framework. The rubric is divided into four “domains” of teaching: planning; classroom environment, which includes managing student behavior; instruction; and professional duties, such as communicating with parents and keeping records. Those domains are then broken down into 22 narrower components, such as cultivating a respectful classroom culture and sparking rich class discussions.

In its written submission for the state arbitration hearing, the UFT argued that the full 22 components are “essential” to measure the complexity of teaching. What it didn’t say, but what many read into the UFT’s position paper, was that requiring all 22 components could protect low-rated teachers from consequences that include firing: More components mean more potential points a teacher could contest if given a poor rating.

The city education department argued that teachers could be fairly rated using just seven Danielson components. It pointed to research that shows complex rubrics can overwhelm evaluators, leading them to rate disparate components similarly. It also noted that the city had used seven components during an evaluation pilot program. It cited evidence that the pilot ratings were accurate and that 93 percent of school leaders in the program said the seven components provided enough data to make fair assessments.

State Education Commissioner John King sided with the union on the issue of components, ruling that the Danielson rubric was “validated and was designed to be used in its entirety.”

As a result, New York City principals must rate teachers annually on all 22 components, for which they can use both observations and other evidence, such as teacher-created lessons and tests.

Many principals and other administrators have struggled to observe each teacher the required number of times, document their ratings and evidence, and give teachers feedback.

Frackelton, the Bronx principal, and an assistant principal must observe and rate 30 teachers. He said some school leaders respond to that pressure by filling in “cookie-cutter” explanations of their ratings on multiple teachers’ forms. He said he avoids using such stock language only by working on the forms until 10 p.m. some nights and on Saturdays.

“It’s really a lot of work to do it well,” he said.

The schools in the city’s evaluation pilot program did not expect to jump from seven to 22 rubric components when the official system launched this year, said Thandi Center, New York City director for the New Teacher Center, which was one of the city’s lead partners in the pilot. She said many principals have complained the new system “isn’t doable,” and teachers have expressed concern about the “credibility” of their ratings.

“I just think it’s untenable to introduce 22 components and expect it to be done well consistently,” Center said.

Phil Weinberg, the education department’s new deputy chancellor for teaching and learning, acknowledged principals’ concerns about the evaluations in a letter last week. He offered them advice for “reducing evaluator burden” and announced a survey and “listening tour” next month where the city will collect feedback from principals about evaluations. He also urged principals struggling to rate all their teachers before the June deadline to contact their support networks “immediately.”

If the city and union were to agree on an evaluation change for next year, they would have to jointly submit a request to the state.

If they ask to rate teachers on fewer rubric components, they would need to prove that all four domains will still be assessed and that the “integrity of the rubric” is preserved, said Julia Rafal-Baer, executive director of the state education department’s Office of Teacher and Leader Effectiveness.

She noted that some districts have approved evaluation plans that guarantee all Danielson domains will be assessed, but not all 22 components will be rated. For example, Webster Central School District’s plan says any observed components can be rated, but only seven specific components absolutely must be rated.

Rafal-Baer added that it would be “interesting” if the city teachers union agreed to fewer components, since the UFT “really felt very strongly about having all 22 components” when it pitched its evaluation plan last year to the state.

The union is currently pushing for more than $3 billion in back pay in contract negotiations with the city, along with a pay hike for the future. Teacher evaluations are part of those negotiations, and the UFT could potentially use a component-number change as a bargaining chip.

A UFT spokesperson declined to comment, citing the union’s policy to avoid public negotiations.

A city Department of Education spokesman declined to comment on Fariña’s remarks or possible evaluation changes, saying the city’s focus is on “improving classroom instruction.”

“Through meaningful observations and feedback under the evaluation system, it’s our goal to help educators hone their craft,” said the spokesman, Devon Puglia.

fight another day

In union defeat, lawmakers end session without revamping teacher evaluation law

After a hard-fought battle by the state teachers union, New York lawmakers went home for the summer without overhauling a controversial teacher evaluation law that ties state test scores to educator ratings.

The bill pushed by the unions would have left decisions about whether to use state test scores in teacher evaluations up to local union negotiations. While the bill cleared the Assembly, it was bottled up by the Senate’s leadership, which demanded charter school concessions in return that Assembly Democrats wouldn’t agree to.

The effort to decouple test scores from teacher evaluations was one of several that fizzled out at the end of a lackluster session characterized by lawmaker gridlock.

“Sen. Flanagan, his caucus and five Democrats chose to betray the state’s teachers,” said New York State United Teachers President Andy Pallotta in a statement. “Make no mistake, New York teachers, parents and public school students will remember which senators voted against their public schools when we head to the polls this September and again in November.”

There is some possibility that lawmakers could return to finish a few unresolved issues this summer, but Pallotta told Chalkbeat he is not holding out hope for that outcome.

The lack of action is a defeat for the state teachers union, which fought hard for the bill since the beginning of the session. Union officials staged musical rallies, bought balloons, rented a truck with a message urging lawmakers to pass the bill, and capped off the last day of the session by handing out ice cream for the cause.

However, the legislative loss gives the union something to rally around during this fall’s elections. Also, other education advocacy organizations are content to engage in a longer process to revamp evaluations.

“Inaction isn’t always the worst outcome,” said Julie Marlette, director of governmental relations for the New York State School Boards Association. “Now we can continue to work with both legislative and regulatory figures to hopefully craft an update to evaluations that is thoughtful and comprehensive and includes all the stakeholders.”

The news also means that New York’s teacher evaluation saga, which has been raging for eight years, will spill over into at least next year. Policymakers have been battling over state teacher evaluations since 2010, when New York adopted a system that began using state test scores to rate teachers in order to win federal “Race to the Top” money.

Teacher evaluations were altered again in 2015 when Gov. Andrew Cuomo called for a more stringent evaluation system, saying evaluations as they existed were “baloney.” The new system was met with resistance from the teachers unions and parents across the state. Nearly one in five families boycotted state tests in response to evaluation changes and a handful of other education policies.

The state’s Board of Regents acted quickly, passing a moratorium on the use of grades three to eight math and English tests in teacher evaluations. But the original 2015 law remains on the books, and it was a central plank of that law, a provision that could require as much as half of an educator’s evaluation to be based on test scores, that the unions targeted during this session.

With the moratorium set to expire in 2019, the fight over teacher evaluations will likely become more pressing next year. It may also allow the state education department to play a greater role in shaping the final product. State education department officials had begun to lay out a longer roadmap for redesigning teacher evaluations that involved surveys and workgroups, but the legislative battle threatened to short-circuit their process.

Now officials at the state education department say they will restart their work and pointed out that they could extend the moratorium to provide extra time if needed.

“We will resume the work we started earlier this year to engage teachers, principals and others as we seek input in moving toward developing a new educator evaluation system,” said state education department spokeswoman Emily DeSantis.

For some education advocates, slowing down the process sounds like a good idea.

“Our reaction on the NYSUT Assembly teacher evaluation bill is that you could do worse but that you could also do better and that we should take time to try,” said Bob Lowry, deputy director of the New York State Council of School Superintendents.

What seems to be a setback for the union now may be a galvanizing force during elections this fall. Republican lawmakers will likely struggle to keep control of the state Senate, and NYSUT is promising to use this inaction against them. That could be particularly consequential in Long Island, which is a hotbed of the testing opt-out movement.

It’s unclear whether the failure to act will also prove problematic for Cuomo, who is also seeking re-election. Cuomo, who pushed for the 2015 law the unions despise, is facing competition from the left in gubernatorial challenger Cynthia Nixon.

But at least so far, it seems like the union is reserving the blame for Senate Republicans and not for the governor.

Cuomo is “making it clear that he has heard the outcry,” said Pallotta. “I blame Senator Flanagan, I blame his conference and I blame 5 [Senate] Democrats.”

a high-stakes evaluation

The Gates Foundation bet big on teacher evaluation. The report it commissioned explains how those efforts fell short.

PHOTO: Brandon Dill/The Commercial Appeal
Sixth-grade teacher James Johnson leads his students in a gameshow-style lesson on energy at Chickasaw Middle School in 2014 in Shelby County. The district was one of three that received a grant from the Gates Foundation to overhaul teacher evaluation.

Barack Obama’s 2012 State of the Union address reflected the heady moment in education. “We know a good teacher can increase the lifetime income of a classroom by over $250,000,” he said. “A great teacher can offer an escape from poverty to the child who dreams beyond his circumstance.”

Bad teachers were the problem; good teachers were the solution. It was a simplified binary, but the idea and the research it drew on had spurred policy changes across the country, including a spate of laws establishing new evaluation systems designed to reward top teachers and help weed out low performers.

Behind that effort was the Bill and Melinda Gates Foundation, which backed research and advocacy that ultimately shaped these changes.

It also funded the efforts themselves, specifically in several large school districts and charter networks open to changing how teachers were hired, trained, evaluated, and paid. Now, new research commissioned by the Gates Foundation finds scant evidence that those changes accomplished what they were meant to: improve teacher quality or boost student learning.  

The 500-plus page report by the RAND Corporation, released Thursday, details the political and technical challenges of putting complex new systems in place and the steep cost — $575 million — of doing so.

The post-mortem will likely serve as validation to the foundation’s critics, who have long complained about Gates’ heavy influence on education policy and what they call its top-down approach.

The report also comes as the foundation has shifted its priorities away from teacher evaluation and toward other issues, including improving curriculum.

“We have taken these lessons to heart, and they are reflected in the work that we’re doing moving forward,” the Gates Foundation’s Allan Golston said in a statement.

The initiative did not lead to clear gains in student learning.

At the three districts and four California-based charter school networks that took part in the Gates initiative — Pittsburgh; Shelby County (Memphis), Tennessee; Hillsborough County, Florida; and the Alliance College-Ready, Aspire, Green Dot, and Partnerships to Uplift Communities networks — results were spotty. The trends over time didn’t look much better than those at similar schools in the same state.

Several years into the initiative, there was evidence that it was helping high school reading in Pittsburgh and at the charter networks, but hurting elementary and middle school math in Memphis and among the charters. In most cases there were no clear effects, good or bad. There was also no consistent pattern of results over time.

A complicating factor here is that the comparison schools may also have been changing their teacher evaluations, as the study spanned from 2010 to 2015, when many states passed laws putting in place tougher evaluations and weakening tenure.

There were also lots of other changes going on in the districts and states — like the adoption of Common Core standards, changes in state tests, the expansion of school choice — making it hard to isolate cause and effect. Studies in Chicago, Cincinnati, and Washington D.C. have found that evaluation changes had more positive effects.

Matt Kraft, a professor at Brown who has extensively studied teacher evaluation efforts, said the disappointing results in the latest research couldn’t simply be chalked up to a messy rollout.

These “districts were very well poised to have high-quality implementation,” he said. “That speaks to the actual package of reforms being limited in its potential.”

Principals were generally positive about the changes, but teachers had more complicated views.

From Pittsburgh to Tampa, Florida, the vast majority of principals agreed at least somewhat that “in the long run, students will benefit from the teacher-evaluation system.”


Teachers in district schools were far less confident.

When the initiative started, a majority of teachers in all three districts tended to agree with the sentiment. But several years later, support had dipped substantially. This may have reflected dissatisfaction with the previous system — the researchers note that “many veteran [Pittsburgh] teachers we interviewed reported that their principals had never observed them” — and growing disillusionment with the new one.

Majorities of teachers in all locations reported that they had received useful feedback from their classroom observations and changed their habits as a result.

At the same time, teachers in the three districts were highly skeptical that the evaluation system was fair — or that it made sense to attach high-stakes consequences to the results.

The initiative didn’t help ensure that poor students of color had more access to effective teachers.

Part of the impetus for evaluation reform was the idea, backed by some research, that black and Hispanic students from low-income families were more likely to have lower-quality teachers.  

But the initiative didn’t seem to make a difference. In Hillsborough County, inequity expanded. (Surprisingly, before the changes began, the study found that low-income kids of color actually had similar or slightly more effective teachers than other students in Pittsburgh, Hillsborough County, and Shelby County.)

Districts put in place modest bonuses to get top teachers to switch schools, but the evaluation system itself may have been a deterrent.

“Central-office staff in [Hillsborough County] reported that teachers were reluctant to transfer to high-need schools despite the cash incentive and extra support because they believed that obtaining a good VAM score would be difficult at a high-need school,” the report says.

Evaluation was costly — both in terms of time and money.

The total direct cost of all aspects of the program, across several years in the three districts and four charter networks, was $575 million.

That amounts to between 1.5 and 6.5 percent of district or network budgets, or a few hundred dollars per student per year. Over a third of that money came from the Gates Foundation.

The study also quantifies the strain of the new evaluations on school leaders’ and teachers’ time as costing upwards of $200 per student, nearly doubling the price tag in some districts.

Teachers tended to get high marks on the evaluation system.

Before the new evaluation systems were put in place, the vast majority of teachers got high ratings. That hasn’t changed much, according to this study, which is consistent with national research.

In Pittsburgh, in the initial two years, when evaluations had low stakes, a substantial number of teachers got low marks. That drew objections from the union.

“According to central-office staff, the district adjusted the proposed performance ranges (i.e., lowered the ranges so fewer teachers would be at risk of receiving a low rating) at least once during the negotiations to accommodate union concerns,” the report says.

Morgaen Donaldson, a professor at the University of Connecticut, said the initial buy-in followed by pushback isn’t surprising, pointing to her own research in New Haven.

To some, aspects of the initiative “might be worth endorsing at an abstract level,” she said. “But then when the rubber hit the road … people started to resist.”

More effective teachers weren’t more likely to stay teaching, but less effective teachers were more likely to leave.

The basic theory of action behind evaluation changes is to get more effective teachers into the classroom and keep them there, while getting less effective ones out or helping them improve.

The Gates research found that the new initiatives didn’t get top teachers to stick around any longer. But there was some evidence that the changes made lower-rated teachers more likely to leave. Less than 1 percent of teachers were formally dismissed from the places where data was available.

After the grants ran out, districts scrapped some of the changes but kept a few others.

One key test of success for any foundation initiative is whether it is politically and financially sustainable after the external funds run out. Here, the results are mixed.

Both Pittsburgh and Hillsborough have ended high-profile aspects of their program: the merit pay system and bringing in peer evaluators, respectively.

But other aspects of the initiative have been maintained, according to the study, including the use of classroom observation rubrics, evaluations that use multiple metrics, and certain career-ladder opportunities.

Donaldson said she was surprised that the peer evaluators didn’t go over well in Hillsborough. Teachers unions have long promoted peer-based evaluation, but district officials said that a few evaluators who were rude or hostile soured many teachers on the concept.

“It just underscores that any reform relies on people — no matter how well it’s structured, no matter how well it’s designed,” she said.

Correction: A previous version of this story stated that about half of the money for the initiative came from the Gates Foundation; in fact, the foundation’s share was 37 percent, or about a third, of the total.