If state tests keep changing, should they still be used to judge struggling schools?

In a packed room of educators, New York City schools Chancellor Carmen Fariña announced that the city’s turnaround program for struggling schools is making extraordinary progress.

“I want to be clear,” Fariña said. “English proficiency … increased at 59 out of the 63 [Renewal] schools. Let me say this again, 59 out of 63 schools.”

Fariña is correct, but the state offered a more tempered assessment of the scores. On Friday, State Education Commissioner MaryEllen Elia said changes to this year’s test, such as offering students unlimited time and asking fewer questions, meant this year’s scores were not an “apples-to-apples comparison” with last year’s.

Her statement underscores what critics see as a dilemma: As tests continue to change, how can officials judge yearly progress on major initiatives?

That question is particularly relevant to two high-profile programs for struggling schools — the state’s receivership program and the city’s “Renewal” school program — both of which carry penalties for underperforming schools and use test scores as one way to gauge student progress. Renewal schools are expected to show improvements between 2015 and 2017, but the test process has changed within that timeframe — and could change again next year.

“It’s like trying to judge the success of a weight loss program when you have three different scales that you can’t count on,” said Aaron Pallas, a professor of sociology and education at Teachers College at Columbia University.

City officials defended the comparison between 2015 and 2016 test results, saying the rigor of the exam remained the same this year and that only the structure of the test changed. They also noted that multiple benchmarks, not just test scores, will be used to evaluate “Renewal” schools.

“These tests are not easier,” Fariña said during a Monday press conference. “I want to be clear on that. These tests had the same rigor as the one they took last year.”

State officials reiterated that the tests were “comparably rigorous” to last year’s assessments and said they will review the indicators used to judge improvement in struggling schools and make sure they are “working as intended.”

Even if the comparison is flawed, some say, the test scores are still useful in assessing progress. “We have to have some point of comparison for how our students are doing, as imperfect as it might be,” said David Albert, spokesman for the New York State School Boards Association.

This year is far from the first in which tests have changed. The tests have been revised multiple times over the last decade, and scores swung sharply in some years because of those changes.

After test scores dropped in 2013 with the introduction of Common Core standards and exams, the state vowed to revise the test to address concerns.

That process will likely take years, and during the transition period, grades 3-8 math and English tests will not be used to evaluate teachers. But state and city officials are still using those tests to judge struggling schools, and that’s a problem, said David Bloomfield, an education professor at Brooklyn College and the CUNY Graduate Center. Schools on the state’s or city’s lists of low-performing schools could face consequences, such as being taken over by an outside receiver or closed, if they fail to meet academic benchmarks.

“This isn’t a new phenomenon, it’s just that earlier there weren’t high stakes,” Bloomfield said. “Now because of the particularly short time span that’s involved, they are being used way beyond their ability for accurate measurement.”

Even without changes to the test, yearly fluctuations should be viewed carefully, said Roey Ahram, director of research and evaluation at the NYU Steinhardt Metropolitan Center for Research on Equity and the Transformation of Schools.

“You always have to look at test scores with a grain of salt, whether they are changing or not,” he said. “The asterisk is there for a reason.”