Making the grade

New data show more than half of NYC teachers judged, in part, by test scores they don’t directly affect

PHOTO: Christina Veiga

Just over half of New York City teachers were evaluated in the 2015–16 school year, in part, by tests in subjects or of students they didn’t teach, according to data obtained by Chalkbeat through a public records request.

At 53 percent of city teachers, it’s a significant number, but substantially lower than in previous years, possibly because of a moratorium on using state tests that was instituted mid-year.

That figure also highlights a key tension in evaluating all teachers by student achievement, even teachers who work with young students or in subjects like physical education. Being judged by other teachers’ students or subjects has long annoyed some educators and relieved others, who otherwise might have had to administer additional tests.

Supporters say evaluating teachers by group measures — often school-wide scores on standardized tests — helps create a sense of shared mission in a school. But the approach could also push teachers away from working in struggling schools.

“The key point around school-wide measures is that this could serve as a strong disincentive for these teachers in non-tested grades and subjects to stay in lower-performing schools,” said Matthew Steinberg at the University of Pennsylvania, who has studied teacher evaluation systems.

Will Mantell, a spokesperson for the New York City Department of Education, defended the district’s approach.

“Selecting school-wide [or] grade-wide … measures may better measure educators’ practice and support professional development,” he said. “For example, it makes sense for a social studies teacher who emphasizes writing in her classroom to be evaluated partially on an assessment of students’ ELA skills.”

New York’s evaluation system has gone through a number of substantial changes since it was first codified in state law in 2012, part of a nationwide push to connect teacher performance to student test scores, spurred by federal incentives.

Student assessments have made up anywhere from 40 percent of the evaluation to essentially 50 percent under a matrix system pushed by Governor Andrew Cuomo in 2015. Most recently, New York stopped using grades 3-8 English and math state tests as part of the system, but teachers must continue to be judged based on some assessment.

States across the country have struggled to evaluate teachers in traditionally non-tested grades and subjects. New York City has created a number of exams — known as performance assessments — in non-tested areas and given schools significant flexibility in which measures are used to judge their teachers.

In the 2015-16 school year, 53 percent of teachers were evaluated by a group metric, meaning one not focused on their subject or students. In the two previous years, the number was much higher — around 85 percent. It’s not clear why there was a substantial drop, but a spokesperson for the city’s education department notes that 2015-16 was an “outlier” due to the moratorium on state tests, instituted mid-year.

In all three years, most teachers were also evaluated by at least one individualized measure targeted to teachers’ grade, subject and students.

Data for the most recent school year are not yet available.

It’s also not clear what percentage of a teacher’s rating was based on group measures, and Mantell said this “varies from teacher to teacher.”

The United Federation of Teachers has pushed to give schools more individual options, including the use of more “authentic” assessments, not based on multiple choice questions.

“Right now, we don’t have enough options, which is why our most recent agreement with the DOE seeks to build more authentic assessments for additional grades and subjects,” said Michael Mulgrew, president of the UFT, in a statement.

Group measures offer an alternative to creating exams for each teacher in every grade and subject, an approach that can lead to a proliferation of new tests. In New York City, though, teachers have often been judged by both group and individual metrics.

The challenge of evaluating teachers in traditionally untested areas is not unique to New York, and a number of states have embraced group or school-wide approaches. An analysis of 32 states, conducted by Steinberg, found that the average teacher in a non-tested grade or subject had about 7 percent of his or her evaluation based on school-wide achievement measures, though this averaged together substantial variation from place to place. Teachers in Tennessee and Florida have sued (unsuccessfully), arguing that it is unfair to evaluate them based on students they didn’t teach.

A more popular option, used in some districts in New York, has been student-learning objectives, in which teachers set goals for students often based on classroom exams. This approach has been praised for helping teachers set specific goals, but criticized as burdensome and easy to manipulate.

Research has found that using school-wide measures of performance tends to pull teachers’ ratings toward the average. An analysis by the Brookings Institution showed that these group measures pulled down the ratings of teachers at low-performing schools who had high individual ratings.

Previewing TNReady

Why Tennessee’s high school test scores, out this week, matter more — and less — than usual

PHOTO: Nic Garcia

When scores dropped last year for most Tennessee high school students under a new state test, leaders spoke of “setting a new baseline” under a harder assessment aligned to more rigorous standards.

This week, Tennesseans will see whether last year’s scores, in which nearly three-quarters of high schoolers performed below grade level, were in fact just a reset moment.

Education Commissioner Candice McQueen has scheduled a press conference for Thursday morning to release the highly anticipated second year of high school scores under TNReady, which replaced the state’s TCAP tests in 2015-16. (Students in grades 3-8 will get TNReady scores for the first time this fall; last year, their tests were canceled because of a series of testing failures.)

Here’s what you need to know about this week’s data dump, which will focus on statewide scores.

1. Last year’s low scores weren’t a big surprise.

Not only was it the first time Tennessee students took TNReady, but it was also the first time they were tested on new academic standards in math and language arts known as the Common Core, which reached Tennessee classrooms in 2012.

Other states that switched to Common Core-aligned exams also saw their scores plummet. In New York, for example, the proportion of students who scored proficient or higher in reading dropped precipitously in 2013 during the first year of a new test for grades 3-8.

McQueen sought last year to prepare Tennessee for the same experience. After all, she said, the state was moving away from a multiple-choice test to one that challenges students’ higher-order thinking skills. Plus, while Tennessee students had been posting strong scores on the state’s own exam, they had struggled on national tests such as the ACT, raising questions about whether the previous state test was a good measure of students’ skills.

“We expected scores to be lower in the first year of a more rigorous assessment,” McQueen said after only 21 percent of high school students scored on or above grade level in math, while 30 percent tested ready in English and reading.

2. It’s expected that this year’s scores will rise … and it will be a bad sign if they don’t.

Over and over, state officials assured Tennesseans that 2016 was just the start.

“[We] expect that scores will rebound over time as all students grow to meet these higher expectations — just as we have seen in the past,” McQueen said.

She was referring to the state’s shift to Diploma Standards in 2009, when passing rates on end-of-course tests dropped by almost half. But in subsequent years, those scores rose steadily in a “sawtooth pattern” that has been documented repeatedly when states adopt new assessments: scores drop at first, then climb as students and teachers grow accustomed to the new tests.

That includes New York, where after the worrisome results in 2013, the percentage of students passing started inching up the following year, especially in math.

In Tennessee, this year’s high school scores will provide the first significant data point in establishing whether the state is on the same track. Higher scores would put the state on an upward trajectory, and suggest that students are increasingly proficient in the skills that the test is measuring. Scores that remain flat or go down would raise questions about whether teachers and students are adjusting to more rigorous standards.

3. There are lots more scores to come.

This week’s statewide high school scores will kick off a cascade of other TNReady results that will be released in the weeks and months ahead.

Next come district- and school-level high school scores, which will be shared first with school systems before being released to the public. That’s likely to happen in August.

In the fall, Tennessee will release its scores for students in grades 3-8, who took TNReady for the first time this year after the 2016 testing debacle. While testing went better this year, the state’s new testing company needed extra time to score the exams, because additional work goes into setting “cut scores” each time a new test is given.

A group of educators just concluded the process of reviewing the test data to recommend what scores should fall into the state’s four new categories for measuring performance: below grade level, approaching grade level, on grade level, or mastered. The State Board of Education will review and vote on those recommendations next month.

4. This year’s scores are lower stakes than usual, but that probably won’t last.

For years, Tennessee has been a leader in using test scores to judge students, teachers, and schools. Like most states, it uses the data to determine which schools are so low-performing that they should be closed or otherwise overhauled. It also crunches scores through a complicated “value-added” algorithm designed to assess how much teachers contribute to their students’ learning, an approach it has largely stuck with even as value-added measures have fallen out of favor across the nation. And unusually, the state exam scores are also supposed to factor into final student grades, this year counting for 10 percent.

But the rocky road to the new tests has temporarily diminished how much the scores count. Because preliminary scores arrived late this spring, most districts opted to grade students on the basis of their schoolwork alone.

And because of the testing transition, the scores won’t be given as much weight in this year’s teacher evaluations — an adjustment that lawmakers made to alleviate anxiety about the changes. Test scores will contribute only 10 percent to teachers’ ratings. Depending on the subject, that proportion is supposed to rise to between 15 and 25 percent by 2018-19.

First Person

Two fewer testing days in New York? Thank goodness. Here’s what else our students need

PHOTO: Christina Veiga

Every April, I feel the tension in my fifth-grade classroom rise. Students are concerned that all of their hard work throughout the year will boil down to six intense days of testing — three for math and three for English language arts.

Students know they need to be prepared to sit in a room for anywhere from 90 minutes to three hours with no opportunity to leave, barring an emergency. Many of them are sick to their stomachs, feeling more stress than a 10-year-old ever should, and yet they are expected to perform their best.

Meanwhile, teachers are frustrated that so many hours of valuable instruction have been replaced by testing, and that the results won’t be available until students are moving on to other classrooms.

This is what testing looks like in New York state. Or, at least it did. Last month, state officials voted to reduce testing from three days for each subject to two, to the elation of students, parents, and teachers across New York. It’s an example of our voices being heard — but there is still more to be done to make the testing process truly useful, and less stressful, for all of us.

As a fifth-grade teacher in the Bronx, I was thrilled by the news that testing time would be reduced. Though it doesn’t seem like much on paper, having two fewer days of gut-wrenching stress for students as young as eight means so much for their well-being and education. It gives students two more days of classroom instruction, interactive lessons, and engagement in thought-provoking discussions. Any reduction in testing also means more time with my students, since administrators can pull teachers out of their classrooms for up to a week to score each test.

Still, I know these tests provide us with critical data about how students are doing across our state and where we need to concentrate our resources. The changes address my worries about over-testing, while still ensuring that we have an objective measure of what students have learned across the state.

For those who fear that cutting one-third of the required state testing hours will not provide teachers with enough data to help our students, understand that we assess them before, during, and after each unit of study, along with mid-year tests and quizzes. It is unlikely that one extra day of testing in each subject will offer any significant additional insights into our students’ skills.

Also, because we receive students’ state test results months later, at the end of June, the results give us a snapshot of where our students were rather than where they currently are, and they arrive when it’s too late for us to use the information to help them.

That’s where New York can still do better. Teachers need timely data to tailor their teaching to meet student needs. As New York develops its next generation of tests and academic standards, we must ensure that they are developmentally appropriate. And officials need to continue to emphasize that state tests alone cannot fully assess a student’s knowledge and skills.

To get there, parents and teachers must keep demanding that their voices be heard. In the meantime, thank you, New York Regents, for hearing us and reducing the number of testing days.

In my classroom, I’ll have two extra days to help my special needs students work towards the goals laid out in their individualized education plans. I’ll take it.

Rich Johnson teaches fifth grade at P.S. 105 in the Bronx.