Testing

Bye-bye bubble sheets: New Hampshire’s innovative approach to testing appeals to Indiana, other states

PHOTO: Shaina Cavazos
Students in an English-learner class at Southport High School work on an assignment during the last period of the day.

As Indiana awaits recommendations from a committee that’s trying to figure out what student exams will look like after 2017, one idea out of New Hampshire is capturing the attention of educators.

New Hampshire’s “performance tasks” are considered some of the most innovative standardized tests in the country, but they don’t look much like standardized tests at all.

The new pilot program in the Granite State — called Performance Assessment of Competency Education or PACE — moves away from the computerized testing and multiple-choice bubble sheets that have been the backbone of annual state exams for decades.

In their place, the PACE program asks kids in the eight pilot districts to do “performance tasks” throughout the school year to show deep understanding of the subjects they’re studying.

For example, while a traditional geometry exam might ask students to solve math problems and even require them to show how they calculated their answer, New Hampshire now asks them to complete complex problems applied to real-world situations that require a range of skills and knowledge they’ve been learning in class.

“We asked the kids to be a town planner, and as part of that planning board they are asked to design two towers that use solids,” said Lee Sheedy, a New Hampshire high school geometry teacher who’s been working on the new test questions since the pilot began in 2014. “One would be a simple solid and the other had to be a compound solid. They then write a proposal to the town recommending one of the towers.”

To complete the task, students must draw models, do calculations, analyze results and write a proposal all in one exercise, Sheedy said.

Students in the pilot districts take the Smarter Balanced exam — a more traditional standardized test that is used in more than a dozen states — in third-grade English, fourth-grade math and eighth-grade English and math. All high-school juniors take the SAT.

In the rest of the grades, students must complete performance tasks in math, English and science throughout the year according to where those tasks fall in the curriculum. Some of the tasks are “local,” which help districts measure student progress at certain points in the academic year, but others are “common,” which can be compared across districts.

Once the tasks are completed, the classroom teachers grade them. “Common” tasks are scored and then validated by the state against predetermined sample answers.

For both common and local questions, teachers are trained by their peers for about two weeks over the course of the year to use the scoring guides to grade student answers. Then, for the common questions, teachers compare their scoring processes to those of teachers from other schools and districts to ensure they are accurate. Final scores are reported to the state for accountability purposes.

For the water tower problem, there were four possible scores a student could receive and three main areas where they needed to show work: models and scale drawings; calculations and mathematical strategy; and communication, analysis and recommendation.

Kathleen Cotton, a curriculum and instruction coach in Sheedy’s district in Rochester, New Hampshire, said that although there is extra work involved on the front end, the performance tasks give teachers information they can use immediately.

“You look at some of this high-stakes testing that we have, and it really is not engaging at the time because the students don’t really have any buy-in except of that one score at the end,” Cotton said.

Throughout the pilot, Sheedy said his students have been more engaged than they were taking traditional exams. He’s never seen kids so focused as when they are working on the new types of tests.

“When you give students a real world problem, you allow them to be creative, you allow them to think critically,” Sheedy said. “They get incredibly motivated. If you walked into my room during PACE you could hear a pin drop.”

He’s also been impressed by how much developing the tasks has helped him as a teacher.

“When you let teachers … get out of their classrooms and you look at student work and you talk about it, teachers become better teachers,” Sheedy said. “Their ability to instruct and assess, it increases exponentially. I have grown more as a teacher since I’ve been doing PACE than any other thing I’ve been doing in the classroom over the last 12 years.”

Designing the tasks and learning to grade them took significant teacher-led work. Teams of teachers wrote both the questions and the scoring guides used to grade them.

The New Hampshire experiment is making ripples across the country as more and more states are looking for alternatives to traditional once-a-year testing methods.

States looking for new options are encouraged by changes to federal testing regulations that are expected next year when the No Child Left Behind Act is replaced by the Every Student Succeeds Act. The new law still requires every state to create an accountability system that measures annual student performance, but this law allows more flexibility. As many as seven states could be chosen to try new, innovative exams.

The work to completely change a testing system isn’t easy, and for larger states with more diverse student populations, varied funding across districts and stricter accountability systems, like Indiana, it’s not clear if this model would see the same kind of success that it’s seen in New Hampshire.

It’s also not clear if Indiana education officials are going to even pursue an innovation pilot under ESSA, although state Superintendent Glenda Ritz and House Education Committee Chairman Bob Behning have expressed interest in New Hampshire’s model.

Many Indiana educators say they’re frustrated with years of ISTEP exams that have seen major delays, results that don’t do much to guide instruction and computer testing glitches. Some say they’re ready to try something new, and state officials agree.

For now, said Danielle Shockey, Indiana’s deputy state superintendent, the state will focus on its work with the new testing committee before it gets involved in a new federal initiative.

“There’s a lot left to be learned about that innovation pilot,” Shockey said.

previewing TNReady

Why Tennessee’s high school test scores, out this week, matter more — and less — than usual

PHOTO: Nic Garcia

When scores dropped last year for most Tennessee high school students under a new state test, leaders spoke of “setting a new baseline” under a harder assessment aligned to more rigorous standards.

This week, Tennesseans will see if last year’s scores, in which nearly three-quarters of high schoolers performed below grade level, were in fact just a reset moment.

Education Commissioner Candice McQueen has scheduled a press conference for Thursday morning to release the highly anticipated second year of high school scores under TNReady, which replaced the state’s TCAP tests in 2015-16. (Students in grades 3-8 will get TNReady scores for the first time this fall; last year, their tests were canceled because of a series of testing failures.)

Here’s what you need to know about this week’s data dump, which will focus on statewide scores.

1. Last year’s low scores weren’t a big surprise.

Not only was it the first time Tennessee students took TNReady, it also was the first time that they were being tested on new academic standards in math and language arts known as the Common Core, which reached Tennessee classrooms in 2012.

Other states that switched to Common Core-aligned exams also saw their scores plummet. In New York, for example, the proportion of students who scored proficient or higher in reading dropped precipitously in 2013 during the first year of a new test for grades 3-8.

McQueen sought last year to prepare Tennessee for the same experience. After all, she said, the state was moving away from a multiple-choice test to one that challenges students’ higher-order thinking skills. Plus, while Tennessee students had been posting strong scores on the state’s own exam, they had struggled on national tests such as the ACT, raising questions about whether the previous state test was a good measure of students’ skills.

“We expected scores to be lower in the first year of a more rigorous assessment,” McQueen said after only 21 percent of high school students scored on or above grade level in math, while 30 percent tested ready in English and reading.

2. It’s expected that this year’s scores will rise … and it will be a bad sign if they don’t.

Over and over, state officials assured Tennesseans that 2016 was just the start.

“[We] expect that scores will rebound over time as all students grow to meet these higher expectations — just as we have seen in the past,” McQueen said.

She was referring to the state’s shift to Diploma Standards in 2009, when passing rates on end-of-course tests dropped by almost half. But in subsequent years, those scores rose steadily in a “sawtooth pattern” that has been documented over and over when states adopt new assessments and students and teachers grow accustomed to them.

That includes New York, where after the worrisome results in 2013, the percentage of students passing started inching up the following year, especially in math.

In Tennessee, this year’s high school scores will provide the first significant data point in establishing whether the state is on the same track. Higher scores would put the state on an upward trajectory, and suggest that students are increasingly proficient in the skills that the test is measuring. Scores that remain flat or go down would raise questions about whether teachers and students are adjusting to more rigorous standards.

3. There are lots more scores to come.

This week’s statewide high school scores will kick off a cascade of other TNReady results that will be released in the weeks and months ahead.

Next come district- and school-level high school scores, which will be shared first with school systems before being released to the public. That’s likely to happen in August.

In the fall, Tennessee will release its scores for students in grades 3-8, who took TNReady for the first time this year after the 2016 testing debacle. While testing went better this year, the state’s new testing company needed extra time to score the exams, because additional work goes into setting “cut scores” each time a new test is given.

A group of educators just concluded the process of reviewing the test data to recommend what scores should fall into the state’s four new categories for measuring performance: below grade level, approaching grade level, on grade level, or mastered. The State Board of Education will review and vote on those recommendations next month.

4. This year’s scores are lower stakes than usual, but that probably won’t last.

For years, Tennessee has been a leader in using test scores to judge students, teachers, and schools. Like most states, it uses the data to determine which schools are so low-performing that they should be closed or otherwise overhauled. It also crunches scores through a complicated “value-added” algorithm designed to assess how much learning teachers contribute to their students, an approach that it has mostly stuck with as value-added measures have fallen out of favor across the nation. And unusually, the state exam scores are also supposed to factor into final student grades, this year counting for 10 percent.

But the rocky road to the new tests has temporarily diminished how much the scores count. Because preliminary scores arrived late this spring, most districts opted to grade students on the basis of their schoolwork alone.

And because of the testing transition, the scores won’t be given as much weight in this year’s teacher evaluations — an adjustment that lawmakers made to alleviate anxiety about the changes. Test scores will contribute only 10 percent to teachers’ ratings. Depending on the subject, that proportion is supposed to rise to between 15 and 25 percent by 2018-19.

First Person

Two fewer testing days in New York? Thank goodness. Here’s what else our students need

PHOTO: Christina Veiga

Every April, I feel the tension in my fifth-grade classroom rise. Students are concerned that all of their hard work throughout the year will boil down to six intense days of testing — three for math and three for English language arts.

Students know they need to be prepared to sit in a room for anywhere from 90 minutes to three hours with no opportunity to leave, barring an emergency. Many of them are sick to their stomachs, feeling more stress than a 10-year-old ever should, and yet they are expected to perform their best.

Meanwhile, teachers are frustrated that so many hours of valuable instruction have been replaced by testing, and that the results won’t be available until students are moving on to other classrooms.

This is what testing looks like in New York state. Or, at least it did. Last month, state officials voted to reduce testing from three days for each subject to two, to the elation of students, parents, and teachers across New York. It’s an example of our voices being heard — but there is still more to be done to make the testing process truly useful, and less stressful, for all of us.

As a fifth-grade teacher in the Bronx, I was thrilled by the news that testing time would be reduced. Though it doesn’t seem like much on paper, having two fewer days of gut-wrenching stress for students as young as eight means so much for their well-being and education. It gives students two more days of classroom instruction, interactive lessons, and engagement in thought-provoking discussions. Any reduction in testing also means more time with my students, since administrators can pull teachers out of their classrooms for up to a week to score each test.

Still, I know these tests provide us with critical data about how students are doing across our state and where we need to concentrate our resources. The changes address my worries about over-testing, while still ensuring that we have an objective measure of what students have learned across the state.

For those who fear that cutting one-third of the required state testing hours will not provide teachers with enough data to help our students, understand that we assess them before, during, and after each unit of study, along with mid-year tests and quizzes. It is unlikely that one extra day of testing will offer any significant additional insights into our students’ skills.

Also, the fact that we receive students’ state test results months later, at the end of June, means that we are more likely to have a snapshot of where our students were, rather than where they currently are, and it arrives when it’s too late for us to use the information to help them.

That’s where New York can still do better. Teachers need timely data to tailor their teaching to meet student needs. As New York develops its next generation of tests and academic standards, we must ensure that they are developmentally appropriate. And officials need to continue to emphasize that state tests alone cannot fully assess a student’s knowledge and skills.

For this, parents and teachers must continue to demand that their voices be heard. Until then, thank you, New York Regents, for hearing us and reducing the number of testing days.

In my classroom, I’ll have two extra days to help my special needs students work towards the goals laid out in their individualized education plans. I’ll take it.

Rich Johnson teaches fifth grade at P.S. 105 in the Bronx.