Test transitions

They rejected multi-state Common Core exams. Now what?

PHOTO: Shaina Cavazos
Indianapolis Public Schools picked a new model for teacher observations.

When it comes to standardized testing, Indiana has commitment issues.

In just the past five years, it joined a multi-state testing consortium, dumped that consortium, launched its own new test, and now, unhappy with the latest problem-plagued version, is searching for something even newer for 2018.

It’s a similar story in Tennessee, Michigan, and dozens of other states where the backlash against the Common Core led lawmakers to overrule state education officials who had invested years of time and resources in tests aligned with the new standards. Leaving the consortia, a move meant to pacify local protests against Common Core-aligned tests, has instead led to chaos and confusion in the classroom, not to mention extra costs for those same states to develop replacement exams.


Of 44 states and the District of Columbia that were initially affiliated with one or both multi-state test consortia, just 21 will give the tests in 2016. Three states – including Indiana – have gone even farther and rejected the Common Core itself, choosing instead to adopt new state-specific standards.

But while politicians knew what they didn’t want (Republicans blasted the Common Core and its associated tests as federal overreach, while Democrats and teachers unions worried about excessive testing), states such as Indiana are now scrambling to figure out what to do after abandoning those exams.

The result is a problem-plagued testing season that’s left teachers, students and parents desperate for some stability and craving a chance to be heard.

“Nobody comes and talks to us,” said Robin Clark, a math teacher at Indianapolis’ Emmerich Manual High School. “We are constantly trying to implement whatever we are being told by people who are not fluent in the field.”

When the Common Core first surfaced in 2009 as a state-led effort to raise standards for schools across the country, the idea was largely free from controversy. Lawmakers on both sides of the aisle embraced the Common Core as a tool to create a single measuring stick for student learning across the country.

To make things easier and less expensive for states as they transitioned to the new standards, the federal government encouraged states to collaborate and develop tests together. Forty-four states plus the District of Columbia joined at least one — and sometimes both — testing consortia around 2010 that took up the challenge using $350 million in federal funds.

Indiana joined 25 other states including New York, Tennessee, and Colorado in affiliating with the PARCC Assessment System. Michigan, California and 28 other states went with the Smarter Balanced Assessment Consortium. Trial questions were given to students in the 35 states that became official consortia members, and the first actual tests were scheduled to be administered in 2015.

Then, the political winds shifted. As the Common Core became politically toxic, state legislatures across the country voted to pull out of the testing consortia. Some cited political pressure, others logistics, but in most cases, lawmakers wanted to telegraph to voters that they were opposed to the Common Core.

In one state after another, decisions that had been made by curriculum and testing experts working for state education departments were overturned.

“Politicians and ideologues continue driving this system without regard to its educational value,” said Robert Schaeffer, a spokesman for the National Center for Fair and Open Testing, a testing watchdog group.

Now, just seven states are giving the PARCC test in 2016. Smarter Balanced is down to 14 states.

But in their haste to pull away from something they didn’t want, many states were left without backup plans. They quickly assembled new testing programs that, all over the country, are now backfiring. Three or more years is generally what’s needed to design and vet a new test, but some states, including Indiana, are trying to do that work in two years or less.

In Tennessee, lawmakers pulled out of PARCC in 2014 for not being “Tennessee-specific.” They hurried to replace it with a new “TNReady” test that was plagued by glitches so severe that the state ultimately canceled testing for most students.

In Michigan, lawmakers voted to banish the Smarter Balanced exam just nine months before it was supposed to be administered. That led the state to essentially administer the Smarter Balanced test under a new name, M-STEP, in 2015, followed by a different M-STEP in 2016. That setup means Michigan kids will take three different tests in three years, making year-to-year comparisons impossible. Meanwhile, experts say the Michigan test still bears a lot of similarities to Smarter Balanced, just as many new state-specific standards share much of their content with the Common Core. Like Indiana, Michigan is now considering changing its test yet again, in an effort led by the state superintendent.

In Indiana, lawmakers abandoned both PARCC and the Common Core itself in favor of Indiana-specific standards, then continued working with test company CTB McGraw-Hill to quickly update the state’s decades-old ISTEP exam. The hasty revamp was assailed for technical and scoring flaws and was so universally disliked that lawmakers this year voted to scrap ISTEP entirely, pointing to problems with its administration and computer platform as culprits.

Although leaving the consortia allowed some states to sidestep political backlash, it didn’t shield them from the consequences for state budgets and classrooms.

The PARCC test, administered by Pearson, costs the states that use it about $24 per student, according to the consortium’s website. If Indiana were still in the consortium, taxpayers would have spent about $12 million last year for the 500,000 students who were tested. Instead, the state paid about $24 million to CTB for its state-specific exam and is contracted with Pearson for about $32 million this year and next year.

The Brown Center on Education Policy at the Brookings Institution, based in Washington, D.C., reported in 2012 that states spent, on average, $27 per student on tests for state accountability. It’s not clear, the report said, just how large the savings would be for consortia states, but the scale of the PARCC and Smarter Balanced tests was intended to reduce costs for states at a time when federal requirements were pressuring them to update their exams in response to new, more rigorous standards.

For the second time in two years, Indiana lawmakers will have to scramble to replace the state’s testing program, this time with another new exam in 2018. Officials don’t yet know what a future contract would include or who would administer it.

The situation is different in every state, but in most cases, states walked away from the years of planning that had gone into the multi-state exams for reasons that had nothing to do with the tests themselves.

“A lot of this has to do with sort of perception wars around the assessments and less to do with the practicality of choosing the best assessments for our kids,” said Abigail Swisher, an education policy analyst with the New America Foundation.

The turmoil has many teachers on edge.

“I work with students who have test anxiety normally, and so if you just compound that by 10, that’s kind of what we’re facing,” said Megan Parker, a third- and fourth-grade special education teacher at Tindley Renaissance Academy, a charter school in Indianapolis. “It’s all about not letting anyone know that you’re freaking out about when the state is doing something to a test that you’re getting ready to give for the first time.”

Indiana largely protected teachers and schools from serious consequences from the dramatic drop in test scores that followed the state’s switch to a new, tougher ISTEP in 2015, but no such protections are expected this year, when the state gives largely the same test under a new vendor, British-based Pearson. As lawmakers contemplate yet another new test in 2018, they’ve made no promises to protect teachers or schools from the coming changes.

That means that if students struggle on future exams, their teachers could lose their bonuses. If their schools score poorly enough to receive four consecutive F grades from the state, the schools could face state takeovers.

And while teachers are facing stiffer consequences from the exams, they say the constantly changing testing landscape means the tests don’t offer them much value. A different test every year makes it hard for teachers to track student progress from one year to the next, and it also makes it hard for teachers to help their students prepare. When it comes to state accountability, ever-changing exams can prevent policymakers from comparing how students and schools perform from one year to the next.

Beth Shaffer-Scott, a veteran Indianapolis Public Schools teacher at School 70, longs for stability.

“It’s just like anything,” Shaffer-Scott said. “You’ve got to stick with it long enough to see whether or not it works.”

If states think there’s an easy way forward after leaving the multi-state testing consortia, they might have another thing coming.

Expecting to turn around a new test in a year or less could be risky, even for states like Indiana that already own their test questions, whose development is one of the most time-consuming and expensive parts of the test creation process.

Realistically, if Indiana, or any state, wanted to make a big change quickly, it should consider using a national test or shared test questions for the time being while it takes steps to build its own question pools and test programs, said Ed Roeber, a former Michigan test director who’s worked with Indiana. That way, questions can be properly vetted, and states have the benefit of time to really figure out the best test solution.

“That would be the kind of option that I think would [be] more politically viable,” Roeber said. “Eventually, the test is all Indiana.”

But whether that will be an appealing option in states where lawmakers are on high alert for signs that the Common Core could be creeping back in is not clear.

In Indiana, a committee of educators, lawmakers and policymakers plans to consider several options for the state in the next few months.

But building entirely new tests might be more challenging than lawmakers expect. Even if states have the autonomy to make big changes, they might not want to spend the money or take the time that’s typically needed.

That timeline is especially important if a state wants to create a quality test that will last, Roeber said. He views Indiana’s struggle with ISTEP as less a problem with the actual test, and more a problem with last-minute decisions from lawmakers.

“The (Indiana Department of Education) did the best they could under the circumstances, but those are pretty severe circumstances,” Roeber said. “You have to shortcut a lot of stuff if you have to do it in under a year… they were working under an incredibly unrealistic deadline to get this thing done.”

Consistency is key to a testing system that can reliably measure student performance and cut down on disruptions for schools. And there’s no fast way to create a customized, in-depth test that measures challenging state standards if time and resources aren’t dedicated to it, Roeber said.

If lawmakers want test results that can be compared from one year to the next, they’re going to need to pick one test and stick with it for more than a year or two.

Teachers say they hope they’ll see some consistency before their bonuses and school accountability grades start to suffer.

“Trying to accommodate a new test without having a lot of knowledge about it is a little disconcerting when your school grade is tied to it and your (pay) increases are tied to it,” said Clark, the Manual High School math teacher. “How many times can you change to ultimately deliver the same result?”

First Person

Two fewer testing days in New York? Thank goodness. Here’s what else our students need

PHOTO: Christina Veiga

Every April, I feel the tension in my fifth-grade classroom rise. Students are concerned that all of their hard work throughout the year will boil down to six intense days of testing — three for math and three for English language arts.

Students know they need to be prepared to sit in a room for anywhere from 90 minutes to three hours with no opportunity to leave, barring an emergency. Many of them are sick to their stomachs, feeling more stress than a 10-year-old ever should, and yet they are expected to perform their best.

Meanwhile, teachers are frustrated that so many hours of valuable instruction have been replaced by testing, and that the results won’t be available until students are moving on to other classrooms.

This is what testing looks like in New York state. Or, at least it did. Last month, state officials voted to reduce testing from three days for each subject to two, to the elation of students, parents, and teachers across New York. It’s an example of our voices being heard — but there is still more to be done to make the testing process truly useful, and less stressful, for all of us.

As a fifth-grade teacher in the Bronx, I was thrilled by the news that testing time would be reduced. Though it doesn’t seem like much on paper, having two fewer days of gut-wrenching stress for students as young as eight means so much for their well-being and education. It gives students two more days of classroom instruction, interactive lessons, and engagement in thought-provoking discussions. Any reduction in testing also means more time with my students, since administrators can pull teachers out of their classrooms for up to a week to score each test.

Still, I know these tests provide us with critical data about how students are doing across our state and where we need to concentrate our resources. The changes address my worries about over-testing, while still ensuring that we have an objective measure of what students have learned across the state.

For those who fear that cutting one-third of the required state testing hours will not provide teachers with enough data to help our students, understand that we assess them before, during, and after each unit of study, along with mid-year tests and quizzes. It is unlikely that one extra day of testing will offer any significant additional insights into our students’ skills.

Also, because we receive students’ state test results months later, at the end of June, the results show us where our students were rather than where they are now, and they arrive when it’s too late for us to use the information to help them.

That’s where New York can still do better. Teachers need timely data to tailor their teaching to meet student needs. As New York develops its next generation of tests and academic standards, we must ensure that they are developmentally appropriate. And officials need to continue to emphasize that state tests alone cannot fully assess a student’s knowledge and skills.

To get there, parents and teachers must continue to demand that their voices are heard. For now, thank you, New York Regents, for hearing us and reducing the number of testing days.

In my classroom, I’ll have two extra days to help my special needs students work towards the goals laid out in their individualized education plans. I’ll take it.

Rich Johnson teaches fifth grade at P.S. 105 in the Bronx.

a failure of accountability

High-stakes testing may push struggling teachers to younger grades, hurting students

PHOTO: Justin Weiner

Kindergarten, first grade, and second grade are often free of the high-stakes testing common in later grades — but those years are still high-stakes for students’ learning and development.

That means it’s a big problem when schools encourage their least effective teachers to work with their youngest students. And a new study says that the pressure of school accountability systems may be encouraging exactly that.

“Evidence on the importance of early-grades learning for later life outcomes suggests that a system that pushes schools to concentrate ineffective teachers in the earliest grades could have serious unintended consequences,” write study authors Jason Grissom of Vanderbilt and Demetra Kalogrides and Susanna Loeb of Stanford.

The research comes at an opportune time. All 50 states are in the middle of crafting new systems designed to hold schools accountable for student learning. And this is just the latest study to show how much those systems matter, for good and for ill.

The study, published earlier this month in the peer-reviewed American Educational Research Journal, focuses on Miami-Dade County schools, the fourth-largest district in the country, from 2003 to 2014. Florida had strict accountability rules during that period, including performance-based letter grades for schools. (Those policies have been promoted as a national model by former Florida Governor Jeb Bush and his national education reform outfit, where Education Secretary Betsy DeVos previously served on the board.)

The trio of researchers hypothesized that because Florida focuses on the performance of students in certain grades and subjects — generally third through 10th grade math and English — less-effective teachers would get shunted to other assignments, like early elementary grades or social studies.

That’s exactly what they found.

In particular, elementary teachers effective at raising test scores tended to end up teaching grades 3-6, while lower-performing ones moved toward early grades.

While that may have helped schools look better, it didn’t help students. Indeed, the study finds that being assigned a teacher in early elementary school who switched from a higher grade led to reduced academic achievement, effects that persisted through at least third grade.

The impact was modest in size, akin to being assigned a novice teacher as opposed to a more experienced one.

The study is limited in that it focuses on just a single district, albeit a very large one — a point the authors acknowledge. Still, the results are consistent with past research in North Carolina and Florida as a whole, and district leaders elsewhere have acknowledged responding to test pressure in the same way.

“There was once upon a time that, when the test was only grades 3 through 12, we put the least effective teachers in K-2,” schools chief Sharon Griffin of Shelby County schools in Memphis said earlier this year. “We can’t do that anymore. We’re killing third grade and then we have students who get in third grade whose challenges are so great, they never ever catch up.”

While the Florida study can’t definitively link the migration of teachers to the state’s accountability system, evidence suggests that it was a contributing factor.

For one, the pattern is more pronounced in F-rated schools, which face the greatest pressure to raise test scores. The pattern is also stronger when principals have more control over staffing decisions — consistent with the idea that school leaders are moving teachers around with accountability systems in mind.

Previous research on policies like No Child Left Behind that threaten to sanction schools with low test scores has found both benefits and downsides. On the positive side, accountability can lead to higher achievement on low-stakes exams and improved instruction; studies of Florida’s system, in particular, have found a number of positive effects. On the negative side, high-stakes testing has caused cheating, teaching to the test, and suspensions of students unlikely to test well.

So how can districts avoid the unintended consequences for young students documented by the Miami-Dade study?

One idea is to emphasize student proficiency in third grade, a proxy for how well schools have taught kids in kindergarten, first and second grades.

Scholars generally say that focusing on progress from year to year is a better gauge of school effectiveness than student proficiency. But a heavily growth-based system could actually give schools an incentive to lower student achievement in early grades.

“These results do make an argument for weighting [proficiency] in those early tests to essentially guard against totally ignoring those early grades,” said Grissom, who also noted that states could make more efforts to directly measure performance of the youngest students.

But Morgan Polikoff, an associate professor at the University of Southern California, was more skeptical of this approach.

“It’s not as if states are going to add grades K-2 testing, so schools and districts will always have this incentive (or think they do),” he told Chalkbeat in an email. “I think measurement is always going to be an issue in those early grades.”