Speaking Out

Testmaker: What went wrong with TNReady

The head of the company that created TNReady accepts blame for this year’s botched rollout of Tennessee’s new standardized online assessment, but says the subsequent delays in delivering printed testing materials were unavoidable.

Measurement Inc. president and founder Henry Scherich says Tennessee Education Commissioner Candice McQueen’s decision to scrap the online assessment on the first day of testing in February set off a cascade of logistical problems that proved impossible to overcome.

Once McQueen ordered districts to switch back to paper tests, his company found the sudden task of printing and delivering up to 5 million documents this spring overwhelming, if not impossible.

Henry Scherich

“I understand the frustration of superintendents and the state department,” Scherich said. “Having said all of that, this was a huge job that we took on and there’s been no testing company in the country — in the world, probably — who has taken on the task of printing and shipping this many tests in this short a period of time, and we really struggled with it.”

Last week, the Tennessee Department of Education informed district leaders that many of the testing materials wouldn’t arrive in time for the opening of this week’s final TNReady testing window — the latest in a series of delivery delays that has wreaked havoc in districts and classrooms across the state. State leaders placed the blame squarely on Measurement Inc.

In an interview this week with Chalkbeat, Scherich acknowledged that developing and delivering TNReady in a new online platform was the biggest job that his 36-year-old Durham, N.C.-based company has ever undertaken — perhaps too big given the one-year deadline.

He offered a behind-the-scenes look at the snafus and challenges. At the same time, he insisted that TNReady is a strong test and — once its delivery platform is fixed — the assessment can help Tennessee reach its accountability goals.

Measurement Inc. won the bid to create Tennessee’s math and English language arts tests for grades 3-11 in October 2014, only months after a vote by the Tennessee legislature prompted the Department of Education to pull out of PARCC, a consortium of states with a shared Common Core-aligned assessment. The company would have a year to develop a test for Tennessee. A small number of high school students on block schedules would take the test in the fall of 2015, with the bulk of students in grades 3-11 taking it the following spring.

TNReady marked an unprecedented shift for Tennessee and, like PARCC, was supposed to be online and aligned with the current Common Core State Standards.

"It was a failure in some respects because we were supposed to design a system that would take 100,000 students in at one time."Henry Scherich

It was also an unprecedented task for Measurement Inc., which had never before developed and delivered a state’s entire online testing program.

But on Feb. 8, the very first day of statewide online testing, the test buckled as more and more students logged on. Even so, leaders of Measurement Inc. were surprised when McQueen quickly pulled the plug on the online assessment, and announced that the state would switch to paper-and-pencil versions.

Here’s what happened, according to Scherich:

Online ‘crash’

Scherich says that, first of all, the system never “crashed” on the first day. Students’ screens never went blank. Instead, he calls what happened “infrastructure saturation.” As more and more students logged on, their cursors began to spin, signaling that the test was taking longer to load than it should have.

What was the problem? Ultimately, Scherich says, there weren’t enough servers for the volume of students online, causing the system to clog up as more and more students logged in. He declined to speculate on how long it would have taken to fix the problem and add more primary servers, but said that it would have been possible to get back on track.

“We could have duplicated the system,” he said. “We would have said to half of the state, you work on these 64 servers and the other half work on another set.”

About 48,000 students logged on that day, and about 18,000 submitted assessments. It’s unknown how many students were having no trouble with the test but stopped after McQueen sent an email instructing districts to halt testing.

“It was a failure in some respects because we were supposed to design a system that would take 100,000 students in at one time… We had a problem with 48,000,” Scherich said.

Printing delays

Scherich says the subsequent delays come down to this: There were a lot of tests to be printed, and not a lot of printers available on short notice. Overall, the switch to printing meant Measurement Inc. had to scramble to print answer sheets and test booklets for grades 3-11 amounting to 5 million documents — when only weeks before, they hadn’t planned on printing any.

"There’s been no testing company in the country — in the world, probably — who has taken on the task of printing and shipping this many tests in this short a period of time ..."

Measurement Inc. worked with the Department of Education to transfer different versions of the tests from computer to paper. Each test had several versions with different field test items embedded within.

“You can’t just push the button on the computer and have the test be printed out,” he said. “The formatting is all different.”

In the meantime, Measurement Inc. sought out printers able to fulfill the large order quickly. Over its 36 years in testing, the company had built many connections, but only three printing plant operators said they were up to the task. Eventually, two backed out, leaving Measurement Inc. with one: Chicago-based RR Donnelley.

“It’s a large printing company, and they had plants all over the U.S. They printed one or more of the tests or the answer documents in 11 different printing plants around the country,” he said. “So we were getting tests from Minnesota, Missouri. They ran a lot of night shifts to do that for us.”

Once the tests and answer sheets arrived at Measurement Inc., they had to be sorted and distributed to schools. That’s 5 million tests, spread across nearly 1,000 schools.

The final documents arrived from the printer last Saturday, and Measurement Inc. is rushing to get them out in the next two to three days, Scherich said.

Tight timeline

Measurement Inc. had about a year to develop the test and roll out an online system for the entire state. In comparison, PARCC, the online assessment that Tennessee originally was slated to use, was developed in about five years.

Though Measurement Inc. had been working on its online platform for six years and used it previously in other states, including Tennessee for its writing test, the company had never before undertaken a state’s entire testing program.

Measurement Inc. not only developed the TNReady tests for math and English language arts, but also put the content for science and social studies on its online platform, known as MIST.

He said a lot was done right in developing TNReady, including the recruitment of 400 Tennessee teachers to help write test questions designed to measure critical thinking skills.

“I think that our staff and the state of Tennessee staff did an excellent job in building an assessment,” he said. “The math test is a good test. (English language arts) is a good test. Tennessee has a good catalog, a good library of test items for the future.”

Measuring up

After criticism, Denver will change the way it rates elementary schools

PHOTO: Denver Post file
Eva Severance, a first-grader, concentrates on a reading lesson at Lincoln Elementary in Denver.

Facing criticism that its school ratings overstated young students’ reading abilities, the Denver school district announced it will change the way elementary schools are rated next year.

The district will increase the number of students in kindergarten, first, second, and third grade who must score at grade-level on early literacy tests for a school to earn points on the district’s rating scale, and decrease how many points those scores will be worth, officials said.

The changes will lessen the impact of early literacy scores on a school’s overall rating, while also raising the bar on how many students must ace the tests for a school to be considered good. Denver rates schools on a color-coded scale from blue (the highest) to red (the lowest).

“We want to see more students making more progress,” Superintendent Tom Boasberg said.

Local civil rights groups, elected officials, educators, and education advocates criticized Denver Public Schools this year for misleading students and families with what they characterized as inflated school ratings based partly on overstated early literacy gains.

“At a time when this country is at war on truth, we have an obligation to Denver families to give them a true picture of their schools’ performance,” state Sen. Angela Williams, a Denver Democrat, told Boasberg and the school board at a meeting in December.

The groups had asked the district to revise this year’s ratings, which were issued in October. Boasberg refused, saying, “If you’re going to change the rules of the game, it’s certainly advisable to change them before the game starts.” That’s what the district is doing for next year.

The state requires students in kindergarten through third grade to take the early literacy tests as a way to identify students who are struggling the most to learn to read so they can get extra help. Research shows third graders who don’t read proficiently are four times as likely to drop out of high school. In Denver, most schools administer an early literacy test called iStation.

The state also requires students in third through ninth grade to take a literacy test called PARCC, which is more rigorous. Third-graders are the only students who take both tests.

The issue is that many third-graders who scored well on iStation did not score well on PARCC. At Castro Elementary in southwest Denver, for example, 73 percent of third-graders scored at grade-level or above on iStation, but just 17 percent did on PARCC.

Denver’s school ratings system, called the School Performance Framework, or SPF, has always relied heavily on state test scores. But this year, the weight given to the early literacy scores increased from 10 percent to 34 percent of the overall rating because the district added points for how well certain groups, such as students from low-income families, did on the tests.

That added weight, plus the discrepancy between how third-graders scored on PARCC and how they scored on iStation, raised concerns about the validity of the ratings.

At a school board work session earlier this week, Boasberg called those concerns “understandable.” He laid out the district’s two-pronged approach to addressing them, noting that the changes planned for next year are a stop-gap measure until the district can make a more significant change in 2019 that will hopefully minimize the discrepancy between the tests.

Next year, the district will increase the percentage of students who must score at grade-level on the early literacy tests. Currently, fewer than half of an elementary school’s students must score that way for a school to earn points, said Deputy Superintendent Susana Cordova. The district hasn’t yet settled on what the number will be for next year, but it will likely be more than 70 percent, she said. The more points a school earns, the higher its color rating.

The district will also reduce the impact the early literacy test scores have on the ratings by cutting in half the number of points schools can earn related to the tests, Cordova said. This makes the stakes a little lower, even as the district sets a higher bar.

The number of points will go back up in 2019 when the district makes a more significant change, officials said. The change has to do with how the tests are scored.

For the past several years, the district has used the “cut points” set by the test vendors to determine which students are reading at grade-level and which are not. But the discrepancy between the third-grade iStation and PARCC reading scores – and the public outcry it sparked – has caused officials to conclude the vendor cut points are too low.

District officials said they have asked the vendors and the state education department to raise the cut points. But even if they agree, that isn’t a simple or quick fix. In the meantime, the district has developed a set of targets it calls “aimlines” that show how high a student must score on the early literacy tests to be on track to score at grade-level on PARCC, which district officials consider the gold standard measure of what students should know.

The aimlines are essentially higher expectations. A student could be judged to be reading at grade-level according to iStation but considered off-track according to the aimlines.

In 2019, the district will use those aimlines instead of the vendor cut points for the purpose of rating schools. Part of the reason the district is waiting until 2019 is to gather another year of test score data to make sure the aimlines are truly predictive, officials said.

However, the district is encouraging schools to start looking at the aimlines this year. It is also telling families how their students are doing when measured against them. Schools sent letters home to families this past week, a step district critics previously said was a good start.

Van Schoales, CEO of the advocacy group A Plus Colorado, has been among the most persistent critics of this year’s elementary school ratings. He said he’s thrilled the district listened to community concerns and is making changes for next year, though he said it still has work to do to make the ratings easier to understand and more helpful to families.

“We know it’s complicated,” he said. “There is no perfect SPF. We just think we can get to a more perfect SPF with conversations between the district and community folks.”

The district announced other changes to the School Performance Framework next year that will affect all schools, not just elementary schools. They include:

  • Not rating schools on measures for which there is only one year of data available.

Denver’s ratings have always been based on two years of data: for instance, how many students of color met expectations on state math tests in 2016 and how many met expectations in 2017.

But if a school doesn’t have data for one of those years, it will no longer be rated on that measure. One way that could happen is if a school has 20 students of color one year but only 12 the next. Schools must have at least 16 students in a category for their scores to count.

The goal, officials said, is to be more fair and accurate. Some schools complained that judging them based on just one year of data wasn’t fully capturing their performance or progress.

  • Applying the “academic gaps indicator” to all schools without exception.

This year, the district applied a new rule that schools with big gaps between less privileged and more privileged students couldn’t earn its two highest color ratings, blue and green. Schools had to be blue or green on a new “academic gaps indicator” to be blue or green overall.

But district officials made an exception for three schools where nearly all students were from low-income families, reasoning it was difficult to measure gaps when there were so few wealthier students. However, Boasberg said that after soliciting feedback from educators, parents, and advocates, “the overwhelming sentiment was that it should apply to all schools,” in part because it was difficult to find a “natural demographic break point” for exceptions.

Contract review

Here’s what a deeper probe of grade changing at Memphis schools will cost

PHOTO: Marta W. Aldrich
The board of education for Shelby County Schools is reviewing another contract with a Memphis firm hired last year to look into allegations of grade tampering at Trezevant High School. Board members will discuss the new contract Feb. 20 and vote on it Feb. 27.

A proposed contract with the accounting firm hired to examine Memphis schools with high instances of grade changes contains new details on the scope of the investigation already underway in Shelby County Schools.

The school board is reviewing a $145,000 contract with Dixon Hughes Goodman, the Memphis firm that last year identified nine high schools as having 199 or more grade changes between July 2012 and October 2016. Seven of those are part of the deeper probe, since two others are now outside of the Memphis district’s control.

The investigation includes:

  • Interviewing teachers and administrators;
  • Comparing paper grade books to electronic ones and accompanying grade change forms;
  • Inspecting policies and procedures for how school employees track and submit grades.

In December, the firm recommended “further investigation” into schools with high instances of grade changes. At that time, Superintendent Dorsey Hopson emphasized that not all changes of grades from failing to passing are malicious, but said the district needs to ensure that any changes are proper.

Based on the firm’s hourly rate, a deeper probe could take from 300 to 900 hours. The initial review lasted four months before the firm submitted its report to Shelby County Schools.

The school board is scheduled to vote on the contract Feb. 27.

You can read the full agreement below: