To our readers

Hey, we heard you. You had a lot of questions about TNReady. We found answers.

The news that Tennessee’s testing company scored some high school tests incorrectly this year uncorked a flood of questions about the validity of the state’s new standardized assessment.

We wanted to know how the testing problems were playing out in classrooms, so we asked our readers on Facebook.

You responded in droves.

We took your top concerns directly to the state Department of Education and asked for answers. Here’s what you wanted to know — and what we have learned:

Several readers asked why they should trust TNReady results, given the series of setbacks in the test’s first two years.

  • “I do not trust the results. We have had so many problems in the last few years, that I am suspicious of any results we do get. It bothers me greatly that the state uses these numbers to hold students and teachers and districts accountable, but they seem to be unable to deliver scores they believe are accurate in a timely manner.” —Rebecca Dickenson
  • “I no longer trust the accountability of the state nor its methods. My concern is if there is a teacher who has only one year of test data, how is it the same teacher shown multi-year growth when he or she had only last year of testing? This poses a huge concern.” —Mildred Williams  

Tennessee Department of Education: “TNReady is fully aligned to Tennessee’s academic standards, and every question has been reviewed, edited, and approved by Tennessee teachers through a rigorous review process. We also have quantitative checks and processes after a test is over to ensure student responses are reliable. While more than 99.9% of TNReady tests were scored accurately this year, we want to improve on that next year, and our vendor (Questar) is taking new quality assurance steps to make sure their programming is error-free. Also, this year, as soon as the scoring error on some of the English I, II and Integrated Math II EOCs was identified, scores were updated and all TNReady tests were re-reviewed and verified for full accuracy.”

Some teachers told us that, after scores were delivered late this spring, many students doubt next year’s results will arrive in time to count toward their final grades. Those teachers are struggling to get students to buy in.

  • “After two years of TNReady, it still hasn’t counted for my students. Going into year three, I will once again tell them with a hopeful, straight face that it will count as part of their report card grades and implore them to try their best. I quietly wonder what reason they have to believe me, given recent history.” —Mike Stein
  • “I struggle to get students to buy in to the importance of trying their best on state tests because the students are confident that the scores won’t come back in time to affect their grades (which has been the situation for several years now). The students see zero incentive for doing well.” —Nicole Mayfield

TDOE: “We believe that if districts and schools set the tone that performing your best on TNReady is important, then students will take the test seriously, regardless of whether TNReady factors into their grade. We should be able to expect our students will try and do their best at any academic exercise, whether or not it is graded. This is a value that is established through local communication from educators and leaders, and it will always be key to our test administration. We believe that when we share these messages and values celebrating the variety of accomplishments our students have made, taking advantage of TNReady’s scheduling flexibility to minimize disruption, focusing on strong standards-based instruction every day, sending positive messages around the importance of the variety of tests that students take, and sharing that students should always do their best then students will buy-in and TNReady will be successful.”

Other teachers asked what happens to the writing scores from English language arts tests.

  • “I can tell you that two years ago — when we first piloted the new writing test online — districts received not only every student’s scores (broken down by each of the four indicators) but also the actual student responses to each prompt. In my former district our supervisor shared them, and we analyzed them as a department. If you check with your principal, VP, or supervisors, there are some published “anchor papers” with scores available on edtools from this past year. It’s not a lot, but it’s more than we’ve had in the past. My hope is that if online continues, we’ll keep seeing the student responses in the future.” —Wj Gillespie II

TDOE: “The question appears to be referencing the process we had through the 2014-15 school year, when our writing assessment was separate. Since 2015-16, students’ writing responses on TNReady have been incorporated as part of their overall ELA score. Responses are scored based on our writing rubrics, and for educators, we have provided access to the “anchor papers” from the 2016-17 year, so they can see how students’ responses were scored based on the writing rubric, which can help them inform the feedback they give their students.”

On that same issue of writing scores, one teacher referenced the hiring of scorers off Craigslist. We asked the state whether that’s true.

  • “I continue to be curious about our ELA writing scores. Each year we are required to use state writing rubrics, attend PD related to the state’s four types of writing, etc etc…and yet our scores never come back. Students spend hours taking the writing portion of the test, scorers are hired off Craig’s list…, and yet we never actually get the scores back. It seems like every year this is swept under the rug. Where do these writing tests go?” —Elizabeth Faison Clifton

TDOE: “Questar does not use Craigslist. Several years ago, another assessment company supposedly posted advertisements on Craigslist, but Questar does not. We provide opportunities for our educators to be involved in developing our test, and we also encourage Tennessee teachers to apply to hand-score TNReady. To be eligible, each applicant must provide proof of a four-year college degree, and preference is given to classroom teachers. As part of the interview process, an applicant would have to hand-score several items for review and evaluation. Once hired, each scorer is trained based on materials that Tennessee teachers and the department approve — and which are assembled from responses given by Tennessee students on the exam — and scorers are regularly refreshed and “recalibrated” on scoring guidelines. Each writing response is scored at least twice; if those two responses differ significantly, they are sent to a third scorer. Each day, the department reads behind a sample of essays to ensure hand-scorers are adhering to the criteria set by our teachers. Any scores that do not align are thrown out, and those scorers are retrained. Any scorer who does not meet our and Questar’s standards is released from scoring TNReady.”

Finally, readers expressed a lot of concern about the complexity behind growth scores known as TVAAS, which are based on TNReady results and which go into teachers’ evaluations. We asked the state for a simple explanation.

  • “What formula is used in calculating the overall score for TVAAS when fallacies were determined as a result? My performance is weighed heavily on the state TVAAS score which is why this type of error has occurred before. This is quite disturbing. Teachers work tirelessly to ensure student achievement is a success; however, testing to measure performance seems to not be working.” —Mildred Williams  
  • “No one can give me the formula for how my students’ scores are calculated to create my score in TVAAS. How is (t)hat transparency? Yet, I’m required, constantly, to “prove” myself with documentation of education, observations, professional development and the like; all in originals, of course, to numerous overseeing bodies.” —Rachel Bernstein Kannady
  • “I find it ludicrous that data from these tests are used to evaluate MY performance when I get little to no control over most of the variables regarding the test. How could a miscalculated, misinformed, and (for all I know) incomprehensible test demonstrate what my students have learned!? And don’t even get me started on that fact that the rigor of the tests was increased ten-fold, yet not scaffolded in.” —Nicole Mayfield

TDOE: “TVAAS is statistically valid and reliable, and we follow the recommendations outlined by the American Educational Research Association (AERA) on value-added measures. Conceptually, TVAAS looks at how students have performed historically on TCAP and TNReady and compares their performance to their peers who have had similar past performance. If students tended to grow at about the same rate as their peers across the state — the expected amount of growth — they would earn a 3. If students tended to grow faster than their peers, they would earn a 4 or a 5, depending on the amount of progress they showed. If they tended to not show as much growth as their peers, they would earn a 1 or a 2. The model itself is sophisticated and complex to be as fair and nuanced as possible for each teacher’s situation, and we are working with our educator preparation providers as well as district leaders to provide more training on specifically how the model calculates scores. Tennessee educators also have access to a TVAAS user support team that can answer any specific questions about their TVAAS data, including how the data was analyzed.

Because TVAAS always looks at relative growth from year to year, not absolute test scores, it can be stable through transitions — and that is what we saw this year. Students can still grow, even if their overall proficiency level is now different. You can think about it like a running race. If you used to finish a 5K at about the same time as 10 other students, and all 10 students made the same shift to a new race at the same time with the same amount of time to prepare, you should finish the new race at about the same time. If you finished ahead of the group’s average time, you grew faster than your peers. If you lagged behind everyone, that would indicate you did not grow as much as was expected.  Because students’ performance will be compared to the performance of their peers and because their peers are making the transition at the same time, drops in statewide proficiency rates resulting from increased rigor of the new assessments had no impact on the ability of teachers, schools, and districts to earn strong TVAAS scores. Transitions to higher standards and expectations do not change the fact that we still want all students in a district to make a full year’s worth of growth, relative to their peers who are all experiencing the same transition.”

Reporter Laura Faith Kebede contributed to this report.

measuring up

After criticism, Denver will change the way it rates elementary schools

PHOTO: Denver Post file
Eva Severance, a first-grader, concentrates on a reading lesson at Lincoln Elementary in Denver.

Facing criticism that its school ratings overstated young students’ reading abilities, the Denver school district announced it will change the way elementary schools are rated next year.

The district will increase the number of students in kindergarten, first, second, and third grade who must score at grade-level on early literacy tests for a school to earn points on the district’s rating scale, and decrease how many points those scores will be worth, officials said.

The changes will lessen the impact of early literacy scores on a school’s overall rating, while also raising the bar on how many students must ace the tests for a school to be considered good. Denver rates schools on a color-coded scale from blue (the highest) to red (the lowest).

“We want to see more students making more progress,” Superintendent Tom Boasberg said.

Local civil rights groups, elected officials, educators, and education advocates criticized Denver Public Schools this year for misleading students and families with what they characterized as inflated school ratings based partly on overstated early literacy gains.

“At a time when this country is at war on truth, we have an obligation to Denver families to give them a true picture of their schools’ performance,” state Sen. Angela Williams, a Denver Democrat, told Boasberg and the school board at a meeting in December.

The groups had asked the district to revise this year’s ratings, which were issued in October. Boasberg refused, saying, “If you’re going to change the rules of the game, it’s certainly advisable to change them before the game starts.” That’s what the district is doing for next year.

The state requires students in kindergarten through third grade to take the early literacy tests as a way to identify students who are struggling the most to learn to read so they can get extra help. Research shows third-graders who don’t read proficiently are four times as likely to drop out of high school. In Denver, most schools administer an early literacy test called iStation.

The state also requires students in third through ninth grade to take a literacy test called PARCC, which is more rigorous. Third-graders are the only students who take both tests.

The issue is that many third-graders who scored well on iStation did not score well on PARCC. At Castro Elementary in southwest Denver, for example, 73 percent of third-graders scored at grade-level or above on iStation, but just 17 percent did on PARCC.

Denver’s school ratings system, called the School Performance Framework, or SPF, has always relied heavily on state test scores. But this year, the weight given to the early literacy scores increased from 10 percent to 34 percent of the overall rating because the district added points for how well certain groups, such as students from low-income families, did on the tests.

That added weight, plus the discrepancy between how third-graders scored on PARCC and how they scored on iStation, raised concerns about the validity of the ratings.

At a school board work session earlier this week, Boasberg called those concerns “understandable.” He laid out the district’s two-pronged approach to addressing them, noting that the changes planned for next year are a stopgap measure until the district can make a more significant change in 2019 that will hopefully minimize the discrepancy between the tests.

Next year, the district will increase the percentage of students who must score at grade-level on the early literacy tests. Currently, fewer than half of an elementary school’s students must score that way for a school to earn points, said Deputy Superintendent Susana Cordova. The district hasn’t yet settled on what the number will be for next year, but it will likely be more than 70 percent, she said. The more points a school earns, the higher its color rating.

The district will also reduce the impact the early literacy test scores have on the ratings by cutting in half the number of points schools can earn related to the tests, Cordova said. This makes the stakes a little lower, even as the district sets a higher bar.

The number of points will go back up in 2019 when the district makes a more significant change, officials said. The change has to do with how the tests are scored.

For the past several years, the district has used the “cut points” set by the test vendors to determine which students are reading at grade-level and which are not. But the discrepancy between the third-grade iStation and PARCC reading scores – and the public outcry it sparked – has caused officials to conclude the vendor cut points are too low.

District officials said they have asked the vendors and the state education department to raise the cut points. But even if they agree, that isn’t a simple or quick fix. In the meantime, the district has developed a set of targets it calls “aimlines” that show how high a student must score on the early literacy tests to be on track to score at grade-level on PARCC, which district officials consider the gold standard measure of what students should know.

The aimlines are essentially higher expectations. A student could be judged to be reading at grade-level according to iStation but considered off-track according to the aimlines.

In 2019, the district will use those aimlines instead of the vendor cut points for the purpose of rating schools. Part of the reason the district is waiting until 2019 is to gather another year of test score data to make sure the aimlines are truly predictive, officials said.

However, the district is encouraging schools to start looking at the aimlines this year. It is also telling families how their students are doing when measured against them. Schools sent letters home to families this past week, a step district critics previously said was a good start.

Van Schoales, CEO of the advocacy group A Plus Colorado, has been among the most persistent critics of this year’s elementary school ratings. He said he’s thrilled the district listened to community concerns and is making changes for next year, though he said it still has work to do to make the ratings easier to understand and more helpful to families.

“We know it’s complicated,” he said. “There is no perfect SPF. We just think we can get to a more perfect SPF with conversations between the district and community folks.”

The district announced other changes to the School Performance Framework next year that will affect all schools, not just elementary schools. They include:

  • Not rating schools on measures for which there is only one year of data available.

Denver’s ratings have always been based on two years of data: for instance, how many students of color met expectations on state math tests in 2016 and how many met expectations in 2017.

But if a school doesn’t have data for one of those years, it will no longer be rated on that measure. One way that could happen is if a school has 20 students of color one year but only 12 the next. Schools must have at least 16 students in a category for their scores to count.

The goal, officials said, is to be more fair and accurate. Some schools complained that judging them based on just one year of data wasn’t fully capturing their performance or progress.

  • Applying the “academic gaps indicator” to all schools without exception.

This year, the district applied a new rule that schools with big gaps between less privileged and more privileged students couldn’t earn its two highest color ratings, blue and green. Schools had to be blue or green on a new “academic gaps indicator” to be blue or green overall.

But district officials made an exception for three schools where nearly all students were from low-income families, reasoning it was difficult to measure gaps when there were so few wealthier students. However, Boasberg said that after soliciting feedback from educators, parents, and advocates, “the overwhelming sentiment was that it should apply to all schools,” in part because it was difficult to find a “natural demographic break point” for exceptions.

Contract review

Here’s what a deeper probe of grade changing at Memphis schools will cost

PHOTO: Marta W. Aldrich
The board of education for Shelby County Schools is reviewing another contract with a Memphis firm hired last year to look into allegations of grade tampering at Trezevant High School. Board members will discuss the new contract Feb. 20 and vote on it Feb. 27.

A proposed contract with the accounting firm hired to examine Memphis schools with high instances of grade changes contains new details on the scope of the investigation already underway in Shelby County Schools.

The school board is reviewing a $145,000 contract with Dixon Hughes Goodman, the Memphis firm that last year identified nine high schools as having 199 or more grade changes between July 2012 and October 2016. Seven of those are part of the deeper probe; the other two are no longer under the Memphis district’s control.

The investigation includes:

  • Interviewing teachers and administrators;
  • Comparing paper grade books to electronic ones and accompanying grade change forms;
  • Inspecting policies and procedures for how school employees track and submit grades.

In December, the firm recommended “further investigation” into schools with high instances of grade changes. At that time, Superintendent Dorsey Hopson emphasized that not all changes of grades from failing to passing are malicious, but said the district needs to ensure that any changes are proper.

Based on the firm’s hourly rate, a deeper probe could take from 300 to 900 hours. The initial review lasted four months before the firm submitted its report to Shelby County Schools.

The school board is scheduled to vote on the contract Feb. 27.

You can read the full agreement below: