Testing takeaways

A decade of stagnation: Little progress on closely watched federal test, as big disparities persist

Scores on the exams known as the “nation’s report card” have barely budged over the last two years, new data show.

The minimal progress on the federal math and reading exams given to fourth and eighth graders will be a disappointment to officials who have hoped that their policies would boost students’ performance or help close yawning gaps between groups of students.

The 2017 results also mean that the U.S. has seen its test scores largely stagnate for a decade, after 10 years of substantial gains in math. The country’s “achievement gaps” between black and white students, and between low-income and affluent students, have also largely held steady over the last 10 years.

“I’m pleased that eighth-grade reading scores improved slightly but remain disappointed that only about one-third of America’s fourth- and eighth-grade students read at the NAEP Proficient level,” said former Michigan Governor John Engler, the chair of the National Assessment Governing Board, which oversees the tests. “We are seeing troubling gaps between the highest- and lowest-performing students. We must do better for all children.”

In an era when standardized testing is commonplace, the National Assessment of Educational Progress is the rare exam with low stakes for individual students and schools, but high stakes for politicians and policymakers. Some education leaders have staked their own reputations on NAEP results.

But score analyzers, beware: It’s difficult to draw conclusions about the benefits of specific policies based on the results. NCES, the federal agency that administers the tests, warns against it.

Some have also questioned what the transition to digital assessments means for the trends in individual state results, though NCES insists that extensive efforts have been made to account for this change.

Still, advocates on all sides will use the results to argue for their preferred changes to education policy. U.S. Secretary of Education Betsy DeVos has already praised the gains in one state, Florida, and highlighted the disappointing national results. “The report card is in, and the results are clear: We can and we must do better for America’s students,” she said.

What you should know about NAEP and this year’s scores

The National Assessment of Educational Progress is administered by the federal government to a sample of students across the country. The most closely watched tests are the fourth- and eighth-grade math and reading exams, since they show how scores are changing nationally, in individual states, and in a number of cities.

The 2017 results showed only tiny differences from 2015: a loss of 1 point in both subjects in fourth grade, and a gain of 1 point in both subjects in eighth grade. Only the grade eight reading improvement was statistically significant compared to the last test.

(One way to think about how big that is: the difference between the “basic” and the “proficient” benchmarks is about 35 points, depending on the test.)

Most students did not reach the test’s “proficient” benchmark, which is considered a high bar to clear. But some groups of students remain further behind.

Exam | Share of students scoring ‘proficient’ and above | Share scoring ‘basic’ and above
Fourth-grade math | 40% | 80%
Eighth-grade math | 34% | 70%
Fourth-grade reading | 37% | 68%
Eighth-grade reading | 36% | 76%

In eighth-grade math, the average black student scored just below the “basic” benchmark, while the average white student came several points shy of the higher “proficient” benchmark. Forty-four percent of white students were proficient, compared to 20 percent of Hispanic students and 13 percent of black students.

Gaps were similarly large between students who did and did not qualify for free or reduced-price lunch, a common proxy for poverty.

While test score gaps by race and poverty remained static, there was a notable increase in the difference in performance between the highest-achieving students and the lowest-achieving ones.

As expected, some states and cities saw their scores rise and fall modestly, though the vast majority held steady. Alaska, Louisiana, New Hampshire, South Carolina, and Vermont saw statistically significant declines on two or more tests, while Florida was the only state that made significant gains on multiple tests. None of the state improvements or drops were more than 6 points.

Charts: Eighth-grade reading scores over time; fourth-grade math scores over time. Source: National Center for Education Statistics. Graphics by Sam Park.

Louisiana, New Mexico, and Mississippi students continued to rank at or near the bottom, while Massachusetts, New Jersey, and New Hampshire students performed consistently well.

(Since state demographics vary, and NAEP results are highly correlated with race and poverty, researchers have tried to account for that to better isolate the performance of schools, and those rankings differ significantly.)

The longer-run trends in NAEP are more positive than the latest results. Nationally, scores have improved substantially in math and modestly in reading since the early 1990s.

What’s the deal with the flat scores?

It’s unclear why scores are flat. NCES, which administers the exam, says the scores could be influenced by specific policies, resources available to schools, and demographics.

That’s a frustrating limitation for policymakers who want clear solutions. It’s also unlikely to stop the finger-pointing and policy prescriptions.

Critics of the reform efforts that prevailed under the Obama administration — the expansion of charter schools, introduction of the Common Core learning standards, and the creation of new teacher evaluation systems — will likely see the results as vindication, even as supporters use the latest data to argue that public schools need substantial change.

A handful of careful statistical analyses have tried to gauge how certain policies have affected NAEP scores in the past. For instance, one recent study found that states that made greater cuts in school funding in response to the Great Recession saw worse NAEP scores as a result; an older study found that an infusion of school funding led to greater NAEP gains.

Other research has found that when states introduce stringent school accountability systems, they do better on the NAEP math tests.

Decision day

A state board decision on two long-struggling Pueblo schools could affect the entire district

PHOTO: Andrea Chu/Getty Images

A year after running out of chances to improve on their own, two Pueblo middle schools will be making a return appearance in front of the State Board of Education this week.

Heroes Middle School and Risley International Academy of Innovation have spent the last eight years on a watchlist for low-performing schools. A year ago, the state board ordered them, along with five school districts and 10 other schools, to craft plans to improve and warned them that too little progress could lead to sharper consequences in the future. It was the first time state regulators faced these decisions under Colorado’s school accountability system.

Many of the schools and districts on the state watchlist have managed to improve enough to avoid further intervention, including Bessemer Elementary, also in Pueblo City Schools.

But even after working with a nonprofit group to improve the quality of teaching, the two schools failed to advance on Colorado’s school rating system, which is largely based on performance on standardized tests. Their test scores left Heroes at the second lowest rating, where it has been for several years, and Risley on “turnaround,” the lowest possible rating, despite some improvement in some subject areas and grade levels.

On Wednesday, state board members will hold a hearing on the future of Heroes and Risley, along with the entire Adams 14 district and its high school. They’ll be taking into account recommendations from independent reviewers who visited the schools, the Pueblo district, students and their families, and advocates who have been lobbying throughout the process.

If the board members take the same approach they did last year, they’re likely to let the schools continue with “innovation” status, with some additional external management. But some state board members have expressed frustration with the pace of change, and they have more drastic options available to them, including closure or turning low-performing schools into charters.

At least in the case of Risley, the recommendation to largely stay the course comes despite grave concerns about the school. The evaluators gave a damning report, rating its leadership “not effective” at implementing change or even having the capacity to benefit from the help of an external partner.

The evaluators described chaotic classrooms in which students slept at their desks or openly played on their phones. In classrooms in which teachers were able to engage students, too many of them were “doing the cognitive work” for the students rather than leading them in real learning, they said.

The school is using too many new programs at once, without enough coordination or training for teachers, with the result that most of the programs were not being implemented as intended, the evaluators said. In one example, the school had adopted new reading and math curricula designed for 90-minute blocks, but the school’s schedule only allows for 75-minute periods.

But closing the school or turning it over to a charter organization would be worse options, evaluators said.

Conversion to a charter school would be divisive and unlikely to better serve students, they said, and there aren’t any nearby schools that could absorb the students if Risley were to close. “There are no other viable options for students that would likely lead to better outcomes,” the evaluators wrote.

What’s more, they wrote, the school serves as an “anchor” to the community — a view that community members expressed in comments submitted to the state board. Parents described using the health clinic associated with the school or getting food from the food pantry, as well as the pride their children felt in their sports teams, which provide positive and structured activities after school.

“As a parent, I feel better after each time I volunteer,” one mother wrote. “My daughter is a cheerleader here and I enjoy going to all her games and support her school and represent red and black and showing bear pride. I am looking forward to my son attending here in years to come.”

In several letters, students said they were having to take so many tests as part of the turnaround process that they were bored and stressed out and did not want to come to school.

“If we’re testing every month, when the real test comes around, we get tired of it and guess or click through,” one eighth-grade student said. “They’re stressing us out, and we don’t really need them. I understand you guys need to see where we are, but this many tests are not helping any of us.”

The state review panel’s assessment of Heroes was more positive, even as evaluators noted ongoing problems and recommended an additional external partner to help manage the school, not just provide instructional support.

“The school needs more time to see the full benefits of participation in the Innovation Zone, but implementation thus far has proven effective,” they wrote. “Leadership is developing and beginning to create positive change.”

At Heroes, evaluators did not recommend conversion to a charter school in part because the school serves a high population of students with disabilities. The middle school is also part of a K-8 school with one principal, and disentangling the elementary and middle school would have financial implications for both.

In response to written questions from the State Board of Education, Pueblo district officials said converting both schools to charters would have a serious financial impact on the entire school system. The district, which already faces declining enrollment and operates on a four-day week while staring down a $785 million maintenance backlog for its aging buildings, would lose almost $5 million a year in state funding if Risley and Heroes students all went to charter schools. The school district would also lose one of its newer buildings if Risley converted to a charter.

The opposition to a charter conversion is about more than money. In a letter, Barb Clementi, vice president of the school board in Pueblo, pointed to the example of a struggling school that was turned into a magnet school. While it has a good rating, it now serves a student population that is almost entirely different, and the former students continue to struggle in their new schools. Converting Risley or Heroes to charters runs the same risk, she said.

Risley and Heroes are part of an innovation zone that provides schools more flexibility but also allows teachers and administrators to work together. While the state review panel said both schools need to take more advantage of the zone, other Pueblo schools have come off the state watchlist using the innovation approach.

“I urge you to consider the bigger picture of our entire Pueblo community and school system when making decisions,” Clementi wrote. “These two middle schools have made progress and deserve the time and opportunity to continue their good work with perhaps additional partnership support.”

Suzanne Ethredge, president of the Pueblo Education Association, the teachers union, said both schools have suffered from a lack of consistent leadership and significant teacher turnover, an issue that evaluators noted as well. She said any plan to improve the schools needs to take seriously the issue not just of training teachers but keeping them.

Some teachers and parents have asked for the schools to be turned into “community schools,” though letters to the state board indicate this approach has some serious skeptics as well.

“There is a lot of buy-in and a lot of people are looking to this model as a way to engage authentically with our community and dig in and find those root causes that are holding students back,” said Robert Donovan, an eighth-grade social studies teacher at Risley and member of the Pueblo Education Coalition.

Community schools incorporate a wide range of services for students and their families, ranging from meals, health clinics, and laundry service to English classes and job training. These schools work to engage parents in their children’s education, and in their most ideal version, parents play a big role in shaping educational decisions.

Teachers unions have been strong advocates for community schools in response to persistent low test scores, including in Pueblo and Adams 14. They argue that community schools address the social and economic problems that make it hard for students to succeed at school. Research on the academic impact of this approach is mixed.

More than 97 percent of Risley students qualify for subsidized lunches, a measure of poverty, compared to 80 percent for the district as a whole. Nearly 80 percent of Heroes students are from low-income families.

“The concerns expressed by our community fall into several areas, including authentic parent and community engagement, culturally relevant curriculum, a focus on high-quality teaching and learning, positive discipline practices, and mental health supports, to name a few,” reads the online petition. “The most powerful voices speaking about what is needed were, in fact, students. Based on this engagement, a community schools model … is the best fit for what we need and want in Pueblo.”

At Wednesday’s hearing, district officials will lay out their plans in more detail — they declined to talk to us before the meeting — and face tough questions from state board members, who have until Thursday to render a decision on the two Pueblo schools and the Adams 14 district, which could face significant loss of control.

This week’s decisions will mark a test of how the state board will deal with struggling schools going forward. Pueblo City Schools and Adams 14 have both described a process for finding additional outside partners if that’s what the state board orders, but it’s not entirely clear what that will look like on the ground.

And then it will fall back to principals, teachers, parents, and students to do the work.

First Person

Like most superintendents, I cared a lot about test scores. Too much, it turns out.

GRAMMY Career Day at Camden Creative Arts High School in Camden, New Jersey. (Photo by Mark Von Holden/WireImage for NARAS)

One of Paymon Rouhanifard’s earliest initiatives after becoming superintendent of Camden, New Jersey, schools in 2013 was to design a “school information card” that spelled out each school’s test scores in a family-friendly format. By the time he left the district this year, the cards were no longer being produced.

In this piece, delivered as a speech at the MIT School Access and Quality Summit on Tuesday, Rouhanifard explains why he did away with the cards against the advice of his team — and what that means, in his view, for the future of how children in high-need communities are educated. His personal evolution mirrors one that many in the education reform world are undergoing, as they increasingly reckon with the results of their own focus on test scores.

About five months ago, I stepped down from the best job I’ve ever had, superintendent of Camden, New Jersey. For those of you who don’t know much about Camden, it’s a big little city. There are about 80,000 residents. Fifteen thousand school-age children.

Similar to cities like Detroit, Camden has yet to recover from the postindustrial decline of the 1960s and 1970s. The challenges we inherited with our school system are rooted in decades of poverty, born out of centuries of injustice.

In March 2013, Gov. Chris Christie initiated a state intervention in Camden. And in August, I started as the first permanent superintendent subsequent to that very consequential change in governance. I was the 13th superintendent over the prior 16 years.

And that turnover was emblematic of the very problem we were aiming to address. Our belief was that politics and bureaucracy had inhibited the progress Camden students and families deserved to overcome the steep challenges the city was facing. Whiplashing changes were the norm. I saw the vestiges in just about every classroom I visited.

Our theory of action was relatively straightforward, and one we continually discussed with our community.

We believed it was important for the district to segue out of being a highly political monopoly operator of schools and into one that instead focused on regulating the system. That involved us asking high-quality nonprofit charter organizations to help turn around existing schools and serve our broader city as neighborhood schools, all while steadily improving our district schools on a parallel track.

During that time, I’m proud of what we accomplished.

  •  We reduced the district’s dropout rate by almost 50 percent.
  •  We reduced suspensions by over 50 percent.
  •  We developed a common enrollment system that makes life easier for families.
  •  We initiated over $340 million in capital repairs to dramatically improve neglected facilities.

Perhaps what I’m most proud of is how we went about our work. We built large coalitions of support, from our elected officials to community leaders to parents and students. While there was certainly some pushback, we undeniably left with more allies than skeptics.

But what I want to discuss with you today is not how we got to this point, but how we can get significantly better moving forward.

This is a story about an evolution of my own thinking during that five-year experience — specifically, how I came to discover the underpinnings of our work are fraught with complications, requiring change and improvement.

What I’m referring to are the math and literacy student achievement data we utilize to drive so many of the critical decisions we make. Systems we utilize to evaluate schools, teachers, and students. Just about every person in this room regularly engages with these data.

My realization a few years ago was that I rarely asked questions about what these tests actually told us. What they didn’t tell us. And perhaps most importantly, what were the specific behaviors they incentivized, and what were the general trade-offs when we acutely focus on how students do on two state tests.

So I’ll skip to the part where about two years ago I made the decision to do away with our school information card, Camden’s school report card, an accountability tool that many other cities utilize in some shape or form.

I’m intentionally using the word “I” because, well, every last person on our remarkably talented leadership team was against it. And I can understand that on many levels.

There’s a formidable intellectual argument driving state test-based accountability systems.

“A Nation at Risk” begat a decades-long effort to turn the flood lights on within high poverty school districts. Race to the Top ensured we not only knew the gaps in student achievement, but we had a plan of action. In many respects, this was critically important work.

Accountability shouldn’t be a four-letter word. The Camden school district we inherited had grappled with challenges of many varieties – fiscal, operational, to go along with teaching basic student reading, writing, and math skills. There simply wasn’t a meaningful focus on outcomes of any kind.

Across the country, we’ve attempted to create a KPI – a Key Performance Indicator – to ensure we’re tracking progress against one or two units of measurement. We focus our energies there. I get it.

When I was running the Office of Portfolio Management for the New York City Department of Education, I was a devout believer that every decision should be predicated on math and literacy tests.

But today I want to push a bit on this conventional wisdom – and challenge what I believe to be a shared set of assumptions within the education reform establishment that has gone mostly unquestioned. I want to explain why I felt eliminating our School Information Card in Camden was a very small step in the right direction.

My thinking began to evolve as a function of simple, passing conversations I had with a variety of different people in Camden. I’ll share a few snapshots. And while I’m certainly paraphrasing here, they capture the essence of what I heard.

  •  One of our very best eighth-grade math teachers tells me: “All I’m doing is collecting formative assessment data. Multiple times per month. I hardly have the time to analyze the data. Can we please just slow down the rapid assessment calendar?”
  •  In just about every high school student roundtable we held – and this is a self-selected, highly motivated group – a student would ask: “Superintendent, I love a good test, but all we’re doing is taking these multiple choice tests! Half the building shuts down and I can’t use the laptops in the library because they’re all being used for testing.”
  •  Questions I was asked by countless parents of middle and high school students: “How come there isn’t enough time in the day for Global Studies? Why don’t we offer a second foreign language? Or have year-round art and music?”
  •  The head of a charter organization once said to me: “It’s hard not to notice almost every school receiving a top rating on the School Information Card has a lower percentage of students with disabilities than ours. To meet our students’ needs, our school must invest in mental health clinics and other wrap-around services – which don’t generate quick test results. But they’re the right thing to do. Yet we would face closure based on this system. Not to mention fall out of favor with foundations like the City Fund.”
  •  Lastly, the CEO of a curriculum provider once told me that when they are working with schools that do heavy test prep – and these are of course mostly urban charter schools – they are invariably asked how they can reduce the curriculum’s “scope and sequence” by one month to make room for their test prep schedule. One entire month.

These questions, of course, cut to the core of the testing culture we’ve created.

We are spending an inordinate amount of time on formative and interim assessments and test prep, because those are the behaviors we have incentivized. We are deprioritizing the sciences, the arts, and civic education, because we’ve placed most of our eggs in two baskets. We are implicitly encouraging schools to serve fewer English language learners and students with an IEP. We are spending less time on actual instruction, because that’s the system we’ve created.

I want to again be clear that the benefits of our current accountability constructs are real. In most of the schools I visit in Camden, there is a genuine drive for better math and literacy outcomes. This wasn’t the case just five years ago. And that applies to incredible efforts underway in New York City, New Orleans, Chicago, Newark, Denver, and many other cities over the past 10 to 15 years. There’s no question about it.

But I also believe the drawbacks currently outweigh the benefits. That we haven’t been honest about the trade-offs. And that there’s a third way approach, which I’ll get to in a moment.

It’s not uncommon for there to be formative or interim assessments every couple weeks in addition to weeks – sometimes months – of test prep in the late winter and early spring.

Even in some of our “highest performing schools,” there is insufficient access to foreign languages, the sciences, and the arts. And school budgets are not the primary driver of that.

And for our most vulnerable kids, we are assuming if test scores in two subjects don’t dramatically improve within a tight time horizon, we should throw the baby out with the bath water.

We’re not playing the long game for our kids.

That is why I made the decision to eliminate Camden’s School Information Card. It only fortified the drawbacks of our current system.

I’ll go out on a limb – most everyone in this room wouldn’t tolerate what I described for their own children’s school. Mostly affluent, mostly white schools shy away from heavy testing, and as a result, they are literally receiving an extra month of instruction – and usually with less overall time allotted to the school day.

I often share the “Is this OK for our own children” thought exercise with education reform friends and colleagues as it relates to testing, and it’s amazing how often I hear twisted logic.

Simply put: time spent on testing and test prep is not time spent on instruction. It’s time spent on testing. Often, we’ve become better at taking the assessments, but haven’t mastered the standards behind them.

The basic rule, what we would want for our own children, should apply to all kids.

What’s more, we say we’ve learned from No Child Left Behind, yet we invariably expect every three-year math and ELA proficiency curve to be on a slope to 100 percent.

When we do this, we incentivize very specific behavior – behavior that oversimplifies the challenges we’ve inherited. Challenges, again, born out of centuries of injustice that manifest themselves today through discrimination, over-criminalization, trauma, toxic stress, and the 30 million word gap. We’re not investing in mental health clinics. We cut scope and sequence in our curriculum. We forgo the sciences and the arts. School becomes less joyful.

As much as we’d like it to be, the public good that is education can’t be reduced to one or two data points measured in short time horizons. It’s so much more complex than that. This is, in essence, what Campbell’s Law teaches us. Donald Campbell, a social scientist, posited that “the more any quantitative social indicator is used for social decision-making ... the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

Put more simply, when deeply complex social and policy decision-making is reduced to a target, the target ceases to be a good measure.

I’ve said enough calling out the challenges, so it’s only fair to suggest a course of action.

First, high-stakes testing should be a dipstick to measure systems. Most of the rest of the developed world functions this way.

States could administer standardized tests like NAEP – meaning random samplings every two to three years. This would suffice. We would know the gaps. We could address inequities.

Second, while we’re over-assessing, paradoxically, we actually don’t have enough assessments.

I’ll provide an example to make this more concrete: Most high school state tests don’t account for critical science subjects like physics and chemistry. So given we measure what we value, not surprisingly, the majority of high schools in New York City don’t even offer physics. Think about that – in the midst of a supposed national STEM movement, that is a reality in the largest city in our country.

And we must also find normed ways to assess art and music. A society without access to healthy art and music education is problematic for vast swaths of our economy.

Third, we must build smarter tests. Tests that, for example, address current challenges with race and class bias. In Louisiana, State Superintendent John White has piloted an innovative new state assessment that uses passages from books that students have already been exposed to in class, as opposed to something that’s brand new and just for the test.

Lastly, and perhaps most importantly, tests should inform and guide our actions, and not compel them. This may sound like shades of grey, but it’s an important distinction. We need talented, thoughtful systems leaders who act with urgency, but don’t assume simple proficiency and growth scores in two subjects should immediately require structural change leading to seas of collateral damage and unintended consequences.

Altogether, the pursuit of better life outcomes for kids might just necessitate a slight depression in state test scores to focus more on instruction and other critical components of a child’s education. If life outcomes are indeed what we are about, we should welcome state test scores going down!

My bottom line is this: tests are critically important, particularly in math and literacy.

I’m not suggesting the pendulum should swing so far in the other direction. But two tests shouldn’t be what we are solving for day in and day out.

For years, we’ve found ourselves in a bitterly divisive discourse with entrenched camps. The political fights within education are well documented. We are prone to gravitating to echo chambers, dismissing the noise as political theater, filing the counter-arguments away as low expectations for children.

Here’s the thing: in my opinion, the strength of the education reform movement – the belief that we must fundamentally improve our country’s education systems – has little to do with dogma and ideology. Little to do with the policies we lead and the political battles we strive to win.

It is about the people themselves. It is about us and countless others who believe we must innovate. We must have higher expectations for children. We must strive for equity.

If you go back 20 years, it would have been hard to conceive of a gathering like this. Or the New Schools Venture Fund Summit. Or where the charter movement is today.

If we were to recognize this as our strength, then it would be easier to let go of dogma, challenge our assumptions with honesty and humility in constant pursuit of the truth. Of better ideas. Of higher educational attainment and income mobility for those born into poverty.

I’ll leave you with the most obvious advice you’ll hear today at this conference: you are a function of who you spend time with.

I was deeply shaped by the past five years because I was really in it. The best thing that ever happened to me – and the hardest – was being thrown into the deep end in Camden and left to my own devices.

I spent the vast majority of my time out of my echo chamber and in our community, in our schools. Football and basketball practices. Teacher and student roundtables. I wasn’t a great steward of our central office. I didn’t spend enough time with funders. Or with policymakers and think tanks. And that’s alright.

Being here today, I’m clearly making up for lost time.

I say this to say that we should spend more time with front-line practitioners. With people who disagree with us. While carrying a mindset of being open to disconfirming our most strongly held beliefs, rather than just affirming what we already believe to be true. This is certainly applicable to our broader, much more complex political divide.

Paymon Rouhanifard was the superintendent of schools in Camden, New Jersey from 2013 to June 2018.