failing grade

Why one Harvard professor calls American schools’ focus on testing a ‘charade’

PHOTO: Alan Petersime

Harvard professor Daniel Koretz is on a mission: to convince policymakers that standardized tests have been widely misused.

In his new book, “The Testing Charade,” Koretz argues that federal education policy over the last couple of decades — starting with No Child Left Behind, and continuing with the Obama administration’s push to evaluate teachers in part by test scores — has been a barely mitigated disaster.

The focus on testing in particular has hurt schools and students, Koretz argues. Meanwhile, Koretz says the tests are of little help for accurately identifying which schools are struggling because excessive test prep inflates students’ scores.

“Neither good intentions nor the value of well-used tests justifies continuing to ignore the absurdities and failures of the current system and the real harms it is causing,” Koretz writes in the book’s first chapter.

Daniel Koretz, Harvard Graduate School of Education

His skepticism will be welcome to families of students who have opted out of state tests across the country and others who have led a testing backlash in recent years. That sentiment helped shape the new federal education law, ESSA.

Koretz has another set of allies in some conservative charter and voucher advocates, including — to an extent — Secretary of Education Betsy DeVos, who criticized No Child Left Behind in a recent speech. “As states and districts scrambled to avoid the law’s sanctions and maintain their federal funding, some resorted to focusing specifically on math and reading at the expense of other subjects,” she said. “Others simply inflated scores or lowered standards.”

But national civil rights groups and some Democratic politicians have made a different case: that it’s the government’s responsibility to continue to use test scores to hold schools accountable for serving their students, especially students of color, poor students, and students with disabilities. (ESSA continues to require testing in grades three through eight and requires states to identify their lowest performing schools, largely by using test scores.)

We talked to Koretz about his book and asked him to explain how he reached his conclusions and what to make of research that paints a more positive picture of tests and No Child Left Behind.

The interview has been edited for clarity and length.

Do you want to walk me through the central thesis of your book?

The reason I wrote the book is really the subtitle: we’re “pretending to make schools better.”

Most of the bad news that’s in this book is old news. We’ve been collecting evidence of various kinds about the impact of the very heavy handed, high-stakes testing that we use in this country for a long time. I lost patience with people pretending that these facts aren’t present. So I decided it would be worth writing a book that summarizes the evidence both good and bad about the effects of test-based accountability. When you do that, you end up with an awful lot on the bad side and not very much on the good side.

Can you talk about some of the bad effects?

There are a few that are particularly important. One is absolutely rampant bad test prep. It’s just everywhere. One of the consequences of that is that test scores are often very badly inflated.

There aren’t all that many studies of this because it’s not really a welcome suggestion. When you go to the superintendent and say, “Gee, I’d like to see whether your scores are inflated,” they rarely say, “Boy, we’ve been waiting for you to show up.” There aren’t that many studies, but they’re very consistent. The inflation that does show up is sometimes absolutely massive. Worse, there is growing evidence that that problem is more severe for disadvantaged kids, creating the illusion of improved equity.

Another is increasingly widespread cheating. We, of course, will never know just how widespread because there aren’t resources to examine the data from 13,000 school districts. Everyone knows about Atlanta, a few people know about El Paso, but that’s just the tip of the iceberg.

There’s obviously also — and perhaps this should be on the same par — enormous amounts of stress for teachers, for kids, and for parents. That’s the bad side.

I want to ask a little more about test score inflation. What is the strongest evidence for inflation? And let me give you two pieces that to me seem like potentially countervailing evidence. One piece is when I’m looking at research on school turnaround — like the most recent School Improvement Grant program and also turnaround efforts in New York City — these schools have been under intensive pressure to raise test scores. And yet their test score gains on high-stakes tests have been pretty modest at best. The other example is the Smarter Balanced exam. The scores on the Smarter Balanced exam don’t seem to be going up. If anything, they’re going down.

The main issue is that score inflation doesn’t occur in the same amount everywhere. You’ve come up with two examples where there is apparently very little. There are other examples that are much worse than the aggregate data suggest.

In the case of Smarter Balanced, I would wait and see. Score inflation can only occur when people become sufficiently aware of predictable patterns in the test. You can’t game a test when you don’t know what irrelevant things are going to recur, and that just may take some time.

I’m wondering your take on why some of the strongest advocates for test-based accountability have been national civil rights groups.

One of the rationales for some of the most draconian test-based accountability programs we’ve had has been to improve equity. If you go back to the enactment of NCLB, you had [then-Massachusetts Sen.] Teddy Kennedy and [then-California Rep.] George Miller actively lobbying their colleagues in support of a Republican bill. George Miller summed that up in one sentence in a meeting I went to. He said, “It will shed some light in the corners.” He said that schools had been getting away with giving lousy services to disadvantaged kids by showing good performance among advantaged kids, and this would make it in theory impossible to do that.

Even going back before NCLB, I think that’s why there was so much support in the disability community for including disabled kids in test-based accountability in the 1990s — so they couldn’t be hidden away in the basement anymore. I think that’s absolutely laudable. It’s the thing I praise the most strongly about NCLB.

It just didn’t work. That’s really clear from the evidence.

I think the intention was laudable and I think the intention was why high-stakes testing has gotten so much support in the minority community, but it just has failed.

You mention in your book probably the most widely cited study on the achievement effects of No Child Left Behind, showing that there were big gains in fourth grade math and some gains in eighth grade math, but there wasn’t anything good or bad in reading.

Pretty much. There was a little bit of improvement in some years in reading but nothing to write home about.

So the math gains — and that was on the low-stakes federal NAEP test — they’re just not worth it in your view?

I think the gains are real. But there are some reasons not to be terribly excited about these. One is that they don’t persist. They decline a little bit by eighth grade, they disappear by the time kids are out of high school. We don’t have good data about kids as they graduate from high school, but what we do have doesn’t show any improvement.

The biggest reason I’m not as excited as some people are about those gains is we’ve had evidence going back to the 1980s that one of the responses that teachers have had to test-based accountability is to take time out of untested subjects and to put it into math and reading. We don’t know how much of that gain in math is because people are teaching math better and how much is because kids aren’t learning about civics.

That’s, in my view, not enough to justify all of the stuff on the other side of the ledger.

When I’ve looked at some studies on the impact of NCLB on students’ social-emotional skills, the impact on teachers’ attitudes in the classrooms, and the impact on voluntary teacher turnover, they haven’t found any negative effects. They also haven’t found positive effects in most cases. But that would seem to at least in one sense undermine the argument that NCLB had big harmful effects on these other outcomes.

I haven’t seen those studies, but I don’t think what you describe does undermine it. What I would like to see is an analysis of long-term trends not just on teacher attrition but on teacher selection. A lot of what I have heard has really been, frankly, anecdotal. I was once a public school teacher and teaching now is utterly unlike what it was when I taught. It seems unlikely that that had no effect on who opts in and who opts out to be a teacher.

I don’t have evidence of this but I suspect that to some extent different types of people are selecting into teaching now than were teaching 30 years ago.

Can you talk about what you see as good versus bad test prep?

Something that Audrey Qualls at the University of Iowa said was, “A student has only mastered something if she can do it when confronted with unfamiliar particulars.”

Think about training pilots — you would never train pilots by putting them in a simulator and then always running exactly the same set of conditions because next time you were in the plane and the conditions were different you’d die. What you want to know is that the pilot has enough understanding and a good enough command of the physical motions and whatnot that he or she can respond to whatever happens to you while you’re up there. That’s not all that distant an analogy from testing.

Bad test prep is test prep that is designed to raise scores on the particular test rather than give kids the underlying knowledge and skills that the test is supposed to capture. It’s absolutely endemic. In fact, districts and states peddle this stuff themselves.

I take it it’s very hard to quantify this test prep phenomenon, though?

It is extremely hard, and there’s a big hole in the research in this area.

Let’s turn from a backward-looking to a forward-looking discussion. What is your take on ESSA? Do you think it’s a step in the right direction?

This may be a little bit simplistic, but I think of ESSA as giving states back a portion of the flexibility they had before No Child Left Behind. It doesn’t give them as much flexibility as they had in 2000.  

It has the potential to substantially reduce pressure, but it doesn’t seem to be changing the basic logic of the system, which is that the thing that will drive school improvement is pushing people to improve test scores. So I’m not optimistic.

One of the things that I argue very strongly at the end of the book is that we need to look at a far broader range of, not just outcomes, but aspects of schooling to create an accountability system that will generate more of what we want. ESSA takes one tiny step in that direction: it says you have to have one measure beyond testing and graduation rates. But if you read the statute it almost doesn’t matter what that measure is. The one mandate is that it can’t count as much as test scores — that’s written in the statute. The notion that it means the same thing to monitor the quality of practice or to monitor attendance rates is just absurd.

As I’m sure you know, research — including from some of your colleagues at Harvard — has shown that so-called “no-excuses” charter schools in places like Boston, Chicago, and New York City, have led to substantial test score gains and in some cases improvements in four-year college enrollment. Are you skeptical that those gains are the result of genuine learning?

It depends on which test you’re talking about. Some of the no-excuses charter schools drill kids on the state test, so I don’t trust the state test scores for some of those schools. I think it’s entirely plausible that some of those schools are going to affect long-term outcomes because they’re in some cases replacing a very disorderly environment with a very orderly one. In fact, I would say too orderly by quite a margin.

But those reforms are much bigger than just test-based accountability or just the control structure we call charters. It’s a whole host of different things that are going on: different disciplinary policies, different kinds of teacher selection, different kinds of behavioral requirements, all sorts of things.

A lot of the discussion around accountability, including in your book, is about the measures we should be using to identify schools. I’m interested in your take on what happens when a school is identified by whatever system — perhaps by the holistic system you described in the book — as low performing.

The first step is to figure out why is it bad. I would use scores as an opening to a better evaluation of schools. If scores on a good test are low, something is wrong, but we don’t know what. Before we intervene we ought to find out what’s wrong.

This is the Dutch model: school inspections are concentrated on schools that show signs of having problems, because that’s where the payoff is. I would want to know what’s wrong and then you can design an alternative. In some cases, it may be the teaching staff is too weak. It may be in some cases the teaching staff needs supports they don’t have. It may be like in the case of Baltimore, they need to turn the heat on. Who knows? But I don’t think we can design sensible interventions until we know what the problems are.

making the rounds

Tennessee’s new education chief ‘very confident’ that online testing will be smooth in April

PHOTO: Shelby County Schools
Tennessee's new education commissioner Penny Schwinn (second from left) met with Douglass High School students and Shelby County Schools leaders Friday.

As Tennessee’s new education commissioner wrapped up her second week on the job by visiting four schools in Shelby County, Penny Schwinn said she feels “very confident” the state has learned from its mistakes in online testing.

During the more than three-hour ride to Memphis on Friday, Schwinn said she continued to pore over documents showing evidence that the corrections the state department staff have put in place will work.

“I feel very confident that our team has looked into that,” she told reporters at a press conference after meeting with students. “They’re working with the vendor to ensure that testing is as smooth as possible this year.” Currently the state is working with Questar, which administered TNReady online last year.

She also said the state’s request for proposals from testing vendors, which is already months behind, will be released in about two weeks.

PHOTO: Shelby County Schools
From left: John Bush, principal of Douglass High School; Penny Schwinn, Tennessee Education Commissioner; and Joris Ray, interim superintendent for Shelby County Schools.

“No later than that,” she said. “We hope and expect to have a vendor in place before the end of the fiscal year,” in late June.

The day Schwinn was hired, she said getting state testing right would be her first priority. Three years of major technical failures have severely damaged the trust educators and parents have in the state’s test, TNReady. It is the main measure of how schools and teachers are doing, but state lawmakers exempted districts from most testing consequences in 2018.




Prior to talking with reporters, Schwinn said she heard “hard-hitting questions” from several students at Douglass High School in Memphis about what the state can do to improve education. Schwinn has said she will visit Tennessee schools throughout her tenure to ‘listen and learn’ by talking to students and educators.

Reporters were not allowed to attend the student discussion with Schwinn and some Shelby County Schools leaders.

Douglass High entered Shelby County Schools’ turnaround program, known as the iZone, in 2016 and saw high academic growth in its first year. But test scores fell this past year as the state wrestled with online malfunctions.

Timmy Becton Jr., a senior at Douglass High, said he hopes for fewer tests and more projects to demonstrate what a student has learned. Those kinds of assessments, he said, can help a student connect what they are learning to their daily life.

PHOTO: Shelby County Schools
Tennessee’s new education commissioner met with students at Douglass High School and Shelby County Schools leaders.

“We figured it would be a different way to measure and see how much knowledge a student really has on a specific subject,” he told Chalkbeat after meeting with Schwinn during a student roundtable session. “It’s a good alternative to taking tests.”

He said he was “surprised and happy” to see Schwinn actively seek student perspectives.

“I really think that’s the most important part because students are the ones going to school every day,” Becton said. “So, if you want to find a good perspective on how to solve a problem, it’s really great to talk to the people who are actively involved in it and the people who are actually experiencing these problems directly.”

The state’s annual testing window runs from April 15 to May 3.

School discipline

Michigan schools have expelled fewer students, but that may not be cause for celebration

PHOTO: Getty Images

Michigan schools have expelled far fewer students since the state enacted laws aimed at cutting back on expulsions. But an advocate who’s pushed for an end to zero-tolerance policies pointed out persistent problems and told elected state education leaders this week, “We shouldn’t start celebrating yet.”

This is why: Peri Stone-Palmquist, executive director of the Ypsilanti-based Student Advocacy Center, told State Board of Education members that in the 18 months since the new laws took effect in 2017, expulsions have dropped 12 percent. But she’s concerned that too many school leaders don’t understand the law or are ignoring its requirements. And she believes some schools are finding other ways of kicking kids out of school without expelling them.

Michigan did away with zero-tolerance policies that had earned it a reputation for having some of the toughest disciplinary rules in the nation. In their place, lawmakers instituted new rules, such as requiring schools to consider seven factors — including a student’s age, disciplinary record, disability and seriousness of the incident — in making expulsion decisions.

“We have had districts and charters tell advocates that they would not consider the seven factors at all,” Stone-Palmquist said. Others aren’t sharing with parents and students how those seven factors were used. And she said there’s a general “lack of understanding of lesser interventions and the persistent belief that lengthy removals remain necessary.”

That’s a problem, she and others say, because of the negative consequences of kicking students out of school. Studies have shown that students kicked out of school are often missing out on an education and are more likely to get into trouble. Advocates also worry that expulsion exacerbates what they describe as a “school-to-prison” pipeline.

She said advocates are noticing that more students are receiving long suspensions, an indication that some schools are suspending students rather than expelling them. Hiding students in suspension data won’t work much longer, though. Michigan now requires schools to collect such data, which soon will be public.

Stone-Palmquist also said that some schools aren’t even going through the expulsion process, but simply referring students with discipline issues to “understaffed virtual settings.”

“Once again, the students who need the most get the least, and no one has to report it as an expulsion.”

Stone-Palmquist gave an example of a ninth-grader involved in a verbal altercation who was expelled for a long time for persistent disobedience, “despite our team lining up extensive community resources for him and despite the district never trying positive interventions with him.”

In another case, a fifth-grader was expelled for 180 days for spitting at another student who had done the same to them first. Stone-Palmquist said the seven factors weren’t considered.

“We were told at the appeal hearing that the student’s behaviors were too dangerous to consider lesser interventions.”

She and Kristin Totten, an education lawyer for the ACLU of Michigan, provided board members with statistics that some members found alarming. Totten noted that an ACLU review of data collected by the federal government shows that for every 100 students in Michigan, 38 days are lost due to suspension. In Oakland County, 26 days are lost for every 100 students. In Macomb County, it’s 35 days and in Wayne County, it’s 55 days.

One child who’s experienced trauma for years was repeatedly suspended from multiple schools. The 11-year-old has been diagnosed with post-traumatic stress disorder and attention deficit hyperactivity disorder. This school year, she’s been suspended for 94 days.

“Never once were the seven factors mentioned to her mother,” Totten said.

Stone-Palmquist asked board members to consider recommendations, including developing a model student code of conduct that incorporates the new rules, partnering with the advocacy center to request an attorney general’s opinion on what districts are required to do, and expanding data collection.

Tom McMillin, a member of the state board, asked whether the state should consider financial penalties, such as withholding some state aid.

“I’m a fierce advocate for local control. But in areas where the incentives might not be there to do what’s right … I’m fine with the state stepping in,” McMillin said.

Board member Pamela Pugh said she appreciated the push for the board to “move with great speed.” She said the data and stories provided are “compelling, as well as convincing.”

Stone-Palmquist said that despite her concerns, there have been some successes.

“Districts that used to automatically expel 180 days for fights, for instance, have partnered with us to dramatically reduce those removals with great outcomes,” she said. “We know alternatives are possible and that they actually help get to the root of the problem, prevent future wrongdoing and repair the harm.”

The Detroit school district didn’t come up during the hearing. But on the same day Stone-Palmquist presented to the state board, Detroit Superintendent Nikolai Vitti gave a presentation to his local board of education about what’s happened in the months since the district embarked on an effort to improve school culture by revising the student code of conduct, hiring deans for each school, and providing training on alternative discipline methods.

The bottom line: Vitti said that schools are booting out dramatically fewer students and greatly increasing alternative methods of discipline. In-school suspensions are up, given the push against out-of-school suspensions.

But the changes have also raised concerns. Some school staff have said the new rules are tying their hands. Vitti said it will take time for the changes to take hold, and he outlined some areas that need to improve, including more training.