Kristin Klopfenstein is executive director of the Education Innovation Institute at the University of Northern Colorado and a self-proclaimed data geek.

I find it tremendously exciting that high-quality research is cited more and more as the basis for education policy decisions. And I notice that researchers are responding by trying to provide concrete examples of what their results mean, and particularly to show how big their numbers are. It is also becoming apparent that the examples used to explain the size of numbers can have a disproportionate, and often inappropriate, effect on how a particular finding is applied in the policy context.

I’m struck by how dramatically this has happened with the “consecutive great teachers” argument. Over the years, several teams of researchers (Hanushek and Rivkin; Gordon, Kane, and Staiger among them) have reported variations on the theme that achievement gaps could be closed if black or low-income students were assigned several extraordinary teachers in a row. This statement has now been repeated enough in the popular press and education policy circles to become conventional wisdom, with a particularly succinct (and inaccurate) version being: “we now know that three years with a great teacher is enough to close the achievement gap.”

The problem is that when researchers made such statements, they never actually observed a disadvantaged student being exposed to multiple extraordinary teachers over time and reaching the academic level of their more advantaged peers. Rather, researchers calculated the “teacher effect” number by crunching data on a group of students exposed to a variety of good and bad teachers, from a few grades and in a particular subject, and extrapolated this result to other students, grades, and subjects. In other words, for fellow data geeks out there, the consecutive great teacher conclusion is based on an “out of sample” extrapolation.

Suppose the data crunching reveals that a fourth-grade math teacher from the top ten percent of all math teachers increases the average student’s math growth score by some number, let’s say 5. Is 5 big? Small? Somewhere in between? The researcher proceeds to make the case that 5 is big because if a disadvantaged kid gained 5 points every year for 3 years, that would be enough to bring them up to the level of more advantaged students. Readers with research experience tend to take this extrapolation with a grain of salt, knowing its limitations. But readers without such experience don’t necessarily have the instincts to realize that this example can’t be taken at face value. And the myth is born.
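For fellow data geeks, the arithmetic behind that extrapolation can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of any study's method. The gain of 5 is the made-up number from the example above, the gap of 15 is simply what the "three years closes it" claim implies, and the uneven gains are hypothetical values chosen to show what happens when gains do not repeat uniformly.

```python
# Hypothetical numbers for illustration only; none come from an actual study.
ANNUAL_GAIN = 5            # the made-up gain from one top-decile teacher
YEARS = 3                  # consecutive years with such a teacher
GAP = ANNUAL_GAIN * YEARS  # gap size implied by the "three years closes it" claim


def remaining_gap(gap, yearly_gains):
    """Return the gap left after subtracting each year's gain."""
    return gap - sum(yearly_gains)


# The extrapolation: assume the same gain repeats every single year.
uniform = [ANNUAL_GAIN] * YEARS

# An equally hypothetical alternative in which gains shrink over time.
uneven = [5, 3, 2]

print(remaining_gap(GAP, uniform))  # 0
print(remaining_gap(GAP, uneven))   # 5
```

The gap closes only under the assumption of uniform gains, which is exactly the part that was never observed in the data.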

The question I’m raising today isn’t whether great teachers make a difference. There is credible evidence that they do. The question is whether a student’s gain in one teacher’s classroom can be extrapolated over time to predict a cumulative effect when there were no students in the original research who actually exhibited this result. Kids learn at different rates as they mature and their scores can be affected by many external factors including the type of curriculum used and the motivation levels of other students in the classroom. How likely is it that a student will experience uniform gains year after year working under different teachers? And there’s the issue of basing teacher ratings solely on tests of basic skills, often in just one subject, as many studies do. Can a teacher who improves math scores produce equally large gains in other subjects and, perhaps even more importantly, in the soft skills that employers are crying out for?

Misunderstanding the relationship between teacher quality and other factors influencing the achievement gap can lead to the misallocation of time and money. The consecutive great teacher argument has been used to buttress policy arguments for greater accountability, higher teacher pay, merit pay, and charter schools with liberal hiring and firing practices. None of these policies is inherently good or bad in my view, but in a world of limited resources, a policy focus in one area can limit resources for other areas that might actually be bigger levers for reducing the achievement gap. For example, robust urban planning, housing, and health policies have the potential to significantly decrease student mobility and to help kids manage asthma and other health conditions that lead to chronic school absences. Mobility and chronic absence are other key factors in the achievement gap.

I’m not the first to bring up this issue. Plenty of prominent scholars have raised such questions. Diane Ravitch devotes much of her popular book The Death and Life of the Great American School System to warnings against giving standardized tests too much power. On the issue of gap-closing claims, she quotes Richard Rothstein as noting that “good teachers can raise student achievement, and teachers are defined as good if they raise student achievement” (page 182). A circular argument if there ever was one. So why do the myths survive despite such critiques?

I think one reason magic bullet solutions gain currency is that there is no incentive in academia to write clearly for a general audience and to follow up and make sure research is being applied appropriately. Tenure and promotion are earned by being cited, not by being cited correctly. Moreover, too few researchers who understand the complex statistics behind the magic bullet solutions publicize their objections broadly, as Ravitch and Rothstein did. It is heady stuff when one’s research enters the public conversation. Correcting misunderstandings in these situations takes courage, especially when truisms are embraced by powerful politicians and policy makers.