A new study shows why it’s so hard to improve teacher preparation

Dramatically reshaping how teachers are trained — by emulating great teacher preparation programs and shutting down ineffective ones — has been a key priority of many states and even, under the Obama administration, the federal government.

Fierce debates have ensued over how to hold training programs accountable for making sure novice teachers are ready for the classroom on day one.

Now a new study casts doubt on those efforts for a simple reason: It’s hard to identify good or bad teacher preparation programs, at least as measured by student achievement.

That’s the provocative conclusion of research by Paul von Hippel of the University of Texas at Austin and Laura Bellows of Duke University.

“It appears that differences between [programs] are rarely detectable, and that if they could be detected they would usually be too small to support effective policy decisions,” write von Hippel and Bellows.

The study, which has not been formally peer-reviewed, follows other peer-reviewed research comparing teacher training programs in Florida, Louisiana, Missouri, New York City, Texas and Washington.

Those studies try to isolate the impact of teacher preparation programs — including schools of education and alternative certification initiatives — on student test scores. It’s a difficult task, since there are two degrees of separation between a teacher training program and students taking state tests, and researchers use complex value-added models to control for a number of factors.

The studies come to differing conclusions. Some suggest that programs vary substantially in effectiveness, but most find few clear differences among a state’s programs.

In the latest research, von Hippel and Bellows reanalyze those six studies using a consistent method. They find that in all six states the differences between teacher preparation programs are small, and that it’s difficult to pick out top-notch programs with confidence.

“This is troubling because singling out [programs] is a prerequisite to the policy goal of expanding strong [programs] and shuttering weak ones,” they write.

Indeed, a number of states — 16, according to the researchers — have made efforts to evaluate training programs based in part on value-added measures. The Obama administration issued regulations encouraging states to evaluate their teacher prep programs by measuring student learning, though they were scrapped by Congress and the Trump administration earlier this year.

Dan Goldhaber, who has studied Washington’s teacher training programs and is a professor at the University of Washington, said this latest study is consistent with his own findings.

“I think there’s relatively little variation in the value-added effectiveness of teachers who hold credentials from different programs,” he said.

Still, he notes, the size of the impact is in the eye of the beholder. At most, the difference between attending a good versus an average training program is comparable to the difference in effectiveness between the average first- and third-year teacher — definitely not big, but not necessarily zero.

An inherent limitation of this research is that it focuses exclusively on the fraction of teachers who end up in tested grades and subjects, largely fourth- through eighth-grade math and English.

It may be more helpful to judge teacher preparation programs by multiple measures. For instance, recent research has found that there is substantial variation in how different programs affect teachers’ scores on classroom observations, which can be used to evaluate all teachers, not just those in tested areas.

Still, isolating the impact of training remains a challenge, since teachers are not randomly assigned to schools, and some programs aim to place teachers in high-poverty schools where attaining high ratings may be more difficult.

Bellows, the Duke researcher, warned against chasing after programs that might appear effective simply because of a statistical fluke.

“You don’t want to remodel our [teacher preparation programs] based on one that looks really good when … it’s just by chance,” she said.

However, Bellows and von Hippel’s research suggests there is some reason for optimism. In five of the six states, at least one large program appeared significantly better at preparing teachers in at least one subject.

“If we are very careful, we can occasionally identify a [program] that is truly exceptional,” they write.