Today’s New York Daily News published a bold editorial on the progress of New York City schoolchildren under the administration of Mayor Mike Bloomberg and Chancellor Joel Klein.  “You would be better off arguing that the world is flat, or that the sun revolves around the Earth, than to dispute that New York City kids are performing better and better in school,” writes the Daily News, crowing that there are “fresh and incontrovertible data” pointing to what the newspaper refers to as a “sea change” in New York City. 

They might have wanted to wait a day.

This morning, the U.S. Department of Education released the 2009 results of the National Assessment of Educational Progress (NAEP) assessments of fourth-grade and eighth-grade mathematics in each state and for the nation overall.  Nationally, fourth-grade performance held steady from 2007 to 2009, and there was a slight but statistically significant gain over this period in eighth-grade math performance.  In New York State, the small decline in fourth grade and gain in eighth grade were not statistically significant, leading to the conclusion that there has been no change in the performance of New York students on the NAEP math assessment from 2007 to 2009. 

This is a very different story from the one told by New York’s own assessment system, on which the Bloomberg and Klein administration has staked its claims about the great progress in student achievement.  On the state tests, the average scale score in fourth-grade mathematics increased from 680 in 2007 to 689 in 2009, a hefty 9 points; the jump in eighth-grade scores was even more dramatic, as the average scale score rose from 657 in 2007 to 675 in 2009, a remarkable increase of 18 points.

To put these two sets of numbers in context, the chart below shows the gains in fourth-grade and eighth-grade math performance from 2007 to 2009 expressed in standard deviation units (i.e., the amount of variation among individual students in 2007).  According to NAEP, fourth-graders’ performance fell .07 standard deviations from 2007 to 2009, a difference that is not significantly different from zero.  In contrast, fourth-graders gained .23 standard deviations on the New York State assessment from 2007 to 2009.  Similarly, the NAEP results indicate that eighth-graders in New York gained .08 standard deviations from 2007 to 2009 in math performance, a difference that is not significantly different from zero, but they gained .47 standard deviations over this period on the New York State test.

[Chart: gains in fourth-grade and eighth-grade math performance, 2007 to 2009, in standard deviation units, NAEP vs. New York State tests]
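For concreteness, the conversion here is just the change in mean scores divided by the 2007 standard deviation. The post doesn’t report the state tests’ standard deviations directly, but they can be backed out from the scale-score gains and the gains in standard deviation units given above (a rough sketch; the implied values are approximate):

```python
# Gain in SD units = (mean_2009 - mean_2007) / sd_2007, so the 2007
# standard deviation implied by the figures above is gain / gain_in_sd_units.
grade4_sd = (689 - 680) / 0.23   # implied 2007 SD, roughly 39 scale points
grade8_sd = (675 - 657) / 0.47   # implied 2007 SD, roughly 38 scale points
print(round(grade4_sd), round(grade8_sd))
```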

Another way of comparing the implications of the two different sets of test results is to think about where the average student in 2009 would have scored in 2007.  Based on these standard deviations, and assuming that the scores follow a bell-curve distribution, the New York State scores indicate that the average fourth-grader in 2009 scored at the 59th percentile of the 2007 fourth-grade distribution, which is a pretty big jump.  The increment for eighth-graders is even more striking:  the average eighth-grader in 2009 scored at the 68th percentile of the 2007 eighth-grade distribution, based on the New York State tests.  In contrast, the NAEP data indicate that the average New York fourth-grader in 2009 scored at the 47th percentile of the 2007 distribution of fourth-grade math performance in New York State, and the average eighth-grader in 2009 scored at the 53rd percentile of the 2007 eighth-grade distribution.
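Under the bell-curve assumption, those percentiles follow directly from the standard normal cumulative distribution function. A quick sketch in Python (the function name is mine) reproduces the figures above:

```python
from math import erf, sqrt

def percentile_in_2007(gain_sd):
    """Percentile of the 2007 distribution reached by the 2009 average,
    assuming normally distributed scores: 100 * Phi(gain in SD units)."""
    return round(100 * 0.5 * (1 + erf(gain_sd / sqrt(2))))

# Gains in 2007 standard-deviation units, as reported in the post
gains = {
    "NY State test, grade 4":  0.23,   # -> 59th percentile
    "NY State test, grade 8":  0.47,   # -> 68th percentile
    "NAEP, grade 4":          -0.07,   # -> 47th percentile
    "NAEP, grade 8":           0.08,   # -> 53rd percentile
}
for label, g in gains.items():
    print(f"{label}: {percentile_in_2007(g)}th percentile of 2007")
```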
 
How can we explain these differences?  There are lots of possible explanations, but most of them don’t hold up under close scrutiny.  The two tests are taken by similar populations of students under similar conditions, and the grade-level mathematics standards on which the two assessments are based do not differ dramatically.  The NAEP test is a low-stakes test, which might result in students not taking it seriously, but the statisticians who oversee the NAEP testing program look for patterns suggesting this, and find little evidence of it.  It’s extremely unlikely that there’s rampant cheating going on in the New York State testing system that could explain the differences. 

It’s possible that the New York State tests have been getting easier over time.  I have yet to see definitive evidence ruling this out.  There is also strong suggestive evidence of “score inflation” in the New York State tests, because there are predictable patterns in which standards appear on the state tests year after year: some standards show up repeatedly each year, and some have never been tested at all during the life of the testing program.  Schools and teachers can exploit these patterns, which also show up in the format of test questions covering particular standards, to focus their instruction on the subset of standards that crop up again and again.  Because the New York State tests never test some standards, we have no idea whether students have mastered them.  In contrast, the design of the NAEP assessment allows for a much broader picture of mathematics performance, because so many more standards and test-item formats are incorporated into the test.

Whatever the reason, the discrepancy between the NAEP trends and the trends in New York State test scores raises serious questions about what the New York tests are telling us about the academic performance of students in New York State.  The same, of course, goes for New York City.  We’ll see NAEP scores for New York City in a month or so, but it’s unlikely that they will tell a different story from the one I’m describing here.
 
Is the Earth flat?  No.  But New York State test scores, and probably New York City scores, are.