Why NAEP Matters

NYC Chancellor Joel Klein’s response in Wednesday’s New York Times to Diane Ravitch’s op-ed last week provides a lot to chew on. Today, I’m focusing on his comments about the National Assessment of Educational Progress (NAEP), also known as the Nation’s Report Card. NAEP began collecting data in 1969, and remains the only federal assessment designed to report on trends in the academic performance of U.S. children and youth. All 50 states and the District of Columbia participate in NAEP, as do New York City and an increasing number of other urban school districts. NAEP has an annual operating budget of more than $130 million, which represents a significant share of federal investments in education research. Though not an expert on testing and assessment, Diane Ravitch has a long-standing interest in NAEP: she was appointed to the bipartisan National Assessment Governing Board (NAGB), which oversees NAEP, during President Bill Clinton’s second term, and remained on the board until 2004.

One of the ways that NAEP differs from many other standardized tests is that NAEP is designed to yield a much broader picture of the subject-matter knowledge the test is intended to measure. Many standardized tests are designed to provide an accurate picture of a particular child’s performance. It’s efficient to do so by having all test-takers respond to the same set of test items. If a group of fourth-graders all answer the same 45 items in a 90-minute math exam, we can learn a lot about performance on those particular items, which are chosen to be representative of a content domain (such as fourth-grade math). But such a test would tell us little about student performance on other items that might have a different format, or address different fourth-grade math skills. NAEP addresses this problem by having many more test items, but no child answers all of the items, because that would take hours and hours of testing time. Instead, each child responds to a sample of the items, and the performance on these items is combined across children to yield a picture of the performance of children in general. Testing experts such as Dan Koretz at Harvard believe that assessments such as NAEP are less vulnerable to score inflation than state assessments because it’s more challenging to engage in inappropriate test preparation when there are so many potential test items a student might respond to. But the tradeoff is that NAEP is not designed to provide a reliable and accurate measure of performance for a particular child.
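For readers who like to see the logic spelled out, here is a toy sketch of the idea that each child answers only a sample of the items while the group as a whole covers the full pool. Everything in it is an assumption made for illustration: the pool of 150 items, the 30 items per student, the 2,000 students, the function names, and the simple right/wrong model. It is not NAEP’s actual booklet design or scoring method.

```python
import random

def assign_booklets(n_students, n_items, items_per_student, seed=0):
    """Give each student a random subset of the item pool (hypothetical design)."""
    rng = random.Random(seed)
    pool = list(range(n_items))
    return [rng.sample(pool, items_per_student) for _ in range(n_students)]

def simulate_responses(booklets, item_difficulty, seed=1):
    """Toy model: an item with difficulty d is answered correctly
    with probability 1 - d. This is an assumption, not NAEP's scaling."""
    rng = random.Random(seed)
    return [
        {item: rng.random() < 1 - item_difficulty[item] for item in booklet}
        for booklet in booklets
    ]

def percent_correct_by_item(responses, n_items):
    """Pool answers across students to estimate each item's percent correct."""
    right = [0] * n_items
    seen = [0] * n_items
    for answered in responses:
        for item, correct in answered.items():
            seen[item] += 1
            right[item] += correct  # True counts as 1, False as 0
    return [r / s if s else None for r, s in zip(right, seen)]

n_items = 150  # hypothetical pool, far more than one child could answer
difficulty = [0.2 + 0.6 * i / (n_items - 1) for i in range(n_items)]
booklets = assign_booklets(n_students=2000, n_items=n_items, items_per_student=30)
responses = simulate_responses(booklets, difficulty)
by_item = percent_correct_by_item(responses, n_items)

covered = [p for p in by_item if p is not None]
group_estimate = sum(covered) / len(covered)
print(f"items covered: {len(covered)} of {n_items}")
print(f"estimated group percent correct: {group_estimate:.3f}")
```

In this sketch no student sees more than a fifth of the pool, so any one student’s score says little about that student, yet the pooled results describe the group’s performance across every item.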

Let’s look at what the Chancellor had to say about NAEP:

“The national tests [Ravitch] cites are not the measure of federal accountability, are given only to a small sample of schools, and are not aligned with New York State standards and therefore with what we teach in our classrooms. (That said, our fourth-grade scores on those tests are strong.)”

Not the measure of federal accountability. The No Child Left Behind Act delegated to states the responsibility of developing systems of learning standards and assessments designed to measure progress towards universal student proficiency by 2014. It’s true that the tests that are used to assess the performance of the New York City schools for NCLB purposes are state assessments, not NAEP. But it is misleading to say that NAEP is not a measure of federal accountability. The tests administered by the 50 states vary considerably in their difficulty, with some states reporting much higher rates of student proficiency than are indicated by student performance on the NAEP assessment. In New York City, 56% of fourth-graders in 2007 were judged proficient on the New York State English Language Arts test, whereas only 25% reached proficiency on the NAEP reading assessment. New York City and New York State are by no means distinctive in finding much higher rates of proficiency on state tests than on NAEP (many states have even larger disparities), but the unevenness of the proficiency standards across states, and the fact that state tests change frequently, have led Congress and the U.S. Department of Education to rely on NAEP as the primary measure of trends in the performance of American schoolchildren over time. Moreover, Education Secretary Arne Duncan has recently advised state superintendents that they should report state NAEP performance in their state and district report cards documenting performance under NCLB. In these ways, NAEP is very much a measure of federal accountability.

Given only to a small sample of schools. For the life of me, I can’t figure out why the Chancellor thinks this is relevant. A well-designed sample will yield estimates of student performance that are unbiased and accurate, and the New York City sample is designed by leading statisticians to be representative of the population of New York City students and large enough to detect meaningful differences between New York City and other jurisdictions, as well as meaningful differences over time.
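A rough back-of-the-envelope sketch shows why a sample does not need to be enormous to be precise. It uses the textbook margin-of-error formula for a proportion and assumes a simple random sample, which is a simplification: NAEP’s real sample design is more complex, so its actual margins of error will differ. The sample sizes below are hypothetical, and the 25% figure is the NAEP reading proficiency rate cited above.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of n students (an assumed design)."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.25  # share proficient, from the NAEP reading figure cited above
for n in (500, 1000, 2000, 4000):  # hypothetical sample sizes
    moe = margin_of_error(p, n)
    print(f"n={n:>5}: 25% +/- {100 * moe:.1f} percentage points")
```

Under these assumptions the margin shrinks with the square root of the sample size, so quadrupling the sample only halves the uncertainty. That is why a well-drawn sample of a few thousand students can stand in for a much larger population.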

Not aligned with New York State standards and therefore with what we teach in our classrooms. It would seem unfair for New York City schoolchildren to spend the year studying Shakespeare, and then be assessed on their knowledge of contemporary American fiction. In reality, the curricular content of NAEP and the New York State assessments doesn’t diverge that much. For example, in eighth-grade mathematics, the state specifies 104 distinct standards in the arenas of problem-solving, reasoning and proof, communication, connections, representation, number sense and operations, algebra, geometry, and measurement. (Keep in mind that these 104 standards are assessed via only 45 test items.) The NAEP framework allocates test items to number properties and operations (20%), measurement (15%), geometry (20%), data analysis and probability (15%), and algebra (30%). I’m not going to do a detailed comparison, but I invite readers to look at the NAEP standards and see if they represent content that you think is unimportant for eighth-graders to know.

Our fourth-grade scores on those tests are strong. Surely the Chancellor must know that when a test is administered in both the fourth and eighth grades, and he calls the fourth-grade results “strong” while saying nothing about the eighth grade, a reasonable person might wonder about the eighth-grade results. In fact, there have been no statistically significant gains in eighth-grade performance in New York City in either reading or math between 2003 and 2007 on the NAEP assessment, and no gains in fourth-grade reading either. Fourth-grade scores in New York City are “strong” only in the sense that there were significant gains in fourth-grade math performance from 2003 to 2007.

A final note: New York City has been participating voluntarily in the NAEP Trial Urban District Assessment since 2002, so presumably the Chancellor believes that there is something to be learned from the performance of New York City’s children on the NAEP assessments. And the Department of Education’s press office has had no qualms about crowing about NAEP results when the Department believes there is good news to share. But a Department, and a Chancellor, truly committed to transparency would be willing to acknowledge the bad with the good, and present a balanced picture of successes and failures. Writing off NAEP as if it doesn’t matter fails to meet that standard.
