50 years ago, one report introduced Americans to the black-white achievement gap. Here’s what we’ve learned since.

First Person is where Chalkbeat features personal essays by educators, students, parents, and others thinking and writing about public education.

This summer marks the 50th anniversary of one of the most influential education research reports ever released.

The report — colloquially known as the Coleman report after its lead author James S. Coleman — unveiled two major surprises. First, it revealed an enormous achievement gap between America’s black and white students. Second, it suggested that the gap arose largely from differences among families.

Over the last 50 years, the Coleman report has become its own institution. It has been scrutinized, corroborated, covered up, used to make social policy, and, ultimately, dramatically improved upon.

Here, we tell the story of the Coleman report and the important, fascinating, and still evolving school of research it has spawned. It is a story about how scholars have tried to unravel the tangled relationships between race, income, school, and children’s academic and life outcomes. And it is about the surprising conclusions they’ve reached.

In fact, differences in school funding do not explain students’ different outcomes. Schools and teachers explain less of the variation in student achievement than many think. And yet schools, on average, appear to help reduce inequality, at least among young children.

Here is a guide to the most important things we now know about the sources of America’s racial achievement gap — and the urgent questions that remain.

It all started with a 700-page report that said something surprising: family background, not schools, explained most of the yawning gap between the achievement of America’s white and black students.

The Coleman report made headlines in the months after its publication for what it said: that family background, not schools, explained most of the gap in achievement between America’s white and black students.

Although this sharply contradicted the orthodoxy in both academic and policy circles, it took months for this information to penetrate public discourse, in part due to prevailing political winds and in part to academics’ hesitance to rush to judgment. When it did, however, it shaped both social policy — the 1970s push for busing and integration was founded in part on the report — and the future of education research.

But first, some background. Congress commissioned the Equality of Educational Opportunity Study as part of the 1964 Civil Rights Act. In many ways, Coleman was a logical choice to lead the study. A polymath with interests in sociology, mathematics and economics, he had completed a Ph.D. in Columbia University’s prestigious sociology department. Coleman had experience with large-scale quantitative data collection and analysis, having recently finished a survey and then book on adolescent culture. Coleman was also known to support civil rights; in 1963, he and his family had been arrested for demonstrating outside an amusement park that refused to admit African-Americans.

The study itself was massive. In the fall of 1965, Coleman and his team collected data from 4,000 schools, 66,000 teachers and almost 600,000 first, third, sixth, ninth and 12th graders — one of the largest stand-alone testing and survey efforts ever undertaken in U.S. schools.

Coleman and his team also produced the data and subsequent report within a remarkably compact timeline. By way of comparison, modern studies that enroll more than a few hundred students can take up to (or more than, if we’re talking about my own studies) two years to conceptualize and design. By contrast, Coleman’s study took just over a year to fashion from stem to stern.

The looming July 1966 deadline led Coleman, according to David Cohen, a professor of education at the University of Michigan, to “hole himself up in a hotel with a very large supply of bourbon and deliveries of printouts” in order to write his portion of the report. The resulting tome was well north of 700 pages, much of it devoted to thorough analysis of statistical tables and graphs.

In some respects, Coleman’s analysis found what you would expect looking backwards toward 1960s America: mostly segregated schools, disparities favoring white children in some resources like class size, school facilities and the availability of advanced coursework, and heavy race-based inequality on tests of academic achievement.

To a small group of educators and civil rights activists who knew schools, this last finding felt familiar. Earlier evidence had shown wide disparities in student test scores, and the Elementary and Secondary Education Act, intended to provide aid to impoverished schools, had passed Congress in 1965 in part to alleviate this gap. Yet the achievement gap had been kept quiet, “sort of like your demented aunt in the attic,” according to Cohen.

Surprising to many, however, was the news that schools serving black and white children did not look very different on a bundle of other measures, including the age of school facilities and textbooks, the availability of extracurricular clubs, and many teacher and principal characteristics. Even more surprising was Coleman’s assertion that inequities in school resources did not explain the observed inequalities in average student achievement.

Where differences among schools serving black and white students did exist — in the availability of resources like science laboratories, advanced curricula, textbooks and qualified teachers — these differences explained little in terms of student achievement, once other factors were taken into account. Instead, it was family background — specifically, parental income and education, wealth, and aspirations for their children — that proved a strong influence on student test scores, along with students’ peers. As Coleman noted midway through the report:

“One implication stands out above all: That schools bring little influence to bear on a child’s achievement that is independent of his background and general social context; and that this very lack of an independent effect means that the inequalities imposed on children by their home, neighborhood, and peer environment are carried along to become the inequalities with which they confront adult life at the end of school.”

These conclusions were initially slow to penetrate public discourse. The Johnson administration largely succeeded in limiting media coverage to the report’s findings on racial segregation in schools, in part to assist Congressional passage of an extension of the ESEA — legislation that was of questionable value, according to Coleman’s 1966 report.

Yet in 1967, Daniel Patrick Moynihan, an urban policy scholar and later a U.S. senator, began to deliver speeches and write articles about the report, which he saw as buttressing his views regarding the importance of families in reproducing inequality. Moynihan even managed to get Coleman called before a congressional committee that, among other things, entertained the possibility that Johnson staffers had covered up the report.

Once evident, the report’s findings set off a strong reaction in the policy world. News that schools serving black and white children differed little on many measurable resources defied the conventional wisdom among liberals and progressives, which leaned toward the view that disparities in school quality reinforced or magnified racial and socioeconomic stratification.

“Everybody knew that the schools were worse for black kids than for white kids just like everybody knew that Communism was a threat,” said Christopher “Sandy” Jencks, then a reporter for the New Republic and fellow at the Institute for Policy Studies in Washington D.C. Yet when data failed to bear this out, scholars and others were at a loss.

In addition, news that school characteristics scarcely influenced student outcomes suggested that schools would be of scant use in closing achievement gaps. “Holy mackerel, what are you going to do when school’s not working?” Marshall “Mike” Smith, a principal data analyst for a subsequent reexamination of Coleman’s results, recalled thinking. “And the way they don’t work in this report was that they didn’t equalize outcomes.”

Less obvious to the public was the seismic shock the report set off in the world of education research. At the time of its release, quantitative studies happened mostly in educational psychology departments, where investigators conducted small-scale experimental studies with students. Though psychologists possessed the necessary statistical tools and had in fact conducted a large-scale schooling study, Project Talent, only a few years earlier, they had no natural interest in the role of schools in social stratification. Sociologists of education did think of schools along these lines, but at the time few tackled such questions with more than anecdotes or philosophical arguments.

But with the Coleman report came a sea change: children could be tested, and those test scores could be explained by a host of family, classroom, and school features. The report, in Coleman’s own words, “reshap[ed] the way educational research questions were asked.”

Molding this new field’s growth was scholars’ general consternation over and, to some degree, suspicion of the report’s findings. In particular, evidence that school resources did little to predict student test scores cut against common-sense assumptions — that more money could buy better student outcomes.

A group of academics tried to disprove the report — and couldn’t. Their questions sparked a generation of education research.

When faced with controversial findings, scholars generally lock themselves up – either alone or in groups – to check and recheck the data. At Harvard, a group of scholars led by Moynihan and Tom Pettigrew, a young social psychologist whose work focused on race and the impacts of integration, did just this, convening a year-long seminar to reanalyze the report’s data.

Originally designed to be small, the seminar eventually attracted dozens. In typical Harvard style, the main work of the seminar occurred after dinner and drinks at the Harvard faculty club.

“We’d sit around and analyze data,” said Smith. “I would give them data sheets. I’d give them data analysis, looking at some hypothesis that they’d come up with in prior meetings. And we’d pore over these tables.”

Seminar attendees were a who’s who of educational research at the time, and a who-would-be-who in educational research and policy over the following decades. Ted Sizer, a public intellectual and also, at the time, the dean of the Harvard Graduate School of Education, led a policy committee. Frederick Mosteller, a widely respected Harvard statistician, contributed analytic expertise. Smith, the data analyst, later served as a key advisor to three administrations on education policy and became a leading architect of the 1990s standards-based reforms, a precursor to the Common Core. Jencks, the reporter for the New Republic, participated in and wrote about the reanalysis, as did Eric Hanushek, then a graduate student in the economics department at MIT.

It’s likely that at no other time in the history of education research did so much intellectual firepower work collectively and in a sustained way toward a common goal. Jencks ended up at Harvard and Hanushek at Stanford. Hanushek credits the seminar with moving him toward a career in quantitative education research: “It was formative. It got me into this whole area of research. And I continue to be there.”

Seminar attendees, and by extension the field more generally, had a lot of work to do to understand schools’ impact, and to understand schools’ impact on social inequality specifically. One reason was that the report collected only snapshots of student test scores at a single time point, not documentation of changes in those scores as students aged. Snapshot data does not allow analysts to disentangle the many factors that might contribute to student outcomes (schools themselves, but also families, neighborhoods, health care and child care access), a fact that Coleman knew and carefully navigated in his report to Congress. Instead, information about students’ rates of learning would eventually be necessary to address the question of whether schools served to reduce or exacerbate achievement inequality.

A second issue related to the structure of his dataset. In order to get clearance from what later became the U.S. Office of Management and Budget, a body that oversees the manner in which federally-funded research may conduct business in schools, the team working on the Coleman report could not link students to their teachers, only to the schools they attended. Although Coleman had found a positive correlation between students’ test scores and school-average teacher scores on a 30-question SAT-like test, that correlation was small, and he could not identify the extent to which teachers overall — not just their test scores — contributed to students’ outcomes.

This problem was symptomatic of a wider issue, too, as the report’s dataset could only correlate schools’ average resources and student outcomes, rather than exploring the ways resources were distributed to students within schools within ability groups, academic tracks, or in other ways.

The third problem with the report related to the relatively underdeveloped methods in the social sciences for answering complex questions. Coleman conducted his analyses using techniques the Harvard seminar attendees roundly criticized and rapidly replaced with methods popular in economics. Techniques for properly handling missing data – extensive in the report’s dataset – did not appear until the 1970s. Even a lack of computing power played a role; Coleman’s IBM-7094 at Johns Hopkins had strained mightily to churn out the relatively simple statistics it did produce. Only in the 1980s did computing power and statistical software become available to run more complex models.

A fourth problem concerned what the report did – and did not – measure. The main student outcomes were basic tests of vocabulary, comprehension and computation, rather than a more robust set of indicators of student success (say, for instance, “grit” or high school graduation). And the indicators of school quality tended toward easily counted objects, like the number of books in the library or whether a school had a science lab.

In a recent interview, Jencks described the sense of the seminar on this point: “Everybody knew you had to worry about the things that were left out. You had to be a moron not to know that, ‘Well, if you’re looking at the class size, there’s a lot of things that probably go with that and they might be what’s explaining what looks like an effective class size and so forth.’”

Participants in the Harvard seminar argued over these issues and more. Remarkably, however, at the end of the day their collective re-analyses largely showed that Coleman’s original findings stood. Schools appeared to exert relatively little pull – explaining only 10 to 20 percent of the variability in student outcomes – while family background, peers, and students’ own academic self-concept explained a much larger amount.

The report spawned new research into things like whether students would do better if their schools had more books or science labs. The answer: only when used wisely.

Yet the process of critiquing, reanalyzing and, ultimately, inventing helped seminar attendees and others shape the path the larger field took forward. Over the years the federal government funded and collected an alphabet soup of new datasets – High School and Beyond, National Longitudinal Study, and several waves of later data collected under the moniker Early Childhood Longitudinal Study. These datasets tracked students over time, better allowing scholars to separate home and school effects.

Scholars cast about for methodologies that would solve the report’s analytic problems, then applied them to these datasets. And new thinking about how schools, classrooms, and families contributed to child outcomes led to innovative and improved measures — almost as many measures, in fact, as there were assistant professors to write papers about them.

Cohen summarized: “From one perspective, the [Coleman report] was a very, very clumsy and crude instrument, and probably not to be believed. But from another perspective, even if that was true, it set off a whole stream of research, which greatly improved the understanding of how schools do work.”

One area of immediate inquiry focused on improving the measurement of what the field calls “purchased school inputs.” Scholars, mostly economists, found new things to count, and more accurate ways to count them. Coleman’s measures related to spending, for instance, had been obtained only at the district level, yet considerable evidence existed that schools’ funding levels differed within districts, leading economists to obtain such data. Some studies even followed the dollars within schools, measuring the number of square feet in classrooms, the number and types of books in classroom libraries, and the journals read by teachers.

“We’ve gotten much more sophisticated about our ability to match resources to the individual students who are exposed to them,” said Aaron Pallas, a sociologist at Teachers College, Columbia University.

Yet even with better measurement techniques, more complete datasets, and more sophisticated modeling techniques, dozens of studies conducted through the late 1990s failed to consistently link tangible school inputs to student test scores.

Schools in impoverished communities are demonstrably worse than schools serving non-impoverished communities in terms of facilities, access to textbooks, and many measures of teacher quality. Taken alone, the relationship between these resources and student achievement appears quite strong. However, once analysts controlled for family background and students’ previous-year test scores, which allowed them to estimate how the resources influenced test score gains, the relationship typically disappeared.

“At some level money does matter. You can’t run a school without a building, a teacher and a textbook. And maybe an iPad,” remarked Eric Hanushek of Stanford University. But based on the lack of relationship between resources and student outcomes, said Hanushek, simply adding more money to schools is unlikely to raise performance. “You can’t just write a bigger check to each school and expect to get much out of it, because there’s no evidence, on average, that schools will find good ways to use that money.”

To many, Hanushek’s assertion makes sense: measures of countable things — the age of books, the condition of the school, class size, and even teacher salaries and certification — do not capture what happens in the classroom.

In my own work, I have seen many skilled, committed and compassionate teachers do excellent work despite poor facilities and large class sizes. By 2000, scholars had reached wide agreement on a broader point: that how schools use dollars appears to matter more than the mere presence of dollars. For instance, recent studies have suggested that adopting effective curriculum materials and training teachers in their use show consistently positive effects. Dollars, steered toward the right “purchased school inputs,” do make a difference.

Yet even here, the ability of resources to explain gaps in student outcomes is limited. For the average student, the difference between an effective and ineffective curriculum is about 10 percentile points on standardized tests. This difference would go a long way toward explaining income and race-based achievement gaps if effective programs were largely found in whiter, more affluent schools and ineffective programs were concentrated in impoverished schools. There’s little evidence on this issue, but it seems unlikely to be the case, particularly given recent federal regulations that require aid dollars be used to purchase effective curriculum materials.

A key realization: A lot of what influences student achievement happens before they get to school.

One set of factors that did explain student outcomes – with force – was family background, a term scholars use to refer to factors including race and ethnicity as well as parental income and education.

In Coleman’s report, race-based differences in academic achievement not only existed but were in fact quite large in the first grade. Again and again over the subsequent decades, scholars replicated Coleman’s finding. Federal data collected in 2010, for instance, showed the average black child roughly a half-year behind the average white child in mathematics at the start of kindergarten, and a third of a year behind in reading. Comparisons of families in the top and bottom of the income distribution find similar gaps.

These differences were striking and occurred, obviously, prior to any formal schooling. As better datasets and more advanced models became available in the decades after the report’s release, scholars set to work identifying and evaluating potential explanations for these gaps.

One such explanation is genetic differences among children: a fair portion of intelligence is inherited, and perhaps low-income or minority children were less lucky, in terms of their genetic endowment. Yet rigorous studies of intelligence and genetics discount such a theory, as does evidence from intelligence tests performed with infants.

Using a dataset that followed a nationally representative sample of children from birth, Roland Fryer and Steven Levitt found in a 2013 study that the average black-white difference in nine-month-olds’ mental functioning (a metric that measures infants’ exploration, expressive babbling, and problem-solving) was about one-tenth the typical differences found by kindergarten. When the authors used statistical techniques to account for differences in family demographics and children’s home environment, the relationship became even smaller; when the authors further accounted for children’s birthweight and prematurity, the direction of the relationship flipped, nominally favoring black children over white children.

By the time children were two years old, however, the situation looked markedly different. At that age, Fryer and Levitt showed that the typical black-white score difference had grown to about half the size of the kindergarten gap. Controlling for home environment, birthweight, and family demographics, however, only halved the size of the gap, rather than reversing it. Asian and Hispanic toddlers also showed a similar disadvantage versus white toddlers.

The appearance of the achievement gap in the second year of life, along with related evidence that heredity has little to do with these group differences, led investigators to other potential explanations.

“It’s certainly the case that from birth, and actually before birth if you think about the prenatal environment that kids in different socioeconomic strata are exposed to, children have different challenges and opportunities to learn,” said Greg Duncan, an economist at the University of California Irvine whose work has focused on explaining early childhood outcomes. “Over the course of five years up to kindergarten entry, these accumulate to very dramatic differences in both reading achievement and numeracy.”

The list of ways that family background influences student outcomes is long. Parenting practices are one conduit. Middle and upper-income parents tend to be more authoritative, setting boundaries but explaining those boundaries to their children, responding to their needs, and encouraging independence and growth — all activities made easier by the time and peace of mind that money can supply. Low-income parents tend to be more authoritarian, emphasizing rules and punishing disobedience.

There’s some sense that this approach may be adaptive to families’ context, says Peg Burchinal, an early childhood researcher at the University of North Carolina: “If you live in inner-city Baltimore, it’s really important that the child do the right thing at the right time or that child could end up dead.”

Parents living at or below the poverty level also typically have less time to engage in activities that lead to positive school outcomes (reading storybooks, tracking schoolwork, and even just carrying on extended conversations with children) and are less likely to have been raised themselves in households that featured these parenting activities. Poverty and its related stressors also appear to double the incidence of maternal depression, which in turn has been linked to worse child pre-K outcomes.

Says Burchinal: “If you’re very secure economically, it’s very easy to devote time to your children. If you are worried about every aspect of your life, your relationship, your income, your relationships with your employer, it’s very difficult.”

Family income — how many dollars a family accumulates, and what those dollars can purchase in families’ neighborhoods — is another conduit between social status and student outcomes.

Dollars buy childcare of either better or worse quality, and although low-income families’ access to better-quality care has increased in the past three decades with the expansion of subsidized childcare, Head Start, and district-based pre-K programs, says Burchinal, these programs often differ from those available in more affluent communities. In programs serving high-income families, teachers tend to engage in extended conversations with children and design classroom activities with an eye toward enhancing child development in the long term. In programs serving low-income families, these elements are less often present.

Dollars also enable families to purchase child enrichment activities: Duncan estimates that families earning $25,000 a year spend roughly $1,300 per child per year on summer camp, vacations, outings, and educational programming; families earning $135,000 a year spend almost 10 times that amount.

Perhaps unsurprisingly, once scholars correct for these economic differences among families, including income, the black-white test score gap diminishes in size, and sometimes reverses in direction. For instance, Fryer and Levitt showed in a 2004 study that black students outperform white students on reading at kindergarten entry once only a relatively small set of family background factors are taken into account.

In the pre-school years, income — and by extension family economic opportunities — appear to be a driving factor in children’s outcomes.

Meanwhile, evidence mounted for one central conclusion: schools matter – but not as much as people might think.

Once in school, students experience the influence of both family and school characteristics, yet identifying the unique effect of each was largely out of scholars’ grasp in the first decades after the report.

Although most scholars and policy-makers intuitively believed that schools and teachers led students to learn — for lack of a better word — “stuff,” the scholarly archives were far from teeming with evidence regarding schools’ impact on students’ cognitive growth. This situation even led several prominent sociologists of education to write a paper, circa 1985, looking for proof that schools led students to learn at all. (The answer: yes.)

Meanwhile, however, educational statisticians were fashioning a new way to think about and model the effect of schools and teachers on student learning. The Coleman report and similar studies had correlated tangible school resources with student outcomes, finding little. Yet the data also suggested there were substantial differences among schools that could not be explained by observed differences in resources. Thus, beginning in the 1980s, scholars began to ask whether and how much assigning a student to School A versus School B versus School C might affect their test scores.

A hypothetical walk through what statisticians call “student growth curves” (student test performance plotted over time) helps illustrate this new modeling technique. Let’s say that plots of hundreds of elementary students’ performance over the early grades show that students gain, on average, seven or eight points every year. In late elementary and the middle school years, students gain, on average, only five or six points each year, leading to a downward bend in the average student growth curve.

Now let’s say that grouping these children by school, as this modeling technique can do, shows that students in school A gain one extra point per year while students in school B grow only at the sample average. Further, students in school A experience less deceleration of their growth in the later grades. By doing this over enough students and schools, these models estimate the extent to which school assignments deflect students off their typical growth patterns.
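The hypothetical above can be worked through in a few lines of Python. This is only an arithmetic sketch under assumed numbers (a starting score of 50, sample-average gains of 7.5 and 5.5 points, three early and three later grades), and the function names are invented for illustration. Real growth-curve models estimate these quantities statistically from noisy data on many students rather than reading them off directly.

```python
# Illustrative arithmetic only: every number and name here is hypothetical.
EARLY_YEARS = 3   # assumed number of early grades
LATER_YEARS = 3   # assumed number of later grades


def growth_curve(start, early_gain, later_gain):
    """Return yearly scores for a student who gains a fixed amount each year."""
    scores = [start]
    for _ in range(EARLY_YEARS):
        scores.append(scores[-1] + early_gain)
    for _ in range(LATER_YEARS):
        scores.append(scores[-1] + later_gain)
    return scores


def average_yearly_gain(scores):
    """Average point gain per year along a growth curve."""
    gains = [after - before for before, after in zip(scores, scores[1:])]
    return sum(gains) / len(gains)


# School B grows at the assumed sample average: 7.5 points early, 5.5 later.
school_b = growth_curve(50.0, 7.5, 5.5)
# School A gains one extra point per year and decelerates less in later grades.
school_a = growth_curve(50.0, 8.5, 7.0)

# How far school A deflects students from the typical yearly gain.
deflection = average_yearly_gain(school_a) - average_yearly_gain(school_b)
print(deflection)  # 1.25
```

In this toy case, school A's students finish 7.5 points ahead after six years, the kind of deflection from the typical growth pattern that these models are designed to detect.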

Initial findings from such models agreed with the 1966 Coleman estimates. Tony Bryk and Stephen Raudenbush, who literally wrote the book on these newer modeling methods, used another Coleman dataset to show that differences among schools accounted for about one-fifth of the variability in student outcomes. Even as modeling techniques have improved, that has remained an upper bound.
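To make "one-fifth of the variability" concrete, here is a minimal sketch of how a between-school share of variance can be computed. The scores are made up and chosen so the answer comes out to exactly one-fifth, and the plain sum-of-squares split below is a simplification for illustration, not the modeling method the researchers themselves used.

```python
# Hypothetical test scores for four students in each of two schools.
scores_by_school = {
    "A": [47.0, 47.0, 51.0, 51.0],
    "B": [49.0, 49.0, 53.0, 53.0],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def between_school_share(groups):
    """Fraction of total score variation that lies between school averages."""
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = mean(all_scores)
    # Total variation: every student's squared distance from the grand mean.
    total = sum((s - grand_mean) ** 2 for s in all_scores)
    # Between-school variation: each school's average, weighted by its size.
    between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return between / total


share = between_school_share(scores_by_school)
print(share)  # 0.2
```

The remaining four-fifths of the variation in this example sits among students inside the same school, which is where family background and individual differences would show up.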

In another two decades, experiments using lottery data from oversubscribed urban schools — in other words, the most desirable schools in the eyes of urban parents and likely the top performers among all schools — began to clarify the size of this advantage. In a study of oversubscribed Boston charters, for instance, economists estimated that these schools closed between half and two-thirds of the black-white test score gap each year of middle school. In New York, newly configured small high schools improved students’ probability of high school graduation by nearly 7 percent.

What drives these schools and other high performers is still a matter of debate. Both early and recent evidence suggest that successful schools meet the most basic needs of their inhabitants: students and faculty report feeling safe, teachers have high expectations for students, and students attend to their studies seriously. Many of the urban charters included in lottery studies have a “no excuses” philosophy, which focuses on maximizing instructional time, minimizing behavioral disruptions, and improving test scores.

Beyond this, key school characteristics have been hard to measure. Many (school trust, teacher collaboration, principal leadership, teacher working conditions, teacher efficacy, academic optimism) appear to positively predict student outcomes, but studies have yet to establish whether these are related to or distinct from one another, and which ones are causally related to student outcomes.

Yet whether anyone can explain it or not, something associated with differences between schools does appear to explain student outcomes. But this research has also shown that in the context of the overall variability in child outcomes, schools still pack a weaker punch than many imagine.

Even in the most sophisticated models, differences in family background, students’ intelligence, temperaments, and childhood experiences explained the majority, and in some datasets the vast majority, of children’s trajectories across the school years.

Despite the public’s focus on the shortcomings of school, research shows that America would be more unequal without it.

So far, scholars had been unable to fully untangle the causes of growing educational inequality. To do it, researchers made clever use of an artifact of the U.S. school system: summer recess.

Observing students’ academic growth over the summer, reasoned sociologists like Barbara Heyns, author of an influential 1978 study on this topic, would provide insight into how students’ natural rates of learning differ by race and class. Comparing summer learning to the corresponding school-year rates of growth would make the unique impact of schools visible.

Although Heyns and others had designed studies based on this logic since the 1970s, the best datasets for answering these questions were not created until nearly 30 years later, when the 1999 Early Childhood Longitudinal Study followed a nationally representative sample of children through their first years of school. A second ECLS began tracking a new cohort of kindergartners in 2010.

Both studies tested young students in the fall and spring, a key condition for differentiating summer from school-year growth. Analyses of both clearly show that students steadily learn during the school year, but that the average rate of learning drops to zero, in some subjects and grades, over the summer recess. Schools, when all is said and done, are fairly effective in teaching students at least some math, reading, and science each year.

Answering the question about the role of schools in social stratification required asking how student growth rates differed over time. One version of this question: When plotted over time, do kids’ growth rates look more parallel to one another during the school year as opposed to the summer?

The answer was yes: during the school year, student growth resembles telephone wires tracking steadily up a hill. During the summer months, however, those learning rates resembled more of a fan, with some children learning quickly, others not at all, and still others losing ground. Schools reduce overall variability in academic outcomes by making students’ growth look more similar during the school year than over the summer.
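The telephone-wires-versus-fan comparison can be sketched in a few lines. This is only an illustration under stated assumptions: the four students and their scores are made up, the tests are assumed to fall exactly at the start and end of a nine-month school year, and the helper `monthly_rates` is hypothetical rather than anything taken from the ECLS analyses.

```python
from statistics import pstdev

# Hypothetical scores at three test dates (made-up numbers):
# fall of kindergarten, spring of kindergarten, fall of first grade.
scores = {
    "s1": (20.0, 30.0, 33.0),
    "s2": (18.0, 28.0, 28.0),
    "s3": (15.0, 24.0, 22.0),
    "s4": (22.0, 31.0, 35.0),
}

SCHOOL_MONTHS = 9.0  # assumed time between the fall and spring tests
SUMMER_MONTHS = 3.0  # assumed time between the spring and next-fall tests


def monthly_rates(records):
    """Return per-student monthly growth rates for the school year and summer."""
    school, summer = [], []
    for fall, spring, next_fall in records.values():
        school.append((spring - fall) / SCHOOL_MONTHS)
        summer.append((next_fall - spring) / SUMMER_MONTHS)
    return school, summer


school_rates, summer_rates = monthly_rates(scores)

# A small spread means near-parallel lines; a large spread means a fan.
school_spread = pstdev(school_rates)
summer_spread = pstdev(summer_rates)
print(f"Spread of school-year rates: {school_spread:.2f}")
print(f"Spread of summer rates: {summer_spread:.2f}")
```

In the toy data, every student gains roughly a point a month while school is in session, but over the summer one gains, one stalls, and one loses ground, so the spread of summer rates is far larger.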

So absent was evidence of schools’ effects on students’ growth that several prominent sociologists wrote a paper, circa 1985, looking for proof that schools led students to learn at all. (The answer: yes.)

A second version of this question focuses on the role of family income and parental education in explaining these summer and school year growth rates. Here, the results are again unequivocal. Douglas Downey, a sociologist at Ohio State University who has conducted the most extensive work on seasonal differences in children’s growth rates, reports that “The best evidence suggests that schools reduce those [income] gaps. We observe the gaps in reading and math skills grow in the summer when students are not in school, and then those gaps don’t change much while school is in session.”

In other words, the students losing ground during the summer tend to come from poor families; children in non-poor families either hold their ground or gain, probably owing to the array of resources non-poor families marshal both within and outside the home. Schools, somewhat remarkably given the wide differences in school resources across advantaged and disadvantaged children, notes Downey, manage to make students’ rates of growth more similar to one another across class lines during the academic year.

Downey’s skeptics note that parallel school-year lines do not necessarily mean schools are compensatory; parallel growth does not close the achievement gap. But Downey and others disagree.

If children did not attend schools at all — a seemingly ridiculous situation, but arguably the correct one if the question of interest is the impact of schooling on inequality — students’ growth rates would continue fanning out indefinitely, and where children ended up in the fan would be heavily determined by their family background. U.S. elementary schools, in other words, compensate for the disadvantages experienced by poor children.

For race and ethnicity, the story is more complicated. In Downey’s analysis of the original ECLS, black children’s rate of learning (corrected for family income) kept pace with white students’ over the summer, but fell about 10 percent behind white students’ during the school year. An analysis of a more recent wave of ECLS data by David Quinn of the University of Southern California and two Harvard colleagues suggests that black children learn more rapidly than or at the same pace as white students in some grades and subjects, but lag in others. The findings for Hispanic students are similarly mixed.

Similar to the story on why some schools perform better than others, there are no clear-cut explanations for these slower growth rates among black and Latino students. Coleman’s report pointed to peer effects — essentially the impact of attending school with other students of similar academic background and ambitions — but many other explanations might hold: a slower-paced curriculum, lower-quality instruction, lower teacher expectations, implicit racism. It is likely that sorting among these explanations will take yet another set of studies and measures.

The data also tell an interesting story regarding another ethnic group. “There’s a hint that schools are potentially not a favorable institution for Asian Americans,” said Downey. “This is puzzling, because Asian Americans perform well in schools on average. Is their performance good because of schools or in spite of schools?”

The seasonal comparisons seem to be trending toward the “in spite of” explanation: in both ECLS cohorts, Asian American students’ summer growth rates are often stronger than white students’, but Asian American students’ growth either resembles or even lags white students’ during the school year. Downey continued: “There may be some processes in schools that are undermining the gains of Asian American students. What exactly those are — it’s kind of speculation.”

One explanation may be an artifact of the ways schools compensate for out-of-school social inequality. Downey notes that schools classify students by age into grades, then teach them a common curriculum regardless of child ability level — a process likely to help low-performing students by simply exposing them to grade-level content, and also to stymie high-performers’ growth by returning them to material they have already mastered. Most teachers also report, in surveys, directing most of their attention to struggling children rather than high-performers — another compensatory mechanism. Thus Asian Americans, who arrive in kindergarten far ahead of their non-Asian peers, may see their school-year growth slowed by these same forces that boost low-income children’s achievement.

This narrative describing schools as equalizers differs considerably from that in public discourse, which often focuses on schools’ shortcomings. Adam Gamoran, a sociologist who is now the President of the William T. Grant Foundation, explained why: “People focus on raw numbers. We look at schools for poor kids and rich kids, and we see that achievement rates are different. Graduation rates are different. College-going rates are different. And then we simply attribute those differences to schools.”

Aaron Pallas of Teachers College agrees: “Seeing a spanking new building and a falling apart building,” said Pallas, “those inequalities are more visible than the inequalities that come from being in school vs. not being in school.”

Another reason for the mismatch between the academic and public images of schooling may be that high schools, which include the years most vividly remembered by students and nearest to when students enter the labor market, may exacerbate inequality. Without fall-spring testing, said Gamoran, “We don’t know as much about growth and inequality for kids out of elementary school.”

The best evidence that exists, in a recent paper by sociologists at New York University and Harvard, compares eighth and 10th grade data to suggest that high schools in Texas and Massachusetts are largely neutral with regard to academic inequality, with traditionally advantaged groups only slightly more likely to attend high schools that are better at boosting student achievement.

Yet Gamoran and others’ research on high school tracking shows that this practice tends to exacerbate racial and income inequality within schools.

Whether student assignment to a track is itself overtly racially biased or simply results from prior student achievement patterns (which themselves may reflect the effects of racial bias) is a topic on which scholars have waged loud and extended arguments. But income, race, and ethnicity are correlated with track assignment, and students in higher tracks have opportunities to learn more challenging content from more qualified teachers, which causes inequality to grow. Other recent studies show that high school students’ access to Advanced Placement courses varies by the racial, ethnic, and income composition of the schools they attend — gaps very much similar in size to those reported by Coleman 50 years ago.

This points to the role social choices play in the production of inequality. Tracking is viewed as a way to make instruction more efficient and prepare qualified students for the demands of college; middle school marks the beginning of mathematics tracking in most districts and humanities tracking in some. Exposing all students to the same curriculum over those middle years, however, is a viable option. Delaying tracking until high school would preserve the equalizing effects of schools over the early adolescent period. Yet this is not a choice most states and districts make.

The logical conclusion: You can’t fix schools without trying to fix broader social inequality, too.

Thinking about solutions to academic inequality in terms of social choices highlights other possibilities, as well.

One choice that research supports is targeting additional resources to the schools serving at-risk students. Though Gamoran agrees that general infusions of money appear not to matter, he says, “additional resources, wisely spent, can make a difference.” Separate studies by Fryer and Gamoran, for instance, have found that allocating enhanced services to public schools, including intensive school-based tutoring, extended school days and years, and social services programs, appears to close achievement gaps.

In many other cases, “fixing” schools would require addressing broader social inequalities. Downey points to a recent study that uncovered a modest gap between U.S. and Canadian high school sophomores on an international comparison test. The author of this study, Joseph Merry, also compared Canadian and U.S. children in their late preschool years on the Peabody Picture Vocabulary Test, a standard assessment used to measure children’s reading aptitude. The U.S.-Canada gap near kindergarten entry? The exact same size.

“That suggests to me that it’s easier to be poor in Canada than in the U.S. I don’t think Canadian kids are ahead of us for genetic reasons; Canada has made a wide range of social policy decisions differently,” said Downey. “The kind of society that we live in really shapes what we see at kindergarten entry. And we can make policy decisions that change that.”

Such policy decisions would surely have to address income inequality, which itself is related to a complex set of social factors. Recent studies strongly suggest continuing racism in private firms’ hiring and housing decisions, for instance, and minimum-wage jobs fall far short of allowing parents to provide the support they desire for their children.

Such a wide-ranging discussion of the role of schools, families, race, and public policy choices would be unusual in U.S. education politics today. In the years since Coleman’s report, public debates about solutions to poverty have narrowed, and academics have become shy of stating any position with what Coleman once called “illiberal implications.”

But widening the debate, and rendering more accurate the public’s assessments about the role of schooling in students’ academic outcomes, is necessary to stop blaming schools and families separately and to understand the path forward. Schools can mitigate social inequality, but they govern only a fraction of students’ lives and eventual outcomes. Families matter, and families are profoundly shaped by the contexts in which they find themselves. Finding policy solutions that work in both realms presents the challenge the next generation of scholars must solve.

Photos of classrooms by City of Boston Archives and used under a Creative Commons license.
