This article was originally published in The Notebook. In August 2020, The Notebook became Chalkbeat Philadelphia.
A recent report criticizes the Philadelphia School District’s move to reduce the use of suspensions for minor offenses, but the National Education Policy Center at the University of Colorado is pushing back against the report, saying that it cherry-picks data.
The report’s authors reply that the critics don’t understand standard methods in econometrics, a branch of economics that uses mathematics and statistics to quantify economic relationships.
The report criticizing the District, which came out in December, was produced by two independent researchers for the Fordham Institute, a think tank that is a member of the American Legislative Exchange Council, which pushes conservative legislation in states around the country.
The left-leaning National Education Policy Center, established in 2010 to review existing research and conduct its own, took issue with “logical fallacies, overly simplified interpretations of findings, and inflammatory language” in the report, particularly the foreword.
That foreword, written by two Fordham staffers, criticizes the Obama administration for promoting a “near-total reversal on school discipline policy” by encouraging alternatives to suspension, which “has led to empirically unproven strategies such as restorative justice.”
The Fordham report contends that the data show that a change in the District’s code of conduct in the 2012-13 school year aimed at reducing out-of-school suspensions for minor infractions had “no long-term impact” on those suspensions.
It also says that “associational” evidence indicates that the policy change improved attendance, but not academic achievement, of students who had been suspended; that it harmed the achievement of never-suspended students in disadvantaged schools; and that it led to an “increase in racial disproportionality” in the use of suspensions.
The National Education Policy Center disputes those claims in its review released in February, at least as they are stated by the authors.
“There are several instances in which the findings and conclusions presented in the report either contradicted or overlooked results from the original research studies,” the review reads. “The pattern of inconsistent reporting of the original study results indicates some ‘cherry-picking’ in order to make a partisan argument against current federal policy.”
Matthew Steinberg, one author of the Fordham report and an assistant professor at the University of Pennsylvania’s Graduate School of Education, said he did not write the foreword and put nothing in the report about the Obama administration’s recommendation to reduce suspensions.
The report cites research on school discipline done by Daniel Losen, director of the Center for Civil Rights Remedies – part of the Civil Rights Project at UCLA – and he strongly disagrees with how the conclusions are framed.
“The findings actually suggest that the reforms were effective when implemented correctly,” Losen said. “But [Fordham’s report] is being used to criticize the guidance that had nothing to do with this homegrown Philadelphia initiative. … That’s a kind of disingenuous use of the research” by Fordham.
Steinberg and Losen do agree on one thing: The study indicates that, when school discipline reforms are not coupled with additional supports and resources for teachers, they are less likely to succeed – especially in the highest-poverty schools.
Steinberg said: “Schools that may be struggling the most in terms of students’ academic performance and students’ behavioral concerns are the schools that may have the least capacity to implement these types of reforms and likely require additional district-level resources.”
The Fordham report examined four research questions, and the foreword presents four statements summarizing what the data showed. Those summary statements are what the National Education Policy Center labeled as cherry-picking the results.
Number of low-level suspensions
The first summary statement is: “Changes in district policy had no long-term impact on the number of low-level ‘conduct’ suspensions.”
It goes on to say that most schools did not fully comply with the policy change. However, the method used to draw these conclusions lumps the schools together, regardless of whether they attempted to implement the policy.
Researchers divided schools into three categories: “full compliers,” which eliminated the suspensions; “partial compliers,” which reduced the number but did not eliminate them; and “non compliers,” which did not reduce these suspensions.
In the first year after the change, full compliers went from giving 2.5 percent of students a conduct suspension to zero percent. Among partial compliers, the rate declined from 6 to 3 percent. Only among non compliers did the rate rise, from 3 to 6 percent.
The Fordham report asserts that any reductions are offset by increases in the final year studied. But the report does not provide a breakdown, so it’s impossible to see whether the decline in suspensions at complying schools was offset by a larger increase among non-complying schools that year.
The report was based on two research papers, one of which does give some raw data covering more school years. That paper breaks down the conduct suspensions into a per-capita rate averaged over the six years leading up to the reform (the “pre-reform period”) and the two years afterward (the “post-reform period”).
In the District, per capita suspensions of all kinds declined 27 percent from the pre-reform period to the post-reform period. Across all other Pennsylvania school districts, the rate declined only 15 percent. The pattern was similar for conduct-related suspensions, which declined by 5 per 100 students in the District but by less than 1 per 100 students in all other districts in the state. These numbers were not included in the Fordham report.
Harold Jordan, a senior policy advocate for the ACLU of Pennsylvania who did a statewide analysis of school discipline policies for his 2015 report, said the Fordham report treats the policy change as if it were the only one of its kind in the years being studied, but that’s not the case. (Jordan is a longtime Notebook board member.)
Jordan, whose children attended schools in the District, was involved in negotiations on changes to the code of conduct, and he said they were “gradual” over several years and were not isolated to 2012-13, the year in question in the report.
Steinberg dismissed this as irrelevant for districtwide changes, saying it would become relevant only if policies were put in place at some schools but not others.
Jordan said that, under the leadership of Superintendent Arlene Ackerman from 2008 to 2011, principals around the District became alienated from the central office and felt that edicts were being handed down without consulting school-level leaders. Superintendent William Hite took over in 2012, the year of the policy change, and he has worked to repair relationships with principals by giving them more autonomy to make changes at the school level.
In other words, ever since the year that the policy was implemented, there have been exactly the kinds of school-level policy changes that Steinberg was referring to.
Attendance and achievement
The Fordham report’s second summary statement says: “Changes in district policy were associated with improved attendance – but not improved achievement – for previously suspended students.”
The National Education Policy Center disagrees.
“This [claim] contradicts the original study, which concluded that previously suspended students made marginal but statistically significant gains in math proficiency,” the review states. “Findings consistently indicated that outcomes improved in the post-reform period for students who had been suspended prior to the policy change.”
The more pervasive problem is that this measure includes all students in Philadelphia’s public schools – even those at the schools that continued suspending students for low-level misconduct.
“It seems like there were benefits where the policy was fully implemented,” Losen said. “You can’t say: ‘Here’s the negative consequence of the reform using some data from schools where the reform was not carried out at all.’”
Jordan said it was unsurprising to find mixed results from a policy change just two years after it had been adopted.
“When the District implemented the suspension ban for kindergartners, it wasn’t implemented perfectly either,” Jordan said. “The District, after that school year, made an effort to educate principals about the policy. … That’s the struggle.”
Effects on non-suspended students
According to the third summary statement, students who were never suspended “experienced worse [academic] outcomes in the most economically and academically disadvantaged schools, which were also the schools that did not (or could not) comply with the ban on conduct suspensions.” The report implies in several places that this is a “consequence” of the policy change, despite labeling the evidence “associational.”
The policy center’s response focused on two aspects of this statement. First, it added missing context: As the Fordham report acknowledged, the control group of students who were never suspended came from more-affluent schools.
“Such schools are likely to be meaningfully different on several dimensions that are related to test scores and attendance, but were not accounted for in the study,” the review states. “High poverty and racially segregated schools tend to have lower teacher quality and offer weaker opportunities for academic engagement than schools serving more advantaged students.”
Thus, the policy center says, the control group of students who were never suspended is “meaningfully different” from the partial and non complier groups it was compared to, and that “likely influenced the outcomes.”
Steinberg calls this a misunderstanding of a common practice in econometrics, which “compares changes in outcomes across two groups – one treated by a policy intervention and another that’s not treated.”
“It’s not the case that they need to be observably equivalent,” he said. “The pre-trends must be parallel, meaning [data from] those two groups evolved similarly leading up to the intervention.”
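The method Steinberg describes is commonly called difference-in-differences. The sketch below uses invented numbers, not data from the report, and simplifies heavily: it compares raw averages, whereas published analyses typically rely on regression models and statistical tests. It shows two groups that start at very different levels but move in parallel before a policy change, which is the condition he says matters.

```python
# Hypothetical average scores by year. All figures are invented for
# illustration and are NOT taken from the Fordham report or its source papers.
affected_pre = [40.0, 42.0, 44.0]    # group exposed to the policy change
comparison_pre = [60.0, 62.0, 64.0]  # group not exposed
affected_post = 43.0                 # first year after the change
comparison_post = 66.0


def yearly_changes(series):
    """Year-over-year changes in a list of yearly averages."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def diff_in_diff(pre_a, post_a, pre_b, post_b):
    """Change in group A minus change in group B, from the last pre-period year."""
    return (post_a - pre_a[-1]) - (post_b - pre_b[-1])


# The groups differ by 20 points in every pre-period year, yet their trends match.
parallel_pre_trends = yearly_changes(affected_pre) == yearly_changes(comparison_pre)

# Affected group changed by -1.0, comparison group by +2.0.
estimate = diff_in_diff(affected_pre, affected_post, comparison_pre, comparison_post)

print(parallel_pre_trends, estimate)
```

In this made-up case the estimate is negative even though the two groups were never similar in level. The policy center’s objection is a different one: that unmeasured differences between the schools could still have shaped how each group changed after the reform.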
The National Education Policy Center’s response went on to question what it called “a strange manipulation of reasoning.”
“Achievement declined among students who were not suspended prior to the reform but attended schools that continued to use conduct suspensions,” it says. “They argue the decline in achievement among these ‘rule-abiders’ was due to the inability of these schools to suspend misbehaving peers. However, this interpretation contradicts the fact that these schools continue to use suspensions.”
Steinberg said the policy center’s critique misses the point that never-suspended students at schools that fully complied were unaffected.
“We don’t know what resources existed at these schools that fully complied” compared to those that did not, he said. “So the important question remains: What were these schools doing that may or may not have insulated the non-suspended peers?”
The foreword’s final summary statement is: “Revising the district’s code of conduct was associated with an increase in racial disproportionality at the district level.”
The metric used to draw this conclusion is the ratio of suspensions for black students to suspensions for white students.
To Losen’s dismay, Fordham’s report does not include the raw data – the total number of suspensions issued to both groups. Losen pointed out that an increase in this ratio does not necessarily represent an increase in the number of black students suspended. It could be the result of a larger decline in suspension of white students than the concurrent decline in suspension of black students, he said.
Steinberg said: “That might be true, but the gap got bigger.” He added that ratios are the common measure of racial disproportionality.
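Losen’s point and Steinberg’s reply can both hold at once. The figures below are hypothetical, chosen only to show the arithmetic, and do not come from the report: suspensions fall for both groups, the black-to-white ratio rises, and the difference in raw counts shrinks.

```python
# Hypothetical suspension counts, invented purely to illustrate the arithmetic.
# They are NOT from the Fordham report, which does not publish raw totals.
def disparity(black, white):
    """Return (ratio, difference) of black to white suspension counts."""
    return black / white, black - white


before = disparity(black=600, white=200)  # before the policy change
after = disparity(black=450, white=100)   # after the policy change

print(before)
print(after)
```

Here suspensions of black students drop by a quarter and suspensions of white students drop by half, so the ratio climbs from 3.0 to 4.5 while the difference in counts narrows from 400 to 350. Without the underlying totals, a reader cannot tell which kind of change produced the reported increase.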
Losen wants the raw data to be released so other researchers can review it, as an alternative to the peer review that the report has not yet received. One of the papers on which the report is based has been peer reviewed, and the other is awaiting review.
“They gave us the equation and the results coming out of their equation, but didn’t give us the variables being input into the equation to get those results,” Losen said.
He asked the authors for the data, but they did not provide them. A spokesperson for the University of Pennsylvania Graduate School of Education said that the data were obtained through a data-sharing agreement with the District and contained student names that could not be shared by law. Losen, however, was not asking for names. Still, sharing any of the data would have violated Penn’s data-sharing agreement with the District, according to the spokesperson.
Greg Windle is a staff reporter at the Notebook.