measuring quality

Why Denver’s school rating system is coming under fire on multiple fronts

PHOTO: Cyrus McCrimmon/Denver Post
Brown International Academy teacher Kate Tynan-Ridgeway works with a student.

Denver Public Schools’ comprehensive and increasingly complex system for rating schools is facing criticism this year from leaders and advocates on different sides of the education policy debate.

Some say the system is making bad schools look good. Others say the opposite. Many complain that frequent changes to the School Performance Framework make excellence a moving target in a district that promotes school choice — and one in which parents use the color-coded ratings to decide where to send their kids.

A record number of schools earned one of the top two ratings on the framework this fall, putting the state’s largest school district closer to meeting goals for raising the quality of schools citywide.

With student enrollment tied to funding, and in an era where low performance puts Denver schools on a path toward closure or replacement, the ratings carry real consequences.

“Lots of things get mentioned or murmured in the hallways,” said Chantel Maybach, an educator at George Washington High School. “Instead of building up a school, that’s an easy way to start tearing it down from the inside, those fears and those concerns.”

Superintendent Tom Boasberg defended the ratings system, which put new emphasis this year on how well schools are educating traditionally underserved students. He also defended the academic gains schools have made and the high ratings they earned.

But he acknowledged that some measures weren’t as rigorous as they need to be, while others had the potential to be applied in a way that didn’t make sense. Correcting that will require more changes to the framework. While the fluidity of the system is one of the most persistent criticisms, he said making those changes is critical if Denver is going to get it right.

“Do you not make improvements that clearly need to be made in the interest of saying, ‘No change?’” Boasberg said. “I think our view is that over time, as we learn more and listen to folks, we want to make those improvements. … If we have data that’s not doing a good job helping schools focus on how and what to improve, that’s a reason we want to improve our tool.”

The concerns voiced by educators and advocates this year include that the framework too heavily weights the scores of less-rigorous early literacy tests taken by students in kindergarten through third grade, thereby inflating elementary school ratings.

Others complain that the new “academic gaps indicator” for all schools does the opposite, unfairly penalizing those that serve a diverse population at a time when the 92,000-student district, where two-thirds of students are living in poverty, is trying to increase school integration.

To understand the concerns, it’s helpful to first understand the framework.

What is the School Performance Framework?

The School Performance Framework was adopted by Denver Public Schools in 2008 under Boasberg’s predecessor, Michael Bennet, who is now a U.S. senator.

It awards schools points based on a long list of metrics. The number of points a school earns puts it in one of five color categories: blue (the highest), green, yellow, orange and red.
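
In rough terms, each metric contributes points, and the share of possible points a school earns determines its color. Below is a minimal sketch of that mapping in Python; the percentage thresholds are hypothetical placeholders, not the district’s actual cut-offs.

    # Map a school's share of possible points to a color band.
    # All thresholds below are hypothetical placeholders, not DPS's actual cut-offs.
    def color_rating(points_earned: float, points_possible: float) -> str:
        share = points_earned / points_possible
        if share >= 0.80:      # hypothetical
            return "blue"
        if share >= 0.60:      # hypothetical
            return "green"
        if share >= 0.40:      # hypothetical
            return "yellow"
        if share >= 0.25:      # hypothetical
            return "orange"
        return "red"

    print(color_rating(72, 100))  # "green" under these placeholder cut-offs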

The system was meant to reward top performers and identify low ones, which from the beginning received extra funding to help them improve. Bennet warned the ratings could have more dire consequences, too, including being used as a basis for school closure.

While the district has for many years closed schools due to poor performance, it solidified the framework’s role in those decisions in 2015 when the school board approved a policy setting consecutive low ratings as the first step toward school closure or restart.

So how are schools measured? State test scores have always been a big part of the metrics. But it’s more than just how many students score at grade-level or above, a factor the district calls status. In fact, the framework more heavily weights academic growth, or how much progress students make on the tests compared to peers who scored similarly to them in previous years.

When the framework debuted, Denver was among a first wave of large urban districts to emphasize growth over status. In 2008, growth accounted for about 60 percent of a school’s score, while status counted for about 30 percent, a ratio of 2-to-1.

As the district has added more growth metrics over the years, that ratio has stretched to roughly 3-to-1 for elementary and middle schools. Growth accounted for 73 percent of an elementary school’s score this year, while status counted for 22 percent.
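
To make that weighting concrete, here is a minimal sketch of how growth and status could be combined using the approximate 2017 elementary weights cited above; the small “other” bucket and the example inputs are hypothetical.

    # Weighted overall score using the approximate 2017 elementary weights
    # (73% growth, 22% status); the remaining 5% stands in for other measures.
    # Inputs are hypothetical category scores on a 0-100 scale.
    def framework_score(growth: float, status: float, other: float) -> float:
        return 0.73 * growth + 0.22 * status + 0.05 * other

    # A school with strong growth but low status still scores relatively well:
    print(framework_score(growth=80, status=20, other=50))  # about 65.3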

Boasberg is adamant that growth is more important than status. The latter, he said, is more a measure of where students start, which can depend on factors outside a school’s control. A school is not “good” because it serves more affluent kids, he said.

The traditional way of measuring schools based on how many students pass a test “plays to your worst biases around privilege,” Boasberg said. “The most important thing is for schools to make sure when kids come in, whatever level they’re at, that they grow.”

But the district has been criticized, including by candidates in this year’s heated school board election, for giving high ratings to schools that may have above-average growth but where, for example, just 10 percent of third-graders can read and write at grade-level.

The percentage of schools rated blue or green, the two highest ratings, has grown over the years. In 2010, 45 percent of schools were blue or green. This year, more than 60 percent were. The district’s goal is for 80 percent of schools in every neighborhood to be blue or green by 2020.

Sean Bradley, the president and CEO of the Urban League of Metropolitan Denver, is concerned that all that blue and green is misleading to parents.

“The district has a duty to tell the truth,” he said. “And the current calculations that the district is putting out there may not be as accurate as we assume they are.”

Early literacy concerns

Last year, just 9 percent of third-graders at Barnum Elementary in southwest Denver scored at grade-level or above on the PARCC literacy test, which the state requires be given to students in grades three through nine and which district officials consider the gold standard measure of what students should know.

But 57 percent of those same third-graders scored at grade-level or above on the iStation literacy test, another state-chosen test that’s given to students in kindergarten through third grade.

For the purposes of Denver’s school ratings, that 48-point gap and others like it are troubling to advocates like Van Schoales, CEO of the nonprofit education advocacy group A Plus Colorado.

“What’s happened this year on the elementary school front, primarily because of the early literacy scores, threatens undermining the whole system,” Schoales said. “Most importantly, it is saying to families that schools are good when they aren’t.”

This year, the district increased the number of points schools could earn for doing well on iStation and other early literacy tests by adding metrics measuring how groups of traditionally underserved students did, which district leaders consider key to closing achievement gaps.

That increase in the number of points came at the same time schools across Denver, including Barnum, saw big jumps in the number of young students scoring at grade-level on iStation and other tests, which leaders credit to an increased focus on and investment in early literacy.

As a result, Barnum earned nearly every possible point on the framework for its early literacy scores, while earning far fewer points for its PARCC scores, including zeroes in several categories. The school, which serves a primarily low-income student population, was rated green this year after being rated yellow the year before.

In a statement provided to Chalkbeat, Principal Beth Vinson said Barnum is proud to have been rated green. She said its focus on early literacy “is starting to show good results” that she hopes will lead to higher achievement in its upper grades.

Barnum was not the only green school with a big chasm between its third-grade early literacy scores and its third-grade PARCC scores. One of the biggest was at Castro Elementary, where 73 percent of third-graders scored on grade-level on iStation but just 17 percent did on PARCC. Castro jumped all the way from a red rating, the lowest, to green this year.

Boasberg agrees that the misalignment between PARCC and tests like iStation is concerning. Because PARCC is relatively new, he said it was only recently that the district had enough data to confirm the mismatch. To remedy it, the district announced this fall that it will raise the early literacy test cut points, which were previously set by test makers and the state. Doing so will make it harder for schools to earn points, which Boasberg suspects will affect ratings.

The higher cut points will go into effect for 2019, giving schools time to get used to them. Boasberg rejected an idea floated by some critics to eliminate the early literacy tests from the framework altogether. While he acknowledged they’re an imperfect measure, he said the district added them in response to complaints that elementary school ratings long ignored progress being made in the lower grades because those students don’t take PARCC.

“We definitely agree the PARCC assessment is a stronger, higher quality assessment,” he said. But the early literacy tests are useful, too, he said, and the district is better off using them than nothing. “The question is,” he said, “‘Do you let the perfect be the enemy of the good?’”

The debate over academic gaps

Another pervasive complaint this year has been that the district’s focus on academic gaps between more-privileged and less-privileged students is dragging down some schools’ ratings.

Two years ago, the district launched a new part of the framework it called the “equity indicator.” Meant to shine a light on educational disparities, it measured how traditionally underserved students — low-income students, students of color, special education students and English language learners — were scoring on tests compared to set benchmarks, and how they were scoring compared to students not in those groups, so-called “reference students.”

The district warned schools that the following year, the equity indicator could count against them. If they didn’t score blue or green on the indicator, they couldn’t be blue or green overall.

During that hold-harmless year, 33 blue or green schools scored poorly on equity. The hold-harmless period also provided a chance to highlight issues with the indicator. Some school leaders, for example, complained it was unfairly dinging them for having large gaps even though their traditionally underserved students were scoring better than average.

What sort of message was it sending low-income parents, they argued, when a school with a big gap between poor and affluent students but where poor students were doing above average was rated lower on equity than a school where all students were doing below average?

The district took those concerns into account and tweaked the indicator this year, Boasberg said. It still measures gaps within a school, but it awards twice as many points for whether traditionally underserved students are meeting the benchmarks, taking the emphasis off the comparisons and putting it on whether underserved kids are on grade-level.
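
A minimal sketch of that reweighting, with hypothetical point values rather than the district’s actual rubric: benchmark attainment for traditionally underserved students counts twice as much as the within-school comparison.

    # Hypothetical point values illustrating the 2-to-1 reweighting described above.
    def gaps_indicator_points(underserved_meet_benchmark: bool, gap_is_narrow: bool) -> int:
        benchmark_points = 2 if underserved_meet_benchmark else 0  # doubled weight
        gap_points = 1 if gap_is_narrow else 0
        return benchmark_points + gap_points

    # A school whose underserved students meet benchmarks despite a wide gap now
    # outscores one with a narrow gap where all students score low:
    print(gaps_indicator_points(True, False))   # 2
    print(gaps_indicator_points(False, True))   # 1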

The district also gave the indicator a more precise name: the “academic gaps indicator.”

But concerns persist.

The Downtown Denver Expeditionary School, a charter elementary school where about 40 percent of students are minorities and a quarter are low-income, scored red on the academic gaps indicator for the second year in a row and was rated orange overall.

School leaders acknowledge the school has work to do in closing its gaps. Last year, 61 percent of middle- and upper-income third-graders scored at grade-level on the state literacy tests, while just 23 percent of students who qualify for subsidized lunches did, for example.

But they said that despite the district’s tweak, it still makes little sense that schools with smaller gaps but only 8 percent literacy proficiency are rated green while their school is orange.

“This isn’t about not holding us accountable for our achievement gaps,” said principal Erin Sciscione. “We want to be held accountable to that. We just don’t think the current system of measuring that is doing what it says it’s doing.”

Chantel Maybach, a special education coordinator at George Washington High, was among a group of teachers, parents and students who spoke publicly about the indicator at a recent school board meeting. She said she was “discouraged and sickened” to learn from one of the school’s data specialists that if white students at George had just not answered every fifth question on the test, the school would have done better on the indicator and been green overall instead of yellow.

Senior Emily Ostrander said the lower rating was unfair for a school that serves “some of the highest-achievers in the district.” George is home to a rigorous International Baccalaureate program that for years fueled a divide among students, often along racial lines, that the school is working to erase. About 72 percent of George students last year were students of color, and about 55 percent qualified for free or reduced-price lunch.

“In a way, it dings the school for being as diverse as it is,” said student Yemi Kelani.

Nine schools were downgraded this year because they didn’t score high enough on the academic gaps indicator. George wasn’t among them, but Brown International Academy, an elementary school in northwest Denver, was. Kate Tynan-Ridgeway, a third-grade teacher at Brown, wrote an opinion piece in the Denver Post calling the ratings misleading.

Sixty-one other teachers signed on in support of the opinion piece.

If Brown were located a few blocks west, across the border into Jefferson County, where there is no academic gaps indicator, Tynan-Ridgeway said, it’d be green and not yellow.

“The achievement gap worries us all,” she said. “As educators, we’re differentiating all the time.”

But Tynan-Ridgeway said that with the indicator highlighting the performance of traditionally underserved students, “it feels to me that the district is saying those kids are far more important than what could potentially be the bulk of your student body.”

Boasberg responded with an opinion piece of his own explaining why the indicator exists. He wrote that it’s already showing promising results: The number of would-be green schools with poor indicator scores dropped by two-thirds from the hold-harmless year to this year.

The district is still fine-tuning the indicator, Boasberg said, and it’s possible more tweaks are coming. One issue, he said, is whether it should apply to schools where nearly all students belong to traditionally underserved groups. This year, the district decided not to downgrade the overall ratings of three high-poverty schools even though they did poorly on the indicator.

Looking ahead

With such high stakes as funding, enrollment and even possible closure attached to school ratings, there are plenty of theories about the reasons behind the frequent changes. Is the district embellishing the ratings to make its schools look better and insulate itself from criticism about closing low-performers? Or is it inventing new ways to drive traditional schools’ ratings down so it can justify replacing them with charter schools?

Boasberg insisted it’s neither. But he said he understands why people hold such passionate, and often conflicting, opinions about the way the district rates its schools.

“There’s no perfect way to do it,” he said. “At the end of the day, it’s enormously helpful for teachers, for parents and for school communities to have a school performance framework that takes data from many different sources and brings it together in a way that’s understandable.”

While the district debates what to do about the academic gaps indicator and gives schools another year to get used to higher early literacy cut points, there is one change that’s definitely happening for the 2018 framework. After the district lowered the bar in 2016 to essentially give schools a reprieve from the new and rigorous PARCC tests, all cut points for the literacy and math tests will go up next year, inching blue and green ratings a bit further out of reach.

More autonomy

These Denver schools want to join the district’s ‘innovation zone’ or form new zones

PHOTO: Melanie Asmar
McAuliffe Manual Middle School students at a press conference about test scores in August 2017. The school has signaled its intent to be part of a new innovation zone.

Thirteen Denver schools have signaled their desire to become more autonomous by joining the district’s first “innovation zone” or by banding together to form their own zones. The schools span all grade levels, and most of the thirteen are high-performing.

Innovation zones are often described as a “third way” to govern public schools. The four schools in Denver’s first zone, created in 2016, have more autonomy than traditional district-run schools but less than charter schools, which are publicly funded but independently run.

Denver Public Schools recently released applications for schools to join the first zone, called the Luminary Learning Network, or to form new zones. The school district, which at 92,600 students is Colorado’s largest, is nationally known for nurturing a “portfolio” of different school types and for encouraging entrepreneurship among its school principals.

The district is offering two options to schools that want to form new zones. One option is for schools to apply to form a zone that would be overseen not by the district but by a nonprofit organization. That’s how the Luminary Learning Network is set up.

Another, slightly less autonomous option is for schools to apply to form a zone that would be overseen by the district. “Some additional autonomies would be available to these schools, but many decisions would still be made by the district,” the district’s website says.

One tangible difference between the two: The principals of schools in zones overseen by the district would answer to district administrators, while the principals of schools in zones overseen by nonprofit organizations would be hired and fired by the nonprofits’ boards of directors.

Schools in both types of zones would have more control over their budgets. A key flexibility enjoyed by the four schools in the Luminary Learning Network has been the ability to opt out of certain district services and use that money to buy things that meet their students’ specific needs, such as a full-time psychologist or another special education teacher. The zone schools would like even more financial freedom, though, and are renegotiating with the district.

The district has extended the same budgetary flexibility to the schools in Denver’s three “innovation management organizations,” or IMOs, which are networks of schools with “innovation status.”

Innovation status was created by a 2008 state law. It allows district-run schools to do things like set their own calendars and choose their own curriculum by waiving certain state and district rules. The same law allows innovation schools to join together to form innovation zones.

The difference between an innovation zone and an innovation management organization is that schools in innovation zones have the opportunity for even greater autonomy, with zones governed by nonprofit organizations poised to have the most flexibility.

The deadline for schools to file “letters of intent” to apply to join an innovation zone or form a new one was Feb. 15. Leaders of the three innovation management organizations applied to form zones of their own.

One of them – a network made up of McAuliffe International and McAuliffe Manual middle schools – has signaled its intent to join forces with an elementary school and a high school in northeast Denver to form a new, four-school zone.

Three elementary schools – Valdez, High Tech, and Swigert – submitted multiple intent letters.

Amy Gile, principal of High Tech, said in an email that her school submitted a letter of intent to join the Luminary Learning Network and a separate letter to be part of a new zone “so that we are able to explore all options available in the initial application process. We plan to make a decision about what best meets the needs of our community prior to the application deadline.”

The application deadline is in April. There are actually two: Innovation management organizations that want to become innovation zones must file applications by April 4, and schools that want to form new zones have until April 20 to turn in their applications.

Here’s a list of the schools that filed letters of intent.

Schools that want to join the Luminary Learning Network:

Dr. Martin Luther King, Jr. Early College High School
Valdez Elementary School
High Tech Elementary School

Schools that want to form new innovation zones overseen by nonprofits:

McAuliffe International School
McAuliffe Manual Middle School
Northfield High School
Swigert International School
These four schools want to form a zone called the Northeast Denver Innovation Zone.

McGlone Academy
John Amesse Elementary School
These two schools want to form a zone called the Montbello Children’s Network.

Grant Beacon Middle School
Kepner Beacon Middle School
These two schools want to form a zone called the Beacon Network Schools IMO I-Zone.

Schools that want to form a new innovation zone overseen by the district:

High Tech Elementary School
Isabella Bird Community School
Valdez Elementary School
Swigert International School
DCIS at Ford
These five schools want to form a zone called the Empower Zone.

measuring up

After criticism, Denver will change the way it rates elementary schools

PHOTO: Denver Post file
Eva Severance, a first-grader, concentrates on a reading lesson at Lincoln Elementary in Denver.

Facing criticism that its school ratings overstated young students’ reading abilities, the Denver school district announced it will change the way elementary schools are rated next year.

The district will increase the number of students in kindergarten, first, second, and third grade who must score at grade-level on early literacy tests for a school to earn points on the district’s rating scale, and decrease how many points those scores will be worth, officials said.

The changes will lessen the impact of early literacy scores on a school’s overall rating, while also raising the bar on how many students must score at grade-level on the tests for a school to be considered good. Denver rates schools on a color-coded scale from blue (the highest) to red (the lowest).

“We want to see more students making more progress,” Superintendent Tom Boasberg said.

Local civil rights groups, elected officials, educators, and education advocates criticized Denver Public Schools this year for misleading students and families with what they characterized as inflated school ratings based partly on overstated early literacy gains.

“At a time when this country is at war on truth, we have an obligation to Denver families to give them a true picture of their schools’ performance,” state Sen. Angela Williams, a Denver Democrat, told Boasberg and the school board at a meeting in December.

The groups had asked the district to revise this year’s ratings, which were issued in October. Boasberg refused, saying, “If you’re going to change the rules of the game, it’s certainly advisable to change them before the game starts.” That’s what the district is doing for next year.

The state requires students in kindergarten through third grade to take the early literacy tests as a way to identify students who are struggling the most to learn to read so they can get extra help. Research shows third-graders who don’t read proficiently are four times as likely to drop out of high school. In Denver, most schools administer an early literacy test called iStation.

The state also requires students in third through ninth grade to take a literacy test called PARCC, which is more rigorous. Third-graders are the only students who take both tests.

The issue is that many third-graders who scored well on iStation did not score well on PARCC. At Castro Elementary in southwest Denver, for example, 73 percent of third-graders scored at grade-level or above on iStation, but just 17 percent did on PARCC.

Denver’s school ratings system, called the School Performance Framework, or SPF, has always relied heavily on state test scores. But this year, the weight given to the early literacy scores increased from 10 percent to 34 percent of the overall rating because the district added points for how well certain groups, such as students from low-income families, did on the tests.

That added weight, plus the discrepancy between how third-graders scored on PARCC and how they scored on iStation, raised concerns about the validity of the ratings.

At a school board work session earlier this week, Boasberg called those concerns “understandable.” He laid out the district’s two-pronged approach to addressing them, noting that the changes planned for next year are a stop-gap measure until the district can make a more significant change in 2019 that will hopefully minimize the discrepancy between the tests.

Next year, the district will increase the percentage of students who must score at grade-level on the early literacy tests. Currently, fewer than half of an elementary school’s students must score that way for a school to earn points, said Deputy Superintendent Susana Cordova. The district hasn’t yet settled on what the number will be for next year, but it will likely be more than 70 percent, she said. The more points a school earns, the higher its color rating.

The district will also reduce the impact the early literacy test scores have on the ratings by cutting in half the number of points schools can earn related to the tests, Cordova said. This makes the stakes a little lower, even as the district sets a higher bar.
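
A rough sketch of those two adjustments, with hypothetical numbers standing in for thresholds and point totals the district has not finalized; the real framework spreads points across many measures, but the sketch captures the direction of the change.

    # Hypothetical illustration of the 2018 changes: a higher bar for the share of
    # students at grade-level, and roughly half as many points at stake.
    def early_literacy_points(share_at_grade_level: float, year: int) -> int:
        threshold = 0.45 if year <= 2017 else 0.70   # bar rises; exact values hypothetical
        max_points = 20 if year <= 2017 else 10      # points roughly halved; totals hypothetical
        return max_points if share_at_grade_level >= threshold else 0

    # A school where 57 percent of students hit grade-level earns points under the
    # old rules but not under the new, higher bar:
    print(early_literacy_points(0.57, year=2017))  # 20
    print(early_literacy_points(0.57, year=2018))  # 0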

The number of points will go back up in 2019 when the district makes a more significant change, officials said. The change has to do with how the tests are scored.

For the past several years, the district has used the “cut points” set by the test vendors to determine which students are reading at grade-level and which are not. But the discrepancy between the third-grade iStation and PARCC reading scores – and the public outcry it sparked – has caused officials to conclude the vendor cut points are too low.

District officials said they have asked the vendors and the state education department to raise the cut points. But even if they agree, that isn’t a simple or quick fix. In the meantime, the district has developed a set of targets it calls “aimlines” that show how high a student must score on the early literacy tests to be on track to score at grade-level on PARCC, which district officials consider the gold standard measure of what students should know.

The aimlines are essentially higher expectations. A student could be judged to be reading at grade-level according to iStation but considered off-track according to the aimlines.
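
A minimal sketch of that distinction, with hypothetical score thresholds standing in for the actual cut scores:

    # Both thresholds are hypothetical placeholders, not actual iStation cut scores.
    VENDOR_CUT_POINT = 200   # vendor's grade-level cut
    AIMLINE = 230            # district's higher target, tied to predicted PARCC success

    def classify(score: int) -> str:
        if score >= AIMLINE:
            return "on track per the aimline"
        if score >= VENDOR_CUT_POINT:
            return "at grade-level per the vendor, but off-track per the aimline"
        return "below grade-level"

    print(classify(215))  # falls in the gap between the two thresholds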

In 2019, the district will use those aimlines instead of the vendor cut points for the purpose of rating schools. Part of the reason the district is waiting until 2019 is to gather another year of test score data to make sure the aimlines are truly predictive, officials said.

However, the district is encouraging schools to start looking at the aimlines this year. It is also telling families how their students are doing when measured against them. Schools sent letters home to families this past week, a step district critics previously said was a good start.

Van Schoales, CEO of the advocacy group A Plus Colorado, has been among the most persistent critics of this year’s elementary school ratings. He said he’s thrilled the district listened to community concerns and is making changes for next year, though he said it still has work to do to make the ratings easier to understand and more helpful to families.

“We know it’s complicated,” he said. “There is no perfect SPF. We just think we can get to a more perfect SPF with conversations between the district and community folks.”

The district announced other changes to the School Performance Framework next year that will affect all schools, not just elementary schools. They include:

  • Not rating schools on measures for which there is only one year of data available.

Denver’s ratings have always been based on two years of data: for instance, how many students of color met expectations on state math tests in 2016 and how many met expectations in 2017.

But if a school doesn’t have data for the most current year, it will no longer be rated on that measure. One way that could happen is if a school has 20 students of color one year but only 12 the next. Schools must have at least 16 students in a category for their scores to count.

The goal, officials said, is to be more fair and accurate. Some schools complained that judging them based on just one year of data wasn’t fully capturing their performance or progress.

  • Applying the “academic gaps indicator” to all schools without exception.

This year, the district applied a new rule that schools with big gaps between less privileged and more privileged students couldn’t earn its two highest color ratings, blue and green. Schools had to be blue or green on a new “academic gaps indicator” to be blue or green overall.

But district officials made an exception for three schools where nearly all students were from low-income families, reasoning it was difficult to measure gaps when there were so few wealthier students. However, Boasberg said that after soliciting feedback from educators, parents, and advocates, “the overwhelming sentiment was that it should apply to all schools,” in part because it was difficult to find a “natural demographic break point” for exceptions.
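
Returning to the first change above, here is a minimal sketch of the minimum-count rule; the threshold of 16 comes from the district, while the function and field names are hypothetical.

    # A measure is rated only if the most recent year has at least 16 students
    # in the category; otherwise it is dropped rather than judged on one year of data.
    MIN_STUDENTS = 16

    def measure_is_rated(current_year_count: int) -> bool:
        return current_year_count >= MIN_STUDENTS

    # 20 students of color one year but only 12 the next: the measure is not rated.
    print(measure_is_rated(current_year_count=12))  # False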

Correction: Feb. 20, 2018: This story has been updated to more accurately describe how the district will rate schools on measures for which there is only one year of data available.