New Teacher Evaluation Systems Are Not Trustworthy Without Better Assessments

It seems that the biggest issue these days in education “reform” is the attempt to change how teachers are evaluated. Locally in New York, the state legislature passed a new evaluation system last year, and the Board of Regents more recently released its guidelines for implementing that law, though many of the details remain to be negotiated between local districts and unions. Nationally, the Gates Foundation-funded Measures of Effective Teaching (MET) Project is starting to share some conclusions from the first two years of its study, and a recent report from the Center for Teaching Quality’s New Millennium Initiative by a group of Denver teachers has garnered some positive attention in the blogosphere from Renee Moore, Ariel Sachs, Dan Brown, and others.

Like nearly all issues in education, this one is complex. I have gotten to see just how complex it is from two vantage points within the NYC discourse. For the past semester, I have been working to support the social studies teachers in NYC’s transformation schools who were subject to the pilot of new assessments that are to be part of the new teacher evaluation system. I am also on the UFT negotiating committee for the new system. Unfortunately, I am under non-disclosure obligations in both roles and can’t yet write from those experiences. I did, however, have the luck to be invited last night to participate in a webinar through the Teacher Leadership Network with a researcher from the Gates MET study, so I will use that study as a jumping-off point for some comments.

There is tremendous reason to be skeptical of, if not downright resistant to, Gates money being used to support this study, as Joanne Barkan so brilliantly documented in Dissent Magazine. I’m willing to put that aside for the moment and assume the best intentions of the researchers who are working on this and other projects. The basic logic of the MET project, as well as all efforts to measure teacher effectiveness, seems to be as follows: “if we can identify what goes into good teaching, then we can a) replicate it through better teacher education and development and b) remove ineffective teachers, who will be replaced with the better developed teachers we will then be able to create.” The less benign version of this argument, which is motivating the politicized teacher evaluation laws passed around the country, is that “we need to identify bad teachers so we can fire them and replace them with good ones.” Again, I’m willing here to deal with the better intentions of the former, despite all the others on the bandwagon.

The billion-dollar question then becomes, what is “good teaching”? Unfortunately, this is the question I have seen dealt with far too simplistically, if at all. The MET study claims that evaluation should be based on “students’ achievement gains” and “any additional components of the evaluation…should be valid predictors of the student achievement gains” (“Working” p. 5). This seems like incredibly circular logic, as it implies that other measurements of teacher effectiveness are only valid if they predict students’ gains on the standardized tests the study used. And while in its initial findings, the MET study showed that “the type of teaching that leads to gains on the state tests corresponds with better performance on cognitively challenging tasks and tasks that require deeper conceptual understanding, such as writing” (“Learning” p. 5), this reveals further flaws in the project’s logic, as it places the cart squarely before the horse. Shouldn’t the question be: Are the assessments of student outcomes valid indicators of students’ ability to complete cognitively challenging tasks? Is it not likely that teachers would see even more growth in students’ capacities for deeper conceptual understanding if the state tests, which assess other skills and knowledge, were not in the way?

The conversation in the webinar last week focused largely on the question of trust in developing new observation systems. This could not be more important. For teachers to be able to trust any new observation system, and for the public to be able to trust the validity of any system, there needs to be a much larger focus on what the desired outcomes are for students’ learning, and on the most meaningful way to assess students’ attainment of these outcomes. Organizations like Edutopia and Fairtest have documented the incredible flaws in current assessments, and Joanne Barkan, once again, showed the misuse of these assessments to attack teachers using deeply flawed mathematical models like Value-Added (which is also the basis for the Gates study’s data). There needs to be far more dialogue about how to develop better assessments of meaningful outcomes.

We also, I think, need to be prepared to recognize that there will not be one silver bullet solution to this issue. The new evaluation system in New York State allows for one district to use different assessments from another, and even for clusters or networks of schools within a district to choose different assessments. This is a move in the right direction. Just as colleges seem to have no problem recognizing that the IB and AP are equally valid assessments, so too should we allow more flexibility for schools, or even teachers, to have access to a battery of meaningful, rigorous, and valid assessments of student learning. As I wrote recently, there is never a silver bullet solution for the complexities of education, and we should not expect things to be any different with assessment.

About our First Person series:

First Person is where Chalkbeat features personal essays by educators, students, parents, and others trying to improve public education.