
Wadt: A Guide To School Grades ... Real Or Noise?

on August 23, 2017 - 3:38pm
Los Alamos

The New Mexico Public Education Department (PED) just released the grades for all of the schools in the State. Gov. Susana Martinez advocated for a simple grading system, similar to the one used in Florida, to communicate more clearly to parents and other community members how well their local school is doing, with a focus on student proficiency in reading and math. That was a very good idea. 

The next question is how do you measure student proficiency in a way that allows you to compare performance from school to school, district to district, and state to state? The answer has been annual standardized tests, initially the New Mexico Standards Based Assessment (NMSBA) and in the past three years the Partnership for Assessment of Readiness for College and Career (PARCC). 

The NMSBA was considered to be one of the best state tests, and the PARCC was designed to provide an even better evaluation of reading and math skills. However, even the best standardized tests that are given to thousands and thousands of students generally rely on questions that are developed by testing organizations and are often scored by computers. 

Research has shown that these tests provide only a narrow sampling of the cognitive and non-cognitive skills that students need to succeed in today’s dynamic world. However, if one wants to make comparisons of student performance across NM and the US and do it every year from grade 3 to grade 11, economics drives one to the current types of standardized tests.

As you probably know, New Mexico ranks near the bottom on reading and math proficiency compared to other states. In order to encourage schools with very low student proficiencies, the Governor focused on rewarding improvement or growth in the proficiencies as part of the PED grading system. That made great sense. 

Unfortunately, the Governor did not act on the suggestion from the NM School Boards Association to increase the weight given to actual proficiency scores, or the current standing, as the proficiencies improved. As a result, schools in Los Alamos, which have had historically higher student proficiencies, are penalized relative to schools with much lower student proficiencies that are improving. 

In previous years, a school in Los Alamos might get a B for its grade with student proficiencies around 60 percent, while another school with proficiencies around 15 percent could get an A because it had improved from 5 percent. To me, the focus should be on the student proficiencies, both their level and their trends. Trying to distill a complicated set of data into a single grade can be misleading.

The other problem with the PED school grades and their focus on predicted growth is the significant year-to-year variation in student test scores. We all have good days and bad days. That natural variation introduces noise into the system, and calculating changes or growth in test scores increases the amount of noise because each change combines the noise from two separate years. As a result, the emphasis on growth can yield unreliable and surprising results, both good and bad, which undercuts the usefulness of the school grades as a communication tool, the Governor’s stated goal. It is never good to get a grade you don’t deserve, whether it’s an A or a D!
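
To see why growth is noisier than a single year's score, here is a minimal sketch in Python. It assumes a simple binomial model in which each student independently tests proficient with probability p, and it assumes the two years are independent with the same p and enrollment n. Those are illustrative assumptions, not the PED's method or the analysis done for the district, and the function names are hypothetical.

```python
import math

def proficiency_se(p, n):
    """Standard error of a proficiency rate p measured on n students (binomial model)."""
    return math.sqrt(p * (1 - p) / n)

def growth_se(p, n):
    """Standard error of a year-to-year change in proficiency.

    Assumes two independent years with the same p and n, so the
    variances add and the standard error grows by a factor of sqrt(2).
    """
    return math.sqrt(2) * proficiency_se(p, n)

# Illustrative case: a school of 400 students at 50 percent proficiency.
single = proficiency_se(0.5, 400)
change = growth_se(0.5, 400)
print(f"one year: {single:.3f}, change: {change:.3f}")
# prints "one year: 0.025, change: 0.035"
```

Under these assumptions, the change between two years carries roughly 40 percent more noise than either year's proficiency on its own.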

A few years ago, leadership from the Los Alamos Public Schools formed a committee with scientists, mathematicians, and statisticians from the schools, Lab, and community. Dave Higdon, a top-notch statistician from the Lab, did a great analysis of all the State test data, which clearly showed that the variation or noise in the test scores was too great to support the implied accuracy of the PED school grades. However, even though I recommend you take many grains of salt when you look at the grades for any school, I think there is merit in looking at the trends in the raw student proficiencies. These data are presented under School History on the last page of the PED school report cards. That’s what I would commend to your attention, not the school grade. 

When you look at the three-year trends in student proficiencies under School History, remember the natural variation or noise in the system. I have some simple rules of thumb to help you decide whether a trend is in the noise or may be significant. For our elementary schools, with student populations around 400, changes of 5 percentage points or more are likely to reflect real change. 

For the Middle School, with 600 students or so, changes of 4 percentage points or more are likely to reflect real change. For the High School, look for changes of 3 percentage points or more, and for Topper Freshman Academy look for changes of 6 percentage points or more. For subgroups, such as the lower quartile at an elementary school (about 100 students), look for changes of 10 percentage points or more. For even smaller subgroups of 50 or 25 students, look for changes of 14 or 20 percentage points, respectively. Teachers, administrators, and parents who have more data on student achievement can look to see if smaller changes in proficiencies are consistent with the trends from other measures.
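
The thresholds above, where a group size is given, line up with a simple formula: 100 divided by the square root of the number of students. That is an inference from the numbers in this column, not a stated method, and the function name below is hypothetical. A short Python sketch for applying the same rule to a group of any size:

```python
import math

def noise_threshold(n_students):
    """Smallest change, in percentage points, treated as likely real.

    Assumed rule of thumb: 100 / sqrt(n). This matches the group sizes
    and thresholds quoted in the text, but it is an inference, not a
    documented formula.
    """
    return 100 / math.sqrt(n_students)

# Group sizes mentioned in the text.
for n in (25, 50, 100, 400, 600):
    print(n, round(noise_threshold(n)))
# prints:
# 25 20
# 50 14
# 100 10
# 400 5
# 600 4
```

Under a binomial model at 50 percent proficiency, 100 / sqrt(n) equals two standard errors of a single year's proficiency rate, which may be why the rule works as a quick screen for changes that stand out from the noise.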

The charts on School History and some of the other raw data in the PED school report cards are very useful for parents, teachers, administrators, and community members to promote communication, identify opportunities for improvement, and assess the benefits of interventions or changes in curriculum or instructional approaches.

That said, I have one final caveat. The primary basis for the PED school report cards is the results from the PARCC tests. As stated above, these tests are an imperfect instrument, which is why our schools and district, along with others across the nation, are looking for a more robust set of assessment tools, ranging from short-cycle formative assessments to student work products (written, oral, and/or projects) to evaluations of social-emotional learning.

As a scientist, I know that tests such as PARCC often fall short in measuring the ability to ask good questions, identify experiments, calculations, or previous knowledge to test those questions, critically analyze the results, work in teams of people with different skills and perspectives, and communicate your findings clearly in writing or orally.

We need to provide teachers and administrators the incentives, freedom, and support to pursue better ways to assess student achievement based on the exciting results that continue to flow from research on how we humans learn. It’s much more than a single letter grade, whether for districts, schools, or students.