The Illusion of Accuracy: SBG & Content Tiers

I’m a fan of Bill Simmons’ podcast network, The Ringer. Simmons is an interesting sports personality, and he always has an interesting take on current sports events. As I was listening to a recent podcast about the top 10 NBA players right now, I was reminded of one of his bits I am really fond of: his tendency to rank players and teams in tiers rather than in numerical order. In his Book of Basketball, Simmons created a pyramid with the Best of the Best on the top tier, the Great in a second tier, and so on. He does the same with his annual trade value column and even with the quality of the NFL games in the coming week. From what I can understand, his reasoning is that it is nearly impossible to rank players with any degree of accuracy: eras are different, skill sets vary, team needs differ, and any number of additional variables can impact the statistical outputs. I interpret this as meaning that stats can often provide an illusion of accuracy that hides the true value or talent of a player.

This tiered system reminds me of the reason I eventually switched to Standards Based Grading. I was upset about the “Grinding” of the XP Grading system, and even more so about the way a traditional gradebook’s 100 point scale provides an illusion of accuracy in order to support ranking through grades. For the rest of this thought experiment I would like to use the simple example of a Unit Test. If a student answers 40 out of 50 Multiple Choice Questions correctly, we would say they have earned an 80%. Students, teachers, and parents have been indoctrinated to believe that this means the student has “learned” 80% of the material for that Unit. It is comfortable because it is so common. Almost everyone who has gone through school knows how the grading system works. The problem is that I think a 100 point scale provides an illusion of accuracy that does not exist.

The overall percentage score hides a plethora of variables. Did the student understand the vocabulary? Was the question asking something taught in the unit? Did the student eat breakfast or get enough sleep the night before? Did the student misread the question or answer choices? Was the question poorly written? At what DOK or Bloom’s level is each question written? Is the student expected to analyze a document in order to answer, and if so, is it the analysis or the content that is misunderstood?

You get the point.

The percentage score is mostly just a facade put into place to make everyone involved feel more comfortable when a summative grade is issued. If percentage scores are intended to be formative, then the number doesn’t give students much in the way of feedback. Thinking back to that 50 question Multiple Choice test, the percentage score might be used to describe several substandards, but a student might be very strong in one substandard and have almost no understanding of another. Rarely would the misunderstandings be evenly distributed over the entire test. The test is also limited in that it can only assess a small portion of the information that might be found in each substandard. Even if we think of tests as summative, the percentage scores obfuscate more than they inform, and they negate the growth mentality that is the goal of most reform minded classrooms.
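To make that concrete, here is a small Python sketch of how an 80% can hide a serious gap. The substandard labels and the per-substandard counts are invented for illustration; the only figure taken from the example above is the 40 out of 50 total.

```python
# Hypothetical breakdown of a 50-question test: (correct, total) per substandard.
# The labels and counts are invented for illustration; only the 40/50 total
# comes from the example in the text.
results = {
    "Substandard A": (10, 10),
    "Substandard B": (10, 10),
    "Substandard C": (9, 10),
    "Substandard D": (9, 10),
    "Substandard E": (2, 10),
}


def percent(correct, total):
    """Return the percentage of questions answered correctly."""
    return 100 * correct / total


total_correct = sum(correct for correct, _ in results.values())
total_questions = sum(total for _, total in results.values())

overall = percent(total_correct, total_questions)
by_substandard = {name: percent(c, t) for name, (c, t) in results.items()}

print(f"Overall: {overall:.0f}%")
for name, pct in by_substandard.items():
    print(f"  {name}: {pct:.0f}%")
```

In this made-up case the single overall number looks solid, while one substandard is almost entirely missed. The overall score alone gives neither the student nor the teacher any way to see that.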

My SBG Solution to Content Acquisition Assessments

This year I am experimenting with creating a system similar to Simmons’ tiered ranking system. Using the Standards Based Grading concept I have tiered content based tests into levels 1 through 4 based on the chart below.

This Poster is Hanging in my Classroom.

The SBG concept focuses mostly on skills growth; it is easier to measure mastery of specific skills using clear rubrics. This is part of my grading system as I believe that skills development and Historical Thinking Skills are essential to a social studies classroom. However, I am required to administer several district level common assessments to assess content knowledge and the tiered system was born out of this requirement.

For all of the reasons discussed above I never felt that these common assessments provided clear data for me or the students as we sought to grow and improve. I am aware that I am not able to know exactly what my students know or understand. I also know that a couple of mistakes or lucky guesses can radically swing the overall results of any given multiple choice test. My goal going into this year then has been to notice trends over time. More importantly, I want to train students how to notice trends in their own learning and to see how they are growing within the substandards over the course of the unit.

This is where the tiered categories come into play. By clustering the percentages into tiers I am attempting to eliminate some of the variable “noise”. Over time, as more data points are collected, the more general view of the student’s understanding should give both the teacher and the student a better picture of the trajectory of the student’s growth. Our goal is for a student to show growth, meaning a student should be able to move up at least one tier from the beginning to the end of the unit.
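As a rough Python sketch, the clustering and the growth goal might look like the following. The cutoffs here are placeholders I made up for the example, not the ones on the classroom poster, and the function names are hypothetical.

```python
# Hypothetical cutoffs: the real ones live on the classroom poster and are not
# given in the text. Each pair is (minimum percentage, tier).
TIER_CUTOFFS = [(90, 4), (70, 3), (50, 2), (0, 1)]


def tier_for(percent):
    """Map a percentage score (0-100) to a tier from 1 to 4."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for minimum, tier in TIER_CUTOFFS:
        if percent >= minimum:
            return tier


def showed_growth(baseline_percent, final_percent):
    """Return True if the student moved up at least one tier over the unit."""
    return tier_for(final_percent) - tier_for(baseline_percent) >= 1


# One question either way on a 50-question test changes the percentage
# (78% vs. 80%) but, with these cutoffs, not the tier.
print(tier_for(78), tier_for(80))

# A baseline of 45% and a final score of 72% land in different tiers.
print(showed_growth(45, 72))
```

A score sitting right at a cutoff can still flip tiers on a single question, which is one more reason the trend across several data points matters more than any one score.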

Breaking Down the Tests

As I began implementing this tiered scoring process I realized that I would need to take some dramatic grading steps. The biggest was moving to a Power Law Function Mastery Calculation, which uses the 4 level system to run the calculations. The next biggest change was to stop thinking of the common assessment as a single grade. Instead, the test is broken down according to the substandards, and the students get a tiered score in each of the substandards. So none of my students will get a 65% or an 89% or even a 100%. Instead they will receive a score that looks more like this.

  • 3.CO = 3
  • 3.CE = 4
  • 3.P = 1
  • 3.CX = 3
  • 3.CC = 3
  • 3.E = 4

Right away it should be obvious that the student struggled on the indicator 3.P. Now the student and I can go back into the data to determine why the student struggled on this particular concept. We are taking what was often a throwaway summative test and making it into a formative growth opportunity.
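For the curious, here is a minimal Python sketch of how a power law mastery calculation can work on tiered scores like these. I am assuming one common formulation: fit score = a * n^b to an indicator's scores in chronological order (a least-squares line on the logs), then read off the fitted value at the most recent attempt. A given gradebook's actual calculation may differ. The score histories below are invented, ending at the tiers shown above for three of the indicators, and the function name is hypothetical.

```python
import math


def power_law_estimate(scores):
    """Estimate current mastery from chronological scores on a 1-4 scale.

    Assumed formulation: fit score = a * n**b (n = attempt number) by
    least squares on the logs, then evaluate the fit at the last attempt.
    """
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive to take logarithms")
    if len(scores) == 1:
        return float(scores[0])

    n = len(scores)
    xs = [math.log(i) for i in range(1, n + 1)]
    ys = [math.log(s) for s in scores]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope and intercept of the least-squares line in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x

    estimate = math.exp(log_a + b * math.log(n))
    # Keep the result on the 1-4 tier scale.
    return min(4.0, max(1.0, estimate))


# Invented histories (baseline first, most recent last) for illustration only.
histories = {
    "3.CO": [2, 3, 3],
    "3.CE": [2, 3, 4],
    "3.P": [1, 1, 1],
}

estimates = {code: power_law_estimate(h) for code, h in histories.items()}
weakest = min(estimates, key=estimates.get)

for code, value in estimates.items():
    print(f"{code}: {value:.2f}")
print(f"Start the conversation with: {weakest}")
```

The appeal of this kind of fit is that it weights the trend rather than the simple average, so a student who starts low and climbs is not dragged down by the baseline.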

Of course, the testing change wouldn’t mean much if it were the only change. At the start of a Unit, students will take a baseline assessment (or rather a series of baselines) and work on smaller activities tied to both content and skills mastery. This has been a serious cultural change and a difficult transition for both the students and me. Importantly, though, I am finally starting to see some acceptance and understanding among the students. I hope this lasts beyond next week when I have to put a grade in the gradebook! But that will be a post for next week!