Aaron Schatz was really proud of his Ham Sandwich analogy from my recent feature on ESPN's controversial but promising new proprietary statistic, Quarterback Rating. He was right to be; it was a pretty good analogy for proprietary statistics. "It’s a ham sandwich," Schatz said. "I don’t tell you how many slices of ham it is, or how thick it is, but you know it’s a ham sandwich." QBR, naturally, is a very complicated ham sandwich.
The most controversial ingredient in QBR's mystery sandwich is the Clutch Index. For sandwich-metaphor purposes we can think of it as some divisive aioli; in a more literal sense, as ESPN describes it, it is a facet that factors in “how critical a certain play is based on when it happens in a game.” As I detailed in my piece, this element is hotly disputed in the stats community. But most people who watch sports tend to believe in the idea of clutch, and QBR is a statistic for everyone, and so there it is.
Before publication, I had reached out to Dean Oliver, Director of Analytics for ESPN Stats and Info and the father of QBR in many key respects, but never heard back. A few days after the article went up, Albert Larcada of ESPN Stats and Info reached out to me to discuss some of the concerns about QBR I raised in the article. He also informed me that Dean was not, in fact, snubbing me; Oliver was on vacation in Europe. To my surprise, Larcada actually wanted to give me more information. Not only that, but he detailed precisely how the Clutch Index was conceived and calculated. He has graciously given me permission to reprint that information here, in what I believe to be Exclusive Coverage of ESPN by The Classical: Brought To You By The Clog.
First, the use of the word “clutch” was a marketing decision. According to Larcada:
“Clutch weight is literally the same thing as the leverage index concept you see in other sports....It is a way to measure how important a particular play is to the outcome of a game based on how important plays similar to it were in the past. We branded it clutch weight instead of leverage index mostly so the common fan could somewhat understand its intent... which is to measure “clutch” situations. If this was a stat we made strictly for the advanced stat community we absolutely would have called it leverage index.”
Leverage Index (LI) is a pretty standard part of advanced baseball analysis. This isn’t to suggest that Leverage Index is absent its own controversies, but the odd quirks and irks having to do with LI are nothing relative to the high-volume clusterfuck that ensues when “clutch” is mentioned. Mostly, this is because LI and clutch carry different connotations. LI measures situations, whereas “clutch” is colloquially used to distinguish players who perform better in high-leverage situations; one assesses circumstance, the other individuals. Of course, not every situation in a game is of equal importance; not even the harshest clutch-haters would argue otherwise. But how individual performances change according to those situations is much more uncertain. This debate is largely but not entirely semantic, although the semantics happen to matter a lot here: the Clutch Index is exactly the same as another, more widely accepted measure. It’s the name that’s the problem.
I’ll let Larcada explain in his own words how the Clutch Index is calculated into QBR, and then I’ll provide a summary, then the advanced stats people can tell me I interpreted it wrong, and everyone will be happy:
“The math of it is this. We built a win probability model that tells a team’s chance of winning based on the game state. Time and score are most important (particularly late in the game), but down, distance, yards from the end zone, home/road, grass/turf, timeouts left, all of these things matter too. We then look at the WP for a particular game state and find the expected change in WP based on historical situations (our WP model is based on play by play since 2001), and scale it such that the average clutch weight is equal to 1. To avoid extreme high or low clutch weights we eliminated outliers from our model before we set the clutch weights. So the maximum clutch weight ended up being somewhere around 2.7 or 2.8 and the minimum somewhere around 0.2.
For example. Say a team is 1st and 10 from midfield, up 7, first play of the 2nd quarter, home, turf, etc. Given all of that information the WP for the team with the ball is 80.2%. So to find the clutch weight we would just look at what the expected change in WP is on that first down play. The average change in WP across all plays is somewhere around 1.6%, so if this particular [play] had an expected WP shift of more than 1.6% it would have a clutch weight of >1. If its expected shift was less than 1.6% its clutch weight would be <1. In case you are interested my example has a clutch weight of 0.79.
Then what we do is take that clutch weight and multiply by the expected points added the QB received on the play–that is, the QB’s EPA after we divide credit. So if it’s literally an average expected WP shift (+/- 1.6%) then the clutch weight will have no effect on the play since you are multiplying by one. And obviously it could be more or less depending on the leverage index.”
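For anyone who would rather see that as arithmetic, here is a minimal Python sketch of the scaling Larcada describes. It is my own illustration, not ESPN's code: the function names are hypothetical, the 1.6% average and the 0.2 to 2.8 range are the approximate figures from his email, and the simple clamp is a stand-in assumption (ESPN says it removed outliers from the model before setting the weights, which is not the same thing). The 1.264% expected shift in the usage lines is back-calculated from his 0.79 example, not a number he gave me.

```python
# Illustrative sketch only; names and the clamping step are assumptions.
AVG_EXPECTED_WP_SHIFT = 0.016       # ~1.6%, the average change in WP Larcada cites
MIN_WEIGHT, MAX_WEIGHT = 0.2, 2.8   # approximate bounds mentioned in his email


def clutch_weight(expected_wp_shift, avg_shift=AVG_EXPECTED_WP_SHIFT):
    """Scale a play's expected WP shift so an average play gets weight 1."""
    weight = expected_wp_shift / avg_shift
    # Assumption: clamp to the quoted range. ESPN instead eliminated outliers
    # from its model before setting the weights.
    return max(MIN_WEIGHT, min(MAX_WEIGHT, weight))


def clutch_weighted_epa(qb_epa, expected_wp_shift):
    """Multiply the QB's share of expected points added by the clutch weight."""
    return clutch_weight(expected_wp_shift) * qb_epa


# An exactly average situation leaves the QB's EPA untouched.
average_play = clutch_weighted_epa(0.5, 0.016)

# Back-calculated from the 0.79 example: 0.79 * 1.6% is about 1.264%.
example_weight = clutch_weight(0.01264)
```

In this sketch, a play with a weight of 0.79 counts for a bit less than its raw EPA, and a play near the top of the range counts for almost three times as much.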
It sounds both complex and math-y, and doubtless people are already preparing to shout at Nate Silver about it, but it actually boils down to a pretty simple concept: Given how this precise situation has unfolded in the past, how important is this play? The answer determines the Clutch Index, which then gets multiplied by the quarterback’s share of expected points added on that play to weight it within QBR. In the end, the Clutch Index doesn’t significantly alter the overall rating. Larcada wasn’t specific, but he hinted that by the end of a season it would be less than a few percentage points of difference.
Of course, this doesn’t get us all the way to how the QBR ham sandwich is made. A lot goes into the quarterback’s Expected Points Added, since QBR factors in QB rushes, scrambles, sacks, and other things that have to be normed onto the scale, and the formulas/calculations are just as important as the concepts behind them. But learning more about the Clutch Index is an important step in getting the stats community more involved in the metric. No matter how much ESPN wants QBR to go mainstream, it remains hard to imagine that happening unless the stats community takes it seriously.
I spoke to a post-vacation Dean Oliver recently about his goals for QBR. “I think my goals for anything here [at ESPN] is to have more intelligent and more descriptive discussion of the game,” he told me. “Any player rating is a means to that end...So that doesn’t include a goal of getting it mentioned anywhere.” He also brought up using some of the pieces of QBR more often–such as expected points and time in the pocket–to feel out what concepts the community will grasp, and which may ultimately lead to QBR being viewed as more than just a number. Oliver stressed that this is a process, but he is pleased with the progress so far. “As Bill James always said, these analyses should open up conversations, not close them down. A rating like this is an explanation, with the ability to be broken down to explain better.”
Which brings us, again, to the question that concluded my previous feature on this stat: can QBR be the centerpiece for a progressively smarter NFL discussion? It’s obviously too early to tell, but there are some positive signals, like QBR being featured at The New York Times’ Fifth Down blog as a demonstration that Andrew Luck has actually been better so far this year than Robert Griffin. This showcases exactly why QBR was created in the first place, and how useful it can be. If you want to see this same data recycled between a couple of bad puns and 5-word paragraphs, you can see Rick Reilly’s column on the same subject. It’s not there yet, but QBR is making its way into the conversation.
I may have done a disservice in my feature by framing the challenge as I did. The challenge is not to force Jaws, Gruden, and the Monday Night Countdown crew to talk about QBR or advanced stats in general, in the hope that it will lead to others adopting it. That would be a fool’s errand, but more to the point, as Oliver pointed out, that’s not how revolutions work. “I always want progressively more intelligent use of information, here at ESPN, on CNN/Fox/MSNBC, by politicians and doctors, by everyone growing up in the Information Age,” he says. “We’re doing it. But no revolution goes as fast as you want.”