In Defense Of Success Rates

One of my favorite fellow nerds in the blogosphere, Brett Thiessen (a.k.a. MGoBlog's The Mathlete), took on the concept of Success Rates, and why they may be unnecessary, earlier this week. To make sure I don't forget to make a point along the way, I'm going to respond to it, Fire Joe Morgan-style (quote and respond, quote and respond, etc.), but I don't want to give the impression that this is in any way hostile. It isn't, just like his original post wasn't either. I just love having the conversation, and we might as well have it publicly.

For background on Success Rates, you can start with the three following links:

The official Football Outsiders glossary:

Success Rate: [...]

  • Success Rate (college football): Our Varsity Numbers column calculates Success Rate for teams, not just running backs, using a set of baselines that differ slightly from our NFL Success Rates: 50% of needed yards on first down, 70% of needed yards on second down, or 100% of needed yards on third or fourth down.

Football Study Hall -- The Toolbox: Offensive Success Rates

One of my favorite things about college football is how there are so many different ways to move the chains. Seeing a team like Wisconsin or Navy on the list above would be no surprise -- they're the prototypical grind-it-out, three-yards-and-a-cloud-of-dust teams. But while Wisconsin locked down the three-spot, a run-and-shoot offense took the top ranking, while spread teams filled out most of the Top 10. Invention in college football derives from trying to find different ways to gain five yards, and in college football, there are many, many different ways.

Football Study Hall -- The Toolbox: Defensive Success Rates

While not THE tool for measuring college football proficiency, I really enjoy what success rate brings to the table, especially alongside the slightly more powerful, descriptive Points Per Play measure. But I'll elaborate on that while going through Brett's post below.

Success Rate is an attempt to measure how good a player or team is at the traditional concept of "staying ahead of the chains." There are some slightly different calculations, but for the most part a success is defined as at least 40-50% of yards to go on first down, at least 50-70% of yards to go on second down, and first down achievement on third or fourth down. Typically the target is a 50% success rate.

Although I doubt there is any recorded history on how this came to be (I believe its origin or at least its popularization comes from Football Outsiders), I have two theories. The first is that this is how football fans, players, and coaches have been conditioned to think, especially old school, grind-it-out football folks. You still hear it often among clichéd commentators: the offense’s number-one priority is to stay ahead of the chains, don’t put yourself in bad down and distance, stay away from obvious passing downs. All of these things are good things for a football team to do.

The second reason I think it came to be is that advanced football stats arrived after advanced metrics for baseball had come a long way. One of the key tenets of the Moneyball/SABR revolution in baseball is that On Base Percentage >>> Batting Average. On top of that, one of the fundamental advanced baseball stats is OPS, On Base Percentage Plus Slugging Percent, a combination of Success and Magnitude, and one paralleled by Football Outsiders* in their S&P metric.

*I want to be clear that this is not a critique of Football Outsiders. They do tremendous work and are at the forefront of advanced football analysis.

When I began playing with play-by-play data in the spring and summer of 2007, I decided to take a look at the success rate measure Aaron had created at F.O. for two reasons: 1) it existed, and 2) it made sense to me. As I've mentioned before, I love baseball stats, but I don't really enjoy baseball. I wanted to see what tenets of baseball statistics could be used for football. Success Rate made sense to me, not simply because of Moneyball or because "On Base Percentage >>> Batting Average," but because efficiency is a good thing, and Success Rate is an efficiency measure (and a pretty good one).

I wasn't exactly sure how much stock to put into success rates, really, until I began to look at the difference in performance rates between standard downs and passing downs. The clichéd "stay ahead of the chains" truism turned out to be, well, true. I began to find that pursuing an element of efficiency (staying on schedule) alongside your explosiveness makes you infinitely more likely to succeed.

Why Football is Not Baseball

Good OBP is critical for baseball because you are dealing with a finite, irreplaceable resource, outs. You get 27 of them per game. Once you generate an out there is no way to get it back; you are 1 step closer to the end of your chance to score, and you only have 27 total steps per game. OBP measures a team or individual’s ability to forego outs when they come to the plate. Not getting out will always improve your chances of winning while getting an out will almost always decrease your odds of winning (this is not an article about the sacrifice bunt).

Contrast that with football, where the only finite resource is time. Even if the quarterback gets sacked and loses 10 yards, one play later the effect of that loss can be wiped out.

Yes, but...

In a sense a set of downs is finite, but not an individual set of downs.

Downs are absolutely finite if you aren't good at staying "ahead of schedule," so to speak. If teams were equally good at converting 2nd-and-20, 2nd-and-8 and 2nd-and-1, then there would be absolutely no purpose for an efficiency measure. Just as (to steal yet another baseball metaphor) batters tend to have a better batting average on a 2-0 count than on an 0-2 count, offenses that stay ahead of the count are more likely to produce successful drives.

If there were a team correlation, first downs converted would be more appropriate and I don’t really see a true individual equivalent.

Working our way in that direction, really the only truly "appropriate" statistic is points scored, since everything else -- first downs, yards, etc. -- leads to that. The point of success rates (and most advanced stats) is to both describe and evaluate; I guess you can learn a decent amount about efficiency from simply looking at total first downs, but what about first-down efficiency? Rushing efficiency? Passing efficiency? Passing downs efficiency? Third-quarter efficiency? Red zone efficiency? Success Rate is a powerful tool because it lends itself to deeper analysis. But yes, if you just want to know how a team did as a whole, you could certainly get by in the short term by looking at total first downs.

The Goal Is To Score Points

Consistently being in good down and distances is not a bad thing, but it’s not nearly as important for today’s offenses. Modern offenses have a much greater ability to convert unfriendly down and distances than offenses of old.

The good ones do, yes. The bad ones are still bad at it. Really, really bad at it.

Plus, the offense’s goal is to score points, not get first downs. Getting first downs obviously helps score points, but a metric like EV/PAN that directly accounts for how each play contributes to scoring is a much stronger measure, not just a complementary stat like Slugging Percent.

I assume the reference to slugging percentage is intended to also be a reference to EqPts Per Play (PPP), the other half of the S&P measure. I'm not sure I get the reference, however, because the "EV" in "EV/PAN" stands for "expected value," and the entire PPP measure comes from the same approach to expected values.

(In short, EqPts are derived by comparing the expected point value of where a given play began to where it finished. I've long suspected that Brett and I are doing somewhat overlapping work, and this is one of those instances. We're clearly not doing exactly the same thing here, but a lot of the same concepts are at work.)

In baseball the complementary stat is needed because of the finite nature of outs. In football, everything is a sliding scale and categorizing plays as pass-fail is simply too black and white for a sport that has more gray.

I disagree. Over the course of a baseball season, one successful at-bat tells us almost nothing, just like one successful or unsuccessful play tells us almost nothing. But over the course of the season, it can be incredibly informative, specifically for the gray area it provides to "how many games did Team A win" or "How many yards did they average?"

A couple of examples of how success rate can be misleading (first down gain, second down gain, third down gain):

4,3,2: This is a 67% success rate but is a three and out.

Technically untrue, at least as it pertains to NCAA data. For NCAA, that is actually a 0% success rate. But that wasn't really the point. The point is that you can come up with combinations of plays that result in somewhat silly success rates, and I don't disagree with that. But you're also dealing with three plays in a season where teams will attempt 900 or more.

3,3,4: This is a 33% success rate but a first down. Plus, the first two plays are nearly identical, but the first two downs of the first group are both successes and those of the second group are both failures. Over a large group of data some of these will iron themselves out, but why put such a black and white metric over something that is not?

Not "some" of these, "most" of these will iron themselves out. And I think the biggest thing to take from these two examples is that, in the first, the fake offense failed to convert on 3rd-and-3. In the second, the fake offense successfully converted a 3rd-and-4.

Over the course of the 2011 season, teams converted on 3rd-and-2 60.3% of the time. Third-and-3: 51.3%. Third-and-4: 50.2%. Third-and-5: 43.2%. A single yard can make a significant difference (though in the example above there was not a huge difference between third-and-3 and -4).
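For anyone who wants to check the arithmetic on those two example series, here is a minimal sketch. The college baselines are the ones from the glossary quoted at the top of this post; the "loose" baselines are an assumption, taken from the low end of the ranges in Brett's description (40% on first down, 50% on second). The function and variable names are made up for illustration, and the sketch only handles a single series that starts at first-and-10.

```python
# Baselines as integer percentages of yards to go, keyed by down.
# COLLEGE follows the Football Outsiders glossary quoted earlier in the post.
COLLEGE = {1: 50, 2: 70, 3: 100, 4: 100}
# LOOSE is an assumption: the low end of the ranges in Brett's description.
LOOSE = {1: 40, 2: 50, 3: 100, 4: 100}


def is_success(down, to_go, gained, baselines):
    """True if the gain covers the required share of the yards to go."""
    # Integer arithmetic avoids floating-point edge cases right at the threshold.
    return gained * 100 >= baselines[down] * to_go


def series_success_rate(gains, baselines, to_go=10):
    """Success rate over one series of at most four plays, starting at first down.

    Assumes any first down is gained on the last play listed.
    """
    down, successes = 1, 0
    for gained in gains:
        if is_success(down, to_go, gained, baselines):
            successes += 1
        down, to_go = down + 1, to_go - gained
    return successes / len(gains)


for gains in ([4, 3, 2], [3, 3, 4]):
    college = series_success_rate(gains, COLLEGE)
    loose = series_success_rate(gains, LOOSE)
    print(gains, f"college={college:.0%}", f"loose={loose:.0%}")
```

Under the college baselines the 4,3,2 series grades out at 0% and the 3,3,4 series at 33%; under the looser baselines the same two series come out at 67% and 33%, which is where the numbers in Brett's examples come from.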

2nd and 7 is almost the same as 2nd and 6, but 2nd and 1 is very different from 2nd and 6. Success rate completely misses the magnitude of plays.

Yes. Which is precisely why it is almost always used in conjunction with points per play, which is nothing if not a measure of magnitude.

This is why for football, an Expected Value model is much more valuable. With enough data, you can get a pretty good description of the expected points based on all down, distance and yardline combinations. Once you have this you can evaluate the shades of gray for each play. A three yard carry on first and ten is nearly as good as a four yard one. A nine yard carry is even better. Expected Value can quantify the subtle and substantial differences between plays. The value difference between first and ten at the twenty and first and ten at the thirty will be the same whether it was one ten yard play or three runs totaling ten yards, although the value per play will justifiably be better.

Agreed. Which is precisely why it is almost always used in conjunction with points per play, which is nothing if not a measure of magnitude.
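The point about the twenty and the thirty can be sketched in a few lines. This is only an illustration of the general idea of valuing a play as the expected points after it minus the expected points before it; it is not Brett's EV/PAN model or the actual EqPts table. The `toy_expected_points` curve and its coefficients are invented, and `value_added` is a hypothetical helper name.

```python
import math


def toy_expected_points(down, to_go, yards_from_goal):
    """Hypothetical expected-points curve; the coefficients are invented for illustration."""
    return 6.0 - 0.07 * yards_from_goal - 0.4 * (down - 1) - 0.03 * to_go


def value_added(states, ep):
    """EP(after) - EP(before) for each play, given (down, to_go, yards_from_goal) states."""
    return [ep(*after) - ep(*before) for before, after in zip(states, states[1:])]


# First-and-ten at the 20 to first-and-ten at the 30, reached two different ways.
one_play = [(1, 10, 80), (1, 10, 70)]
three_runs = [(1, 10, 80), (2, 7, 77), (3, 3, 73), (1, 10, 70)]

one = value_added(one_play, toy_expected_points)
three = value_added(three_runs, toy_expected_points)

# The per-play values telescope, so both paths add up to the same total...
assert math.isclose(sum(one), sum(three))
# ...but the single ten-yard play earns more value per play.
print(sum(one) / len(one), sum(three) / len(three))
```

Whatever expected-points curve you plug in, the total only depends on the first and last state, while the per-play average depends on how many plays it took to get there.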

Success rates can vary wildly based on how you get from point A to point B; EV only cares where you start and where you finish.

Again, I guess this comes down to what you want to get out of stats. When I'm writing game previews, team profiles, etc., being able to figure out, at a glance, how efficient and/or explosive a team is or isn't can tell me so much. We can get into the idea of the Defensive Footprint (see here and here), the idea of bend-don't-break defenses, more aggressive defenses, etc. Success Rate is a decent evaluative tool, but a) it is incredibly descriptive, and b) when combined with PPP, it is an excellent evaluative tool. S&P+ is both more accurately predictive and evaluative than simple PPP+, though I will say that PPP+ is slightly more connected to overall win percentage.

What is Success Rate Good For?

It is an interesting stat and isn’t totally without value, I just think that it is unnecessary and shouldn’t be a fundamental part of team evaluation.

I vehemently disagree.

There are lots of stats that fit this characterization. For a lot of teams it’s how they mentally operate, especially in the running game. Success rate does a good job evaluating running backs in traditional ground games.

I found this part funny, simply because I've actually stopped using it in evaluating running backs, in favor of Adj. POE (another tool based around expected value) and Highlight Yards (another way to gauge explosiveness, albeit with no "expected value" input). I've come to like what Success Rate tells me about offenses and defenses as a whole instead of what it tells me about individuals.

It might not totally align with scoring points and winning games, but it does align well with accomplishing a team's offensive objectives. Running backs often get tightly bunched near the mean in an EV model but success rate can be a way to further separate individual backs. Success rate will hold up between the tackle pounders but knock down the home run threat. EV may consider them the same (or more likely the home run threat will be higher) but the consistency of the old school back will be valued better by success rates.

Duly noted.

I don’t think success rate has much value for the passing game. Completion percentage and YPA are more than adequate to indicate both explosiveness and consistency.

I love YPA, so you'll get no fundamental disagreement from me here. I've got enough love to go around.

Coming Next: The Wisconsin Case Study and Optimal Offense and Defense Response

The underlying context of "ignore success rates" is that the traditional running game is overrated. If your main goal as an offense is to avoid bad third downs, and you are good at it, you will likely end up with a lot of third and short or third and manageable. Even if they are all "good" third downs, each third down is a chance for the defense to take the field. We all remember the classic drives with multiple third down conversions, but we forget all the ones that could jump the odds and failed after giving the defense one too many chances to get off of the field. Explosive plays are essential to a productive modern offense and unless you are running a Chip Kelly or RichRod style ground attack, explosive plays are much more likely through the air than on the ground.

Next week I will follow up with a detailed look on the relative values of Russell Wilson and Montee Ball to Wisconsin’s 2011 offense. Ball had the TDs and the hype and Wilson was considered a quality second option. I’ll dig deep into the numbers and show why Wilson was the real threat of the Wisconsin offense.

Following that, I’ll have the final article in this series looking at how offenses (and maybe more so defenses) can effectively maximize their expected points for and against through a better perspective on managing offensive output versus managing each down’s success or failure.

I look forward to it.

The bottom line is that I fear a little bit of straw-man action at work here. The argument seems to be that there is a place for success rate, but it shouldn't be the tool for evaluation. It isn't overused in that regard, and it never will be. But it perfectly complements the expected value/explosiveness measure that is PPP+, and when broken down into splits, it can tell you an incredible amount about an offense or defense. It was never intended to simply evaluate, just as I never intended to get into advanced football stats purely to evaluate. It is a wonderfully useful tool if used in the correct way, and I indeed hope that I'm using it mostly in the correct way.