So I’m really bad at e-mail. Horrid, actually. I’m getting worse at Twitter DMs, too. With basically any method of communication through which people can send me interesting questions to answer, I’ll a) say “Oh, that’s interesting,” b) star the message for follow-up when I’ve got more time to marinate on it, and c) completely forget about it for three months. I hate it, and I haven’t figured out how to change it yet.
This has been especially frustrating of late, as I’ve had some high school football coaches reach out, say they’re interested in finding an edge with stats, and ask if I’ve got any suggestions.
I’ve gotten quite a few of these and similar queries this offseason (plus some e-mails from teachers and students), and in case that trend continues, I thought I could take the time to publicly respond to some of the topics so that, if future coaches contact me, too, I can at least direct them here. Hopefully this post is useful for you because it’s definitely going to be useful for this irresponsible communicator.
Below are real e-mails, with any identifying personal info removed. Hope this serves as an interesting resource, and if/as different questions come in, I can add them here.
1. How to do explosiveness
I’ve gotten by far the most questions about explosiveness. This is a pretty good one, and not just because it starts off with a compliment, ahem:
I am a high school coach, an offensive coordinator to be specific, and I want to use numbers to give my team a greater edge.
I have used the college success rate as the foundation of my offensive philosophy. One of its biggest advantages is its ease of use.
This obviously makes my heart happy. Success rate has become, in my opinion, by far the most useful tool in terms of predictiveness. For pretty much anything that you tell your team is important — big plays, turnovers, etc. — efficiency will to some degree define your ability to win those battles.
Want to make more big plays? There’s no magical big-play down-and-distance combo, so you have to stay on schedule and simply stay on the field longer until the big plays come. Want to win the turnover battle? Avoid passing downs, and force your opponents into them. Nothing turns the turnover faucet on better.
More from that e-mail:
However, most other metrics seem very complicated and difficult to compile. I really like your Point Per Play metric, but from my perspective it’s not an easily “eyeballed” statistic. I need a metric that does a similar job, but is more quickly calculated.
I’ve gotten a lot of questions about my IsoPPP measure and explosiveness in general. If you’re seeing that term for the first time — and I apologize for what an eyesore it is — here’s a quick review of the Five Factors from the Football Study Hall glossary.
I reference the Five Factors in a number of different ways in my previews and stat profiles, some adjusted for opponent, some not. They are all interrelated.
1. Efficiency. Presented through success rate (unadjusted, see definition below), Success Rate+ (adjusted), and Marginal Efficiency (see entry below). As defined above, success rates examine your efficiency and consistency in staying on schedule and putting yourself in position to move the chains. In terms of projection, it is by far the most important of the factors.
My version of success rate, by the way, is defined as gaining 50% of necessary yardage on first down, 70% on second down, and 100% on third and fourth down. That’s more aggressive than what most coaches use (typically it starts with four yards on first down), but this definition gives me roughly the same success rate (in the 40-45% range) for each down.
One other note: as I’ve begun to work with NFL data, I’ve been fascinated to find that this definition produces almost the same exact results at that level. From 2009-17, the average success rate in college was 42.1%. In the NFL, it was 41.2%. (What is it in high school? Stay tuned. I might have an answer for that soon.)
2. Explosiveness. Presented through Isolated Points Per Play (IsoPPP, which is unadjusted), IsoPPP+ (adjusted), and Marginal Explosiveness (see entry below). IsoPPP is the Equivalent Points Per Play (PPP) average on only successful plays. This allows us to look at offense in two steps: How consistently successful were you, and when you were successful, how potent were you? Big plays often make the difference in a given game, but they are random enough to be unreliable.
3. Field Position. Presented through average starting field position (unadjusted) and FP+ (adjusted). This is mostly self-explanatory, with one important note: You should remember to measure an offense by its defense’s starting field position, and vice versa. Special teams obviously play a large role in field position, but so do the effectiveness of your offense and defense. So in the team profiles, you’ll find Defensive Starting FP in the offensive section and Offensive Starting FP in the defensive section.
4. Finishing Drives. Presented through points per trip inside the opponent’s 40 (unadjusted) and Red Zone S&P+ (adjusted). Also mostly self-explanatory. These measures look not at how frequently you create scoring opportunities, but how you finish the ones you create. And yes, for the purposes of these stats, the “red zone” starts at the 40, not the 20.
5. Turnovers. Using both Turnover Margin and Adjusted Turnover Margin (as defined above), we can take a look at both how many turnovers you should have committed (on offense) or forced (on defense) and how many you actually did. This tells us a little bit about quality and a lot about the Turnovers Luck idea defined above.
Basically, IsoPPP (which I should just call “explosiveness” or something) is used to break football into two questions: How frequently successful are you, and when you’re successful, what’s the magnitude of your success? Are you gaining five yards on first-and-10, or are you gaining 12? Are you gaining four yards on third-and-4, or are you gaining 17?
I use my equivalent points model for calculating IsoPPP; it’s similar to just about any expected points type of model — it basically assigns point values to each yard line on the field based on who’s more likely to score when you’ve got the ball at that yard line. So if you’re at your own 1, the expected point value is negative because your opponent is likely to score next. If you’re at their 1, the expected point value is close to a touchdown.
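For anyone who thinks better in code, here is a minimal Python sketch of an IsoPPP-style calculation. It is an illustration under stated assumptions, not the actual equivalent points model: `Play`, `toy_ep`, and `iso_ppp` are hypothetical names, and `toy_ep` is a made-up straight line standing in for a real expected points table that would be fit to play-by-play data.

```python
from typing import Callable, Iterable, NamedTuple


class Play(NamedTuple):
    down: int           # 1 through 4
    distance: int       # yards needed for a first down
    yards_to_goal: int  # yards from the opponent's end zone before the snap
    yards_gained: int


def is_success(play: Play) -> bool:
    """50% of needed yards on 1st down, 70% on 2nd, 100% on 3rd and 4th."""
    needed_tenths = {1: 5, 2: 7}.get(play.down, 10)
    # Integer math avoids floating point surprises at the threshold.
    return play.yards_gained * 10 >= needed_tenths * play.distance


def toy_ep(yards_to_goal: float) -> float:
    """HYPOTHETICAL placeholder curve: about -1 point at your own goal line,
    about 7 at the opponent's. Swap in a real expected points table."""
    return 7.0 - 8.0 * (yards_to_goal / 100.0)


def iso_ppp(plays: Iterable[Play],
            ep: Callable[[float], float] = toy_ep) -> float:
    """Average change in expected points, counting successful plays only."""
    gains = [
        ep(max(p.yards_to_goal - p.yards_gained, 0)) - ep(p.yards_to_goal)
        for p in plays
        if is_success(p)
    ]
    return sum(gains) / len(gains) if gains else 0.0
```

Because the placeholder curve is a straight line, every yard is worth the same 0.08 points, so the result is just average yards on successful plays times 0.08. A real curve is not straight, which is the only reason the two measures would ever disagree.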
Expected points are great, but they aren’t necessary.
Using plain old yardage could get you pretty far down the field, so to speak. If you’re breaking the game out into successes and the yardage average of only those successful plays (call it Isolated Yards Per Play if you like, or maybe Yards Per Success), that will give you a lot of useful information, and it won’t require more than two columns in Excel.
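If a spreadsheet isn’t your thing, the same two numbers can be sketched in a few lines of Python. This is a rough illustration, not a finished tool: the function names and the `(down, distance, yards_gained)` tuple layout are assumptions made for the example.

```python
from typing import Iterable, Tuple


def is_success(down: int, distance: int, yards_gained: int) -> bool:
    """50% of needed yards on 1st down, 70% on 2nd, 100% on 3rd and 4th."""
    needed_tenths = {1: 5, 2: 7}.get(down, 10)
    # Compare in tenths of a yard so the check uses integer math only.
    return yards_gained * 10 >= needed_tenths * distance


def summarize(plays: Iterable[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (success rate, yards per success) for a list of plays.

    Each play is an assumed (down, distance, yards_gained) tuple.
    """
    plays = list(plays)
    if not plays:
        return 0.0, 0.0
    success_yards = [y for d, dist, y in plays if is_success(d, dist, y)]
    rate = len(success_yards) / len(plays)
    yards_per_success = (
        sum(success_yards) / len(success_yards) if success_yards else 0.0
    )
    return rate, yards_per_success
```

Feeding it five plays, say `[(1, 10, 5), (1, 10, 3), (2, 7, 5), (3, 4, 17), (3, 2, 1)]`, counts three successes, for a 60% success rate and 9 yards per success.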
By the way, here’s a bit more from the e-mail above:
For the game of high school football, in my opinion, 1st down success and explosive plays that score, determine the outcome of almost every game. Explosives that score eliminate the need of repeated execution. Execution is a tricky thing for your average set of 16 year olds. Do you have any suggestions?
Explosives that score are obviously the goal, but any explosiveness is good. The trick here is that, if a big play only gets you to the 10, efficiency still has to get you the rest of the way there. Efficiency = execution, and if you master it better than your opponent, you’ll probably win whether your big plays are getting all the way to the end zone or not.
2. Data collection, sample size, etc.
I’ve been following your work quite a bit and have recreated your work in multiple Excel spreadsheets that allow me to import our data from Hudl (our video data/editing platform). I have a few questions (I’m sure there will be more) for you.
1. Do you know of any high schools who use your method/work successfully?
Not specifically, though since you brought up Hudl, allow me to mention that I’ve been working with Hudl to hopefully produce an analysis of how high school play-by-play data differs from that of higher levels in the sport. I’m really excited about this.
2. How do you collect your base play-by-play data?
For college (and, in 2018, pro as well), we subscribe to a service that provides that information. At the high school level ... I think Hudl allows data exports? Maybe? (Please, someone contact me if I’m wrong about that.)
Either way, I’m guessing most coaches are documenting some basic tendency data in one way or another. If that makes its way into even a simple Excel sheet, you can easily add a couple of extra columns to measure things like success rate, yards per success, etc.
A success formula in Excel will look something like this:
=If([yards gained]>=(If([down]=1,([distance]*0.5),If([down]=2,([distance]*0.7),If([down]=3,([distance]),If([down]=4,([distance]),"X"))))),1,0)
Where [yards gained], [down], and [distance] all refer to the cells that house that data.
If you’ve got that, then you can easily set up another column to measure success yards.
=If([success]=1,[yards gained],0)
From there, you can basically just average out the success column to find success rate, and you can use something like this to calculate yards per success:
=SUM([success yards])/SUM([success])
Hopefully that helps.
3. I’ve been using the efficiency measure to evaluate specific concepts that we run. Do you feel that this is an acceptable/effective way to measure the effectiveness of a play? If so, what would a reliable sample size be?
OBVIOUSLY YES to the first part of that. To the second part ... man, you’re always going to be dealing with sample size issues in football. It just is what it is. I will say that this is another way in which efficiency comes in handy. One single explosive play can completely skew a game’s per-play data (and since big plays are so random, it can give you a false impression of whether a team is “explosive” or not), but something like success rate can give you trend data faster than anything else. Granted, one game of 60 plays or whatever probably isn’t enough, but part of the grand flaw of a sport with just 10 or so games is that you’re never going to have the sample size you want.
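One way to put a rough number on the sample size problem is to attach a margin of error to each concept’s success rate. The Python sketch below does that under a big simplifying assumption that is not from the discussion above: it treats every play as an independent coin flip and uses the textbook binomial standard error. The function name and the concept labels in the example are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def concept_report(
    plays: Iterable[Tuple[str, int]]
) -> Dict[str, Tuple[int, float, float]]:
    """Summarize success rate by play concept.

    plays: (concept, success) pairs, where success is 1 or 0.
    Returns {concept: (play count, success rate, standard error)}.
    ASSUMPTION: plays are treated as independent trials, which real
    football plays are not, so read the error as a loose guide only.
    """
    tally = defaultdict(lambda: [0, 0])  # concept -> [plays, successes]
    for concept, success in plays:
        tally[concept][0] += 1
        tally[concept][1] += success

    report = {}
    for concept, (n, wins) in tally.items():
        rate = wins / n
        std_err = math.sqrt(rate * (1 - rate) / n)
        report[concept] = (n, rate, std_err)
    return report
```

Under that same assumption, a 42% success rate over 60 plays carries a standard error of roughly six percentage points, which squares with one game not being enough. A concept you have only run 10 times will be far noisier still.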
3. Special instances
Our students have enjoyed digging into some of your posts on this topic, and we have a few questions we would like to run by you. I hope this is ok.
I believe the children are our future.
One of the things that interests them is the Success Rate. We think we know how you calculate this --- 50% of yards needed on 1D, 70% of yards needed on 2D, 100% of yards needed on 3D and 4D. Is this correct? If not, what are the percentages?
Yep!
In addition, we wonder whether the following “special cases” are included in the denominator: punts, field goal attempts, plays on which there is an offensive penalty or defensive penalty, downs that are eventually replayed due to penalty, etc. What can you tell us about these cases?
- If we’re talking fake punts or field goals, yeah, take those out if you can. They obviously don’t tell you much about a team’s offense or defense. (And if we’re talking regular punts/FGs, yeah, take those out, too. Going into special teams is a completely different rabbit hole, one for which I don’t have high school data ... yet.)
- Penalties are interesting. First, if a play is called back due to penalty (in college play-by-play data, it is given a “NO PLAY” designation), don’t count it. If there’s a play that includes, say, a face mask penalty, then don’t count the penalty yardage as part of the overall gain. If it includes something like an illegal block downfield, count the play as a gain up to the point where the penalty happened.
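The filtering rules above can be sketched in Python as a single pass over the play-by-play rows. Everything about the data layout here is an assumption for illustration: the `play_type` and `no_play` field names, and the labels `"run"`, `"pass"`, `"punt"`, and `"field_goal"`, are hypothetical and will differ depending on where the data comes from.

```python
from typing import Dict, Iterable, List

# ASSUMED labels for ordinary offensive snaps; adjust to match your data.
SCRIMMAGE_TYPES = {"run", "pass"}


def countable_plays(plays: Iterable[Dict]) -> List[Dict]:
    """Keep only plays that belong in the success rate denominator.

    Drops punts and field goal attempts (fakes included, since they carry
    the same play_type here) and anything wiped out by penalty.
    Rows missing the no_play flag are assumed to have counted.
    """
    return [
        p for p in plays
        if p["play_type"] in SCRIMMAGE_TYPES and not p.get("no_play", False)
    ]
```

This only handles the denominator. Trimming penalty yardage off a gain, or crediting a play only up to the spot of a foul, depends on how a given source records penalties, so that part is left out of the sketch.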
The students are hoping to calculate some of these rates themselves, using play-by-play data to calculate them. Our plan at this point is to have them scrape this data (since it is fairly accessible). Is this the best way to get that data, do you think, or is such data readily available in spreadsheet form already for multiple years?
Scraping data is an insanely useful skill to build, though obviously you can pay for play-by-play data at the CFB or NFL levels. High school, probably not.
(Scraping data is also the only way we’re ever going to get somewhere with injury data and measuring the impact of injuries, suspensions, etc.)
4. Nerds of the world unite
I am a current junior high school student. I am in an AP Research class, and I am starting to write a year-long paper that explores a gap or builds onto existing research in a certain field.
You [have mentioned] how there is a lack of research in the field of college football statistics, and I immediately thought it would be a great idea for my research paper, as I am a huge college football fan. I was wondering if you could maybe explain to me possible approaches I could go at researching this. Is there a specific area that you know personally that has very little or no research that I could maybe look into? Any help would be appreciated!
I guess this is a question that is hard to answer in an FAQ. I’ve gotten quite a few of these, too, which makes me particularly sad that I’m bad at e-mail. Just keep pestering me. And maybe we form a mailing list or something, I don’t know.