After hitting a wall on my third-down analysis, I needed a break from third downs and wanted to look at something else. The following is what came out of that, and now I also have a new topic to explore. I have always been interested in knowing exactly what percentage of possessions end in touchdowns versus punts versus interceptions and all the other outcomes, and thanks to CFB Stats I can now answer that question.
I grabbed all of the play-by-play data and drive data provided by CFB Stats for the years 2005-11. I kept the 2012 data out of the sample because I would like to save it for testing any models I may develop out of this analysis, but seven years of data is plenty. The play-by-play data and drive data match up on a game code and drive number, so I can link any drive information contained in the drive file to any play-by-play information contained in the play data. The drive file provides the end result of each drive, so after linking the data files I know the drive outcome of each and every play that occurred. After that I filtered for plays between FBS teams only, and only in non-garbage time (using the same conventions as Football Outsiders), and tabulated the outcome percentages by yard line and down. Oh, and I ignored drives that end at the half. So let's get into the results.
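For anyone who wants to replicate the linking and filtering step, here is a minimal Python sketch. It assumes simplified, hypothetical record layouts (the `Play` and `Drive` fields below and an `"End of Half"` result label) rather than the real CFB Stats column names, and the per-quarter garbage-time margins are my assumption of the Football Outsiders college convention, so check them against the source before reusing.

```python
from collections import namedtuple

# Hypothetical record layouts; the real CFB Stats files have more columns
# and their own column names.
Play = namedtuple("Play", "game_code drive_number offense defense quarter "
                          "offense_score defense_score down yard_line")
Drive = namedtuple("Drive", "game_code drive_number result")

# ASSUMPTION: maximum score margin per quarter before a play counts as
# garbage time, modeled on Football Outsiders' college convention.
GARBAGE_MARGIN = {1: 28, 2: 24, 3: 21, 4: 16}


def is_garbage_time(play):
    """True if the score margin exceeds the assumed limit for the quarter.
    Overtime (no entry in GARBAGE_MARGIN) is never treated as garbage time."""
    margin = abs(play.offense_score - play.defense_score)
    limit = GARBAGE_MARGIN.get(play.quarter)
    return limit is not None and margin > limit


def link_and_filter(plays, drives, fbs_teams):
    """Attach each play's drive result via (game_code, drive_number), keeping
    only FBS-vs-FBS, non-garbage-time plays on drives that did not end at the
    half. Returns a list of (play, drive_result) pairs."""
    result_by_drive = {(d.game_code, d.drive_number): d.result for d in drives}
    linked = []
    for p in plays:
        result = result_by_drive.get((p.game_code, p.drive_number))
        # Skip plays with no matching drive and drives that ended at the half
        # ("End of Half" is a hypothetical label).
        if result is None or result == "End of Half":
            continue
        if p.offense not in fbs_teams or p.defense not in fbs_teams:
            continue
        if is_garbage_time(p):
            continue
        linked.append((p, result))
    return linked
```

The dictionary keyed on `(game_code, drive_number)` makes the join a single pass over the plays, which matters with seven seasons of play-by-play.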
First, I want to show you the sample sizes I am dealing with. Here is a plot of the total number of plays for each down and yard line combination in my sample set. First down at your own 20 is off the charts, with over 15,000 plays that start there. I just remembered that the NCAA kickoff and touchback rule changes went into effect last year (touchbacks now come out to the 25 instead of the 20), so that is something I will need to keep in mind when looking at the 2012 data. But for the most part there are plenty of plays at each yard line to separate the analysis by down.
(Note: in the CFB Stats data, the 0 yard line is the opponent's goal line (i.e. where the touchdown occurs), while the 100 is the offense's goal line.)
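Since most of us think in terms of "own 20" and "opponent's 35", here is a small hypothetical helper (not part of CFB Stats) that converts a conventional spot into this coordinate. First down at your own 20, for example, sits at yard line 80.

```python
def to_cfbstats_yard_line(side, yards):
    """Convert a conventional spot ('own' or 'opp' plus yards from that goal
    line) into the CFB Stats coordinate, where 0 is the opponent's goal line
    and 100 is the offense's own goal line. Hypothetical helper."""
    if not 0 <= yards <= 50:
        raise ValueError("yards must be between 0 and 50")
    if side == "own":
        return 100 - yards
    if side == "opp":
        return yards
    raise ValueError("side must be 'own' or 'opp'")
```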
Now for the good stuff. Here is a collection of plots showing the percentage of drives that end in a given outcome when a play occurred at a given down and yard line.
Just to be clear, what these charts show is: "Given a play on a certain down and yard line, how often does the drive end in the specified outcome?"
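In other words, each point on these charts is a conditional frequency: among all plays run on a given down at a given yard line, the share whose drive ended in a particular outcome. A minimal Python sketch of that tabulation, assuming the linked plays have already been reduced to hypothetical `(down, yard_line, drive_result)` tuples, looks like this. It reads the definition above as play-weighted, so a drive with several plays in the same cell contributes once per play.

```python
from collections import Counter, defaultdict


def outcome_rates(linked_plays):
    """For each (down, yard_line) cell, return (n, rates), where n is the
    number of plays in the cell and rates maps each drive outcome to the
    fraction of those plays whose drive ended that way.

    linked_plays: iterable of (down, yard_line, drive_result) tuples.
    """
    counts = defaultdict(Counter)
    for down, yard_line, result in linked_plays:
        counts[(down, yard_line)][result] += 1

    table = {}
    for cell, results in counts.items():
        n = sum(results.values())
        table[cell] = (n, {outcome: k / n for outcome, k in results.items()})
    return table
```

Keeping `n` alongside the rates is handy because it is the same per-cell sample size shown in the plot of play counts, so thin cells are easy to spot.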
Here are some of my observations, feel free to put yours in the comments.
- Touchdown Percentage is pretty much what you would expect, and declines steadily outside the red zone. The fourth-down trend line is interesting. A sharp decline is expected, but at about 10 yards out the touchdown % actually starts to rise. Here is my attempt at an explanation. If you are inside 10 yards, you probably won't get another set of downs short of the end zone, so many teams will just accept kicking a field goal. I think this explains the spike right at 10-12 yards out: teams now have the option to go for it on fourth down and still pick up a first down, without necessarily having to score a touchdown on that play. The peak at 40 yards out probably comes from the fact that few coaches are going to punt there (nor should they!), and since field goals from that distance aren't very reliable, they decide to go for it.
- Speaking of Punt Percentage, which coaches are punting the ball 25% of the time when they have fourth downs 30-35 yards from the end zone?!?! I know I am not looking at distance to a first down here, but why would you punt the ball in that situation? And it's important to note that just because teams punt at the end of the drive 25% of the time when they have a fourth down 32-33 yards out, it doesn't necessarily mean they punted on that down. They could have taken a penalty and then punted, but I doubt that happens very often.
- When you have a first down, you are always more likely to end the drive with a touchdown than with a made field goal. That is kind of surprising to me. Heck, even on second down, no matter where you are on the field, you are more likely to end your drive in the end zone than through the uprights.
- Other than that, everything is pretty much how I imagined it at the start of this. Interception and fumble rates increase the farther you get from the end zone. The farther you are from the end zone, the more plays you need to run, so you increase your chances of turning the ball over simply because you are running more plays.
- Anything interesting you noticed or are curious about?
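The "more plays, more chances" reasoning in the turnover point can be made concrete with a toy model. If each play independently turned the ball over with some fixed probability p (a simplifying assumption; the p below is hypothetical, not estimated from this data), the chance of at least one turnover over n plays is 1 - (1 - p)^n, which grows with n.

```python
def prob_at_least_one_turnover(p, n):
    """Chance of at least one turnover in n plays, assuming each play
    independently turns the ball over with probability p (toy model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1 - (1 - p) ** n


# Hypothetical per-play turnover rate, for illustration only.
P_TURNOVER = 0.02
for plays_remaining in (3, 8, 12):
    print(plays_remaining, round(prob_at_least_one_turnover(P_TURNOVER, plays_remaining), 3))
```

Real drives end at the first turnover and plays are not independent, so this only illustrates the direction of the effect, not its size.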
So what can we do with this information? Well, the first thing I thought of was to calculate an expected point value for any down and yard line combination. This expected value is the number of points you can expect to score on your current drive given your current down and field position. I calculated it with this formula:
Expected Points = 7*TD% + 3*FG% - 2*Safety%
I know that you don't make every extra point, so touchdowns aren't always worth a full 7 points, but it's just easier and I had already made the chart.
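As a sketch, the formula translates directly into Python, taking the three percentages as fractions between 0 and 1. The `td_value` parameter is my addition: leaving it at 7 reproduces the chart, while 6 plus an assumed extra-point success rate is one way to account for missed PATs.

```python
def expected_points(td_pct, fg_pct, safety_pct, td_value=7.0):
    """Expected points on the current drive:
    td_value*TD% + 3*FG% - 2*Safety%, with percentages given as fractions.
    A safety counts as -2 because the points go to the defense."""
    return td_value * td_pct + 3.0 * fg_pct - 2.0 * safety_pct
```

With hypothetical rates of 30% touchdowns, 15% field goals, and 1% safeties, `expected_points(0.30, 0.15, 0.01)` comes out to about 2.53 points.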
This is just the raw data, and I will probably smooth it if I want to do any predicting, but the raw data is still pretty interesting. The lesson from this chart: convert your third downs! The drop from third-down value to fourth-down value is predictably huge. My next posts will look into not only your current drive's value, but also the value of your opponent's next drive following yours, and hopefully even your next possession after that. Any suggestions, comments, or criticisms are welcome.
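One simple way to do that smoothing (not necessarily what I'll end up using) is a centered moving average across adjacent yard lines for a single down. The window size is an arbitrary assumption, and weighting each yard line by its sample size, or using something like loess, would be reasonable alternatives.

```python
def moving_average(values, window=5):
    """Centered moving average over a series ordered by yard line.
    Near the ends the window shrinks to the neighbors that exist."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

For example, `moving_average([1, 2, 3, 4, 5], window=3)` returns `[1.5, 2.0, 3.0, 4.0, 4.5]`; a series that is already flat comes back unchanged.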
For fellow nerds: I used Python to manipulate the data and R to make the plots with the ggplot2 package. If you have any questions about how I did anything, fire away in the comments or email me. You can find the actual data files that I used to produce the plots here.