In one of my earlier articles, I looked at how your drive affects your opponent's next drive. My hypothesis was that how you ended your own drive would affect how your opponent did on their drive. One thing that I proved was that how you ended your own drive certainly affected your opponent's starting field position.
So I set out to see if I could prove that how an offense ends its drive affects how its opponent performs on the next drive, independent of the yard line on which the opponent received the ball.
Once again, big thanks to CFBstats.com for providing college football data for the past seven seasons. For each season, there is a file containing information on every drive in college football that year. That file tells us the starting position and starting reason for a drive, as well as the ending position and outcome. I just wanted to do a quick and easy analysis relating the starting position and starting reason to the ending yard line. So I pulled all drives from 2008-12 and went to work.
First, I want to show you what kind of numbers we are talking about here. The most common starting reason for a drive was a kickoff. That makes sense, since touchdowns, field goals, and the start of each half all result in kickoffs. Punts were the next most common, and nothing else was really close to those two. And if you have read anything else I have written this summer, you know that coming up next is a chart showing you everything I can't explain in words. Here is a plot showing the number of drives that started for a given reason at a given yard line (the "0" yard line is the endzone, "100" means 100 yards to go for a touchdown, and "Possession" is usually related to taking the ball in overtime).
So there is a decent amount of data here for the most part, excluding Possession. And as long as we don't focus too much on the extremes then we should be good to go from here.
I was about to bust out the old Python tools I have gotten so familiar with this summer when I remembered that this would be a perfect time to use a tool in R. R has a package called plyr that can break down a data set by certain variables, then perform a function on each subset of that data. So instead of running a bunch of for loops in Python, I can use one line of code in R that will give me the average end spot of a drive by start reason and start spot.
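For anyone who wants to see what that split-then-average step looks like spelled out, here is a minimal sketch of the loop version in plain Python. This is not the code I ran (that was the R one-liner). The field names `start_reason`, `start_spot`, and `end_spot` are assumptions rather than the actual CFBstats column names, and the rows are made up purely for illustration.

```python
from collections import defaultdict

# Toy rows, invented for illustration only (not real CFBstats data).
# Field names are assumptions; spots are yards to go for a touchdown.
drives = [
    {"start_reason": "PUNT", "start_spot": 80, "end_spot": 60},
    {"start_reason": "PUNT", "start_spot": 80, "end_spot": 40},
    {"start_reason": "KICKOFF", "start_spot": 75, "end_spot": 0},
    {"start_reason": "KICKOFF", "start_spot": 75, "end_spot": 50},
    {"start_reason": "DOWNS", "start_spot": 80, "end_spot": 70},
]


def average_end_spot(rows):
    """Split rows by (start_reason, start_spot), then average end_spot per group."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["start_reason"], row["start_spot"])
        groups[key].append(row["end_spot"])
    # Combine step: one mean per group.
    return {key: sum(spots) / len(spots) for key, spots in groups.items()}


averages = average_end_spot(drives)
for (reason, start), avg in sorted(averages.items()):
    print(reason, start, avg)
```

The dictionary keyed by (reason, start spot) plays the same role as the grouping variables you would hand to plyr: one average per combination.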
Here is a plot showing just that: the average end spot of a drive where an offense started the ball at a given yard line and received the ball in a certain way.
The average ending spots for drives that started because of a turnover on downs, a fumble recovery, an interception, a kickoff, or a punt are pretty similar. There is a slight hump at the starting spot of the 45-yard line for kickoffs, and I think this is the effect of teams recovering onside kicks there and then just taking a knee to run out the clock. I didn't filter out garbage time or anything, so this may just be a residual effect of that. Missed field goals are incredibly spotty, and I think that is just a sample size issue: there really aren't enough missed field goals until about 60 yards out from the endzone. Looking only at the 60-100 yard line range, they seem to fit in with the rest of the data.
To me it is hard to conclude anything from this plot other than this: how you get the ball has little impact on how you perform on that drive. It is mostly based on where you get the ball. Just in case, here is a plot showing the largest difference in the data: receiving the ball from a turnover on downs versus from a punt.
There is a pretty constant five-yard loss or so (remember, the lower the average end spot, the "better" the drive) from receiving the ball from a turnover on downs rather than a punt. Why that is, I have no idea. Any thoughts?
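If you want to put a number on that gap at each starting spot instead of eyeballing the plot, something like the sketch below would do it. The averages here are invented placeholders, not my actual results, and the reason labels `DOWNS` and `PUNT` are assumed names.

```python
# Hypothetical averages keyed by (start_reason, start_spot).
# Values are invented placeholders, not results from the real data.
avg_end = {
    ("DOWNS", 80): 62.0,
    ("PUNT", 80): 57.0,
    ("DOWNS", 70): 50.0,
    ("PUNT", 70): 46.0,
    ("PUNT", 60): 40.0,  # no matching DOWNS entry, so this spot is skipped
}


def gap_by_start_spot(averages, reason_a, reason_b):
    """Average end spot for reason_a minus reason_b, at spots where both exist."""
    spots_a = {spot for (reason, spot) in averages if reason == reason_a}
    spots_b = {spot for (reason, spot) in averages if reason == reason_b}
    return {
        spot: averages[(reason_a, spot)] - averages[(reason_b, spot)]
        for spot in sorted(spots_a & spots_b)
    }


gaps = gap_by_start_spot(avg_end, "DOWNS", "PUNT")
print(gaps)
```

A positive gap means the turnover-on-downs drives ended farther from the endzone, which is the "worse" direction on these plots.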
Any other analysis you would like to see that I missed? Let me know, because I'm not really sure what to make of this quite yet. I just wanted to include y'all in the discussion.
And seriously, to any fellow Stat Nerds: try to learn R. This entire post took five lines of code in R: reading the data in, using plyr to calculate the averages, then three plots with ggplot2.