
# Sacks in college football, Part 3: Sack rates and passing statistics

Determining which passing statistics correlate with sack rates.

In my previous two posts I looked at how sack rates affect a team's point differential, and whether the down and distance of a play impacts the cumulative sack rate for that team. In this post I want to look at whether a season's worth of passing statistics can help us determine a team's sack rate.

### The Data

As always, I need to give a huge thanks to www.cfbstats.com for providing play-by-play information on college football for the past 9 seasons. I grabbed all passing plays (including sacks, which the NCAA counts as running plays) from the 2005 to 2012 seasons. I then filtered the plays to include only games between FBS opponents and excluded all garbage-time plays from the analysis. From there I found season totals for Attempts, Completions, Touchdowns, Interceptions, Passing Yards, Sacks, and Sack Yards for each team from 2005-2012. This left me with 952 team-seasons of passing statistics to delve into.
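For anyone who wants to try this at home, here is a rough Python sketch of the filtering and totaling step described above. It is not the code I ran: the record layout, the field names (`kind`, `fbs_vs_fbs`, `garbage_time`, and so on), and the sample plays are all made up for illustration and do not reflect the cfbstats.com file format. The garbage-time flag is assumed to be computed already.

```python
from collections import defaultdict

STAT_KEYS = ("attempts", "completions", "touchdowns", "interceptions",
             "pass_yards", "sacks", "sack_yards")


def season_totals(plays):
    """Sum non-garbage, FBS-vs-FBS passing plays into team-season totals."""
    totals = defaultdict(lambda: dict.fromkeys(STAT_KEYS, 0))
    for play in plays:
        # Apply the two filters: FBS opponents only, no garbage time.
        if not play["fbs_vs_fbs"] or play["garbage_time"]:
            continue
        row = totals[(play["team"], play["season"])]
        if play["kind"] == "sack":
            row["sacks"] += 1
            row["sack_yards"] += abs(play["yards"])  # yards lost, stored as a positive number
        elif play["kind"] == "pass":
            row["attempts"] += 1
            row["interceptions"] += int(play.get("intercepted", False))
            if play.get("complete", False):
                row["completions"] += 1
                row["pass_yards"] += play["yards"]
                row["touchdowns"] += int(play.get("touchdown", False))
    return dict(totals)


def make_play(kind, yards=0, **flags):
    """Build one hypothetical play record (illustrative fields only)."""
    record = {"team": "Team A", "season": 2012, "kind": kind, "yards": yards,
              "fbs_vs_fbs": True, "garbage_time": False}
    record.update(flags)
    return record


# Made-up plays, not real data.
sample_plays = [
    make_play("pass", 12, complete=True),
    make_play("pass"),                                   # incompletion
    make_play("sack", -7),
    make_play("pass", 30, complete=True, touchdown=True, garbage_time=True),
    make_play("pass", intercepted=True),
    make_play("pass", 8, complete=True, fbs_vs_fbs=False),
]
totals = season_totals(sample_plays)
print(totals[("Team A", 2012)])
```

The garbage-time touchdown and the completion against a non-FBS opponent are dropped before anything is counted, so they never touch the season totals.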

### Passing Statistics

My initial hypothesis was that teams that throw the ball deeper get sacked more often, as they have to hold the ball longer and wait for the receivers to come open. But I also wanted to look at the effect of passing efficiency on a team's sack rate. I came up with the following list of passing statistics to examine:

• Yards per Completion: This is simply passing yards divided by the number of completions for a team. This should be a good measure of how deep a team throws the ball on average.
• Adjusted Yards Per Attempt: The formula for this is (Passing Yards + 20*Touchdowns - 45*Interceptions)/Attempts. This rewards teams who throw deep and throw more touchdowns than interceptions.
• Completion Percentage: Completions/Attempts, pretty straightforward.
• Attempts: How many times a team threw the ball.

Instead of doing an individual table for each, I used ggplot2 in R to plot all statistics on one graph, with the team's sack rate for that season as the dependent variable and the passing statistic as the independent variable. In addition, I included the correlation for each relationship, a linear best fit line (except for Attempts, which had a quadratic relationship), and the 95% confidence intervals for the best fit line (the narrow bands on the line).
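To make the definitions in the list concrete, here is a small Python sketch that turns one team-season of totals into the four statistics plus a sack rate. One assumption to flag loudly: sack rate is computed here as Sacks / (Attempts + Sacks), i.e. sacks per dropback, which may not be the exact definition behind the plot. The function name and the input numbers are made up.

```python
def passing_stats(attempts, completions, touchdowns, interceptions,
                  pass_yards, sacks):
    """Compute the passing statistics listed above for one team-season."""
    return {
        "yards_per_completion": pass_yards / completions,
        "adjusted_yards_per_attempt":
            (pass_yards + 20 * touchdowns - 45 * interceptions) / attempts,
        "completion_pct": completions / attempts,
        "attempts": attempts,
        # ASSUMPTION: sack rate = sacks per dropback (attempts + sacks).
        "sack_rate": sacks / (attempts + sacks),
    }


# Made-up season totals for a hypothetical team.
stats = passing_stats(attempts=400, completions=240, touchdowns=25,
                      interceptions=10, pass_yards=3000, sacks=20)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```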

Here are my observations from this plot:

• Well that is a bummer. Yards per Completion has essentially no impact on Sack Rate. I could have sworn this relationship should exist: if you throw the ball deeper, you should have to wait longer for your receivers to get open, which should lead to more sacks. Of course we do not have "Air Yards" as a statistic in college. It could be that Yards per Completion just doesn't capture how deep teams throw the ball, and instead captures how well teams are earning passing yards either before or after the catch. Any ideas?
• The other surprising feature to me is how strong the relationship between non-garbage passing attempts and non-garbage sack rate is. It seems the more you pass the ball, the better you get at avoiding sacks on a per-play basis. This makes sense, but I just didn't realize it would be this strong. I guess if you pass the ball that much, you can't afford to get sacked a lot. If you add Sacks to Attempts to get "Total Dropbacks", the adjusted R^2 of a quadratic model is .94. That makes sense, since using Season Sacks to predict Season Sack Rate is bound to work well. With just Attempts and Attempts^2 as your predictors, the adjusted R^2 is only .22.

[Note from Bill: There is certainly a correlation between a quarterback's non-sack rush attempts and sacks, meaning mobile quarterbacks are more likely to get sacked more frequently. So a large number of pass attempts suggests a more pass-happy than run-happy offense ... which probably decreases the odds of a mobile quarterback lining up behind center. Just a thought.]

• The relationships of AYA and Completion % with Sack Rate are about what I expected. The more efficient you are at passing, the less you get sacked.
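For readers curious about what the quadratic model and its adjusted R^2 look like under the hood, here is a minimal Python sketch. It fits Sack Rate on Attempts and Attempts^2 by ordinary least squares and applies the usual adjustment, 1 - (1 - R^2)(n - 1)/(n - p - 1) with p = 2 predictors. This is a stand-in for the R model fit, not the original code, and the data points are made up (attempts are expressed in hundreds to keep the arithmetic well behaved).

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def quadratic_fit(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2; returns (coefs, R^2, adjusted R^2)."""
    rows = [[1.0, x, x * x] for x in xs]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    n, p = len(ys), 2  # two predictors: x and x^2
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return coefs, r2, adj_r2


# Made-up team-seasons: attempts (in hundreds) and sack rate.
attempts_hundreds = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
sack_rates = [0.085, 0.074, 0.066, 0.060, 0.056, 0.054, 0.053]
coefs, r2, adj_r2 = quadratic_fit(attempts_hundreds, sack_rates)
print(coefs, r2, adj_r2)
```

Swapping Attempts for Attempts + Sacks as the x variable gives the "Total Dropbacks" version of the same model.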

After finding these relationships I also wanted to look at the correlation between each of the passing statistics. The R package corrgram provides a quick and easy way to visualize the relationships among multiple variables. The following image shows the scatter plot between two variables in the upper right corner and the correlation value in the lower left corner (the bigger the piece of the pie, the higher the correlation; red represents a negative relationship and blue a positive one):

The three highest correlations among my passing statistics are between Adjusted Yards per Attempt (AYA) and Completion Percentage, AYA and Yards per Completion, and finally Completion Percentage and Attempts. So these variables are obviously correlated a fair amount with each other; there are only so many ways you can measure a team's passing game.
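The numbers behind a plot like this are just pairwise Pearson correlations. Here is a small Python sketch of that calculation (the R package does the drawing; this only shows the arithmetic). The column names and the five team-seasons of values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of {statistic name: list of values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Made-up values for five hypothetical team-seasons.
columns = {
    "ypc": [11.2, 12.5, 13.1, 10.8, 12.0],
    "aya": [6.1, 7.6, 8.0, 5.5, 7.0],
    "comp_pct": [0.55, 0.60, 0.62, 0.52, 0.64],
    "attempts": [310, 400, 380, 290, 450],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```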

### Conclusions

It seems that very few things matter when predicting a team's non-garbage, FBS-only Sack Rate. The largest predictor of a team's sack rate in a season (other than the number of Sacks) is the number of times a team throws the ball. I'm not sure if this is because teams that can't avoid sacks won't have as many opportunities to throw (some sort of survivorship bias) or because teams that throw the ball more are better at avoiding sacks. In my next post, instead of looking at what season passing statistics can tell us about sack rates, I am going to look at what we can use in-season to try to predict a team's sack rate in its remaining games. However, if you have any ideas on things to look at for this post, then please let me know in the comments.