Trying to determine who was the best running back, receiver, or quarterback throughout the year is usually an exercise in futility. Take one of the most basic statistics we have for measuring performance, yards per attempt. If you look at last year's leaders in rushing yards per attempt, though, you'll notice it's a flawed statistic.
[Table: Leaders in Yards Per Attempt for the 2014 season; only the entry for Derek Di Nardo survives]
Notice anything? The most carries by anyone on the list is 4, which is not exactly a sustained string of excellence. The range of end-of-season yards per carry values for college football players depends almost entirely on the number of carries a runner gets:
Another issue with yards per attempt is that it doesn't account for the quality of opposition faced or the strength of the blocking by your team. Lucky for us, there is a mathematical modeling procedure called Multilevel Modeling that can at least give us a start on accounting for all three weaknesses (small samples, quality of opposition, and blocking) of our basic tool for measuring skill player performance.
I'll present the basics of how Multilevel Models work, the difference between the model estimates and yards per carry, and how the model affects players and teams based on number of carries and quality of opposition. Then I'll discuss some possible next steps. Oh, and all the code for this analysis is available on GitHub, although the data isn't. Sorry, I don't pay the bills. :/
If you want a proper introduction to Multilevel Models, you can't do better than to read the one that was posted this past month to the Stitch Fix Technology Blog. You'll notice that they refer to it as Mixed Effects models, and that is also fine. The modeling procedure has many names, but I'll stick with multilevel modeling. For a very basic summary of multilevel models, here is what Andrew Gelman had to say:
Multilevel (hierarchical) modeling is a generalization of linear and generalized linear modeling in which regression coefficients are themselves given a model, whose parameters are also estimated from data
What this means is that we fit a regression using the players and teams themselves as inputs (as dummy variables), with the yards gained on each carry as the dependent variable. Ordinary least squares regression can't be used in this situation, since runners in college football don't ever switch teams; Justin Thomas will only run when Georgia Tech has the ball.
That means there is collinearity in our regression variables: the data alone can't separate a runner's effect from his team's. Multilevel models assume that there is natural variation between the individuals in a group and shrink each individual's estimate towards the group mean. The math behind the model determines when an individual has done enough for his estimate to deviate from the overall group mean, either by playing very well or very poorly in a limited number of plays or by having enough observations to demonstrate his own ability. This shrinking allows us to fit a model that we couldn't fit with ordinary least squares regression.
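To make the shrinkage concrete, here is a minimal sketch in Python of the partial-pooling formula for the simplest case: a single grouping (runners only) with the variances assumed known. The function name and every number below are made up for illustration. A real multilevel model estimates the variances from the data, and this is not the code used for the analysis.

```python
def shrunken_mean(yards, group_mean, sigma_within, sigma_between):
    """Partial-pooling estimate of one runner's mean yards per carry.

    This is the precision-weighted average of the runner's own mean and the
    group mean for a varying-intercept model with KNOWN variances
    (an assumption made to keep the sketch short).
    """
    n = len(yards)
    if n == 0:
        # No carries at all: the best guess is the group mean.
        return group_mean
    own_mean = sum(yards) / n
    w_own = n / sigma_within ** 2       # weight grows with the number of carries
    w_group = 1.0 / sigma_between ** 2  # weight of the group mean is fixed
    return (w_own * own_mean + w_group * group_mean) / (w_own + w_group)


# Hypothetical values, chosen only for illustration.
GROUP_MEAN = 4.5     # average yards per carry across all runners
SIGMA_WITHIN = 6.0   # assumed carry-to-carry standard deviation
SIGMA_BETWEEN = 1.0  # assumed standard deviation of true runner ability

few = shrunken_mean([40.0, 30.0], GROUP_MEAN, SIGMA_WITHIN, SIGMA_BETWEEN)
many = shrunken_mean([6.0] * 200, GROUP_MEAN, SIGMA_WITHIN, SIGMA_BETWEEN)
print(round(few, 2), round(many, 2))
```

Under these assumed numbers, the runner with two huge carries (35 yards per carry) gets pulled most of the way back to the group mean, while the runner with 200 carries keeps an estimate close to his own 6.0 average.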
Trey Causey has written a post measuring quarterbacks in the NFL using multilevel models, and if you are still really confused and want more detail I'd recommend checking his post out.
Comparison to Yards per Carry
Basically, instead of measuring running backs by their yards per carry, I'm going to fit a Multilevel Model with the runners, offenses, and defenses as random effects and extract the estimated coefficients from each group. This will allow me to measure the estimated impact each running back has on his yards per carry after taking into account the effects of the rest of his offense and the defenses he has faced. We can compare these to the yards per carry values for the runners in the 2014 season, as the model was fit on the 2014 play-by-play data.
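Written out, a model like this could look as follows. This is a sketch that assumes an intercept-only specification with no other predictors. The symbols are illustrative notation, not taken from the code: y_i is the yards gained on carry i, and r[i], o[i], and d[i] index the runner, offense, and defense involved in that carry.

```latex
y_i = \mu + \alpha_{r[i]} + \beta_{o[i]} + \gamma_{d[i]} + \varepsilon_i,
\qquad
\alpha_r \sim \mathcal{N}(0, \sigma_\alpha^2), \quad
\beta_o \sim \mathcal{N}(0, \sigma_\beta^2), \quad
\gamma_d \sim \mathcal{N}(0, \sigma_\gamma^2), \quad
\varepsilon_i \sim \mathcal{N}(0, \sigma_y^2)
```

In this notation, the runner coefficients compared to yards per carry correspond to the estimated \alpha_r values, and the group-level variances are what control how hard each estimate is shrunk.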
Here is how the distributions of both Yards per Carry and the coefficient estimates from the Multilevel Model look. I trimmed the x-axis just to help with the visuals; there are some runners with yards per attempt outside these values.
As you can see, the Multilevel Model gives a much tighter distribution for the value of a runner, and the huge outliers in yards per carry are greatly reduced. The effect that the model shrinkage has on runners can best be seen in the following scatterplot that compares a player's yards per carry to his coefficient estimate:
There are two main trends at play here. The first is the steeper line of small dots. These are the players with very few carries but extreme values in yards per carry. This is basically the most that a runner's yards per carry can impact their multilevel model coefficient. Without more carries, each additional 10 yards per carry only gets you about a third of a yard of value on the multilevel model coefficient scale.
The second trend at play is the much flatter line of runners with the larger dots. These are runners with enough carries to establish their own skill level in the model and to set themselves apart from the random variation associated with the overall group of runners. If two runners in this group have similar yards per carry and numbers of carries but different model coefficients, then the difference probably comes from the quality of the opponents they faced or from how well the other runners on their team did (success shared across a team's runners gets credited to the offense). For example, here are the 4 running backs whose numbers were most similar to Dalvin Cook's last year:
[Table: Multilevel Model Coefficient Estimate for Dalvin Cook and the 4 most similar running backs]
A good comparison is Larry Rose. New Mexico State obviously played an easier schedule than FSU did, so Dalvin Cook receives a boost in the Multilevel Model even though he has very similar raw numbers to Larry Rose. I think there is a lot more I can discuss on the differences between the Multilevel Model coefficient estimates and Yards per Carry, but for the sake of brevity I'll continue on.
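The schedule adjustment can be illustrated with a toy example. The sketch below is a hedged stand-in, not the fitting routine behind the numbers above: it treats the shrinkage penalties as given (a real multilevel model estimates the variance components from the data) and solves the resulting penalized least squares problem by cycling through the runner, offense, and defense effects. The function names, the penalty values, and the tiny data set are all made up for illustration.

```python
from collections import defaultdict

FACTORS = ("runner", "offense", "defense")


def fit_crossed_intercepts(plays, penalty, n_iter=500):
    """Shrunken intercepts for crossed groups via block coordinate descent.

    plays: list of dicts with keys "runner", "offense", "defense", "yards".
    penalty: dict mapping each factor to a ridge penalty. It plays the role
        of (play-level variance) / (group-level variance) and is ASSUMED
        here rather than estimated.
    """
    effects = {f: defaultdict(float) for f in FACTORS}
    mu = 0.0
    for _ in range(n_iter):
        # Unpenalized grand mean of whatever the group effects leave over.
        mu = sum(
            p["yards"] - sum(effects[f][p[f]] for f in FACTORS) for p in plays
        ) / len(plays)
        for f in FACTORS:
            totals, counts = defaultdict(float), defaultdict(int)
            for p in plays:
                others = sum(effects[g][p[g]] for g in FACTORS if g != f)
                totals[p[f]] += p["yards"] - mu - others
                counts[p[f]] += 1
            # Shrunken mean residual: the penalty acts like extra carries at 0.
            for level, total in totals.items():
                effects[f][level] = total / (counts[level] + penalty[f])
    return mu, effects


def carries(runner, offense, defense, yards, n=20):
    """Build n identical carries (hypothetical helper for the toy data)."""
    return [dict(runner=runner, offense=offense, defense=defense, yards=yards)] * n


# Hypothetical data: A and B both average 5.0 yards per carry, but A only
# faced defense T (which held runner C to 2.0) and B only faced defense W
# (which gave up 6.0 to runner C).
plays = (
    carries("A", "X", "T", 5.0)
    + carries("B", "Y", "W", 5.0)
    + carries("C", "Z", "T", 2.0)
    + carries("C", "Z", "W", 6.0)
)
mu, effects = fit_crossed_intercepts(
    plays, dict(runner=10.0, offense=10.0, defense=10.0)
)
for name in ("A", "B"):
    print(name, round(effects["runner"][name], 2))
```

Even though A and B have identical raw averages, runner A comes out ahead of runner B in this toy fit, because the model sees that defense T was tougher than defense W.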
Team Estimates from Multilevel Models
Here are the same plots as before, comparing yards per play and the model output, except now for both offenses and defenses.
Because teams have a lot more observations, their model estimates deviate from their yards per carry a lot less than individual players' estimates do. But there is still some adjustment going on in terms of quality of opposition faced.
Conclusions and More Questions
So what's next? I think there are many more areas of research on this topic that need to be explored. The first step is expanding the scope. Why look at just yards per play when you can do success rate, first downs per play, fumbles per play, etc.? You could turn all of these model outputs into a composite score for skill players.
And why stop with running plays? If you had target data you could fit a multilevel model with random intercepts for quarterbacks, receivers, offenses, and defenses.
By the way, I do and I will :)
The tough question is, how would you determine if these models are any good besides an eye test? Is it important for this model to predict future out-of-sample yards per play? Do I care if the model is accurate as long as it separates the good running backs from the bad? Maybe an in-sample validation test is more important, since I want to determine who has been the best *so far* and not necessarily whose talent level is highest. I honestly don't know the answer to these questions and would love some feedback, so please feel free to comment on this article or get in touch on Twitter or by email.
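One way to make the out-of-sample question concrete is a holdout check: estimate each runner's value on part of the season, then score the predictions on the carries that were held out. The sketch below only illustrates that idea, with made-up carries and a made-up shrinkage strength (`shrink_carries`). It compares raw yards per carry against a simple shrunken average rather than the full multilevel model.

```python
import math


def raw_ypc(yards, group_mean, shrink_carries):
    """Plain yards per carry; ignores the group mean entirely."""
    return sum(yards) / len(yards)


def shrunken_ypc(yards, group_mean, shrink_carries):
    """Yards per carry pulled toward the group mean, as if the runner had
    `shrink_carries` extra carries at exactly the group mean."""
    return (sum(yards) + shrink_carries * group_mean) / (len(yards) + shrink_carries)


def holdout_rmse(train, test, estimator, shrink_carries=20.0):
    """RMSE of per-runner predictions on held-out carries.

    train, test: dicts mapping runner name -> list of yards per carry.
    Every runner in `test` is assumed to appear in `train`.
    """
    all_train = [y for ys in train.values() for y in ys]
    group_mean = sum(all_train) / len(all_train)
    sq_err, n = 0.0, 0
    for runner, held_out in test.items():
        pred = estimator(train[runner], group_mean, shrink_carries)
        for y in held_out:
            sq_err += (y - pred) ** 2
            n += 1
    return math.sqrt(sq_err / n)


# Hypothetical carries: a steady back and a runner with one long run.
train = {"workhorse": [4, 5, 6, 3, 7, 5, 4, 6], "one_carry": [40]}
test = {"workhorse": [5, 4, 6, 5], "one_carry": [3, 6, 2, 5]}

raw_rmse = holdout_rmse(train, test, raw_ypc)
shrunk_rmse = holdout_rmse(train, test, shrunken_ypc)
print(round(raw_rmse, 2), round(shrunk_rmse, 2))
```

On this toy data the shrunken estimate has the lower holdout error, mostly because it doesn't take the single 40-yard carry at face value. On real data you would probably split by week or by game, and whether a predictive check like this or an in-sample one is the right yardstick is exactly the open question.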
And just for fun here are the top 10 runners, offenses, and defenses according to the multilevel model coefficients.