Modeling the 2013 NCAA College Big Ten Football Season
One of the biggest difficulties that I, as a sports modeler, run into when trying to predict the way a season will go is that the only data available is for games that have already been played and seasons that are already complete. That means that any model that relies on on-field performance rather than subjective evaluation (AP, BCS, Harris, etc.) must start out with a low level of confidence in its results and gain confidence as the season progresses and better, more current data becomes available.
So, with that very large caveat, for my very first post on Football Study Hall, I’m unveiling my 2013 Season model. In order to keep the development process manageable, I’ve only modeled the Big Ten so far. As we move through the summer I will expand it to include all FBS teams as well as develop some more user-friendly reports.
1. What am I trying to do? My model attempts to estimate the number of wins a team will achieve during the 2013 season. It's a low-resolution model that picks the winner of each game or, if it can't calculate a clear winner, declares the game a toss-up. It doesn't provide point totals or margins of victory, so it can't be used for 'against the spread' calculations.
2. How do I estimate wins? That’s the secret sauce, and a good cook never reveals the actual ingredients. However, in the interest of not giving the critics additional ammunition to sharpshoot my work, here’s a basic laydown of the components and steps involved:
a. Each team's scoring offense and scoring defense are each normalized to a Z-score: I take the difference between the team's actual offensive (or defensive) PPG and the average PPG of the entire FBS, then divide it by the standard deviation of all FBS PPGs. This gives me a more usable (apples to apples principle) comparison between teams as well as across statistical categories.
b. I calculate an ‘advantage' score for each team by subtracting an opponent's Def Z-score from the team's Off Z-score (Off Advantage) and by subtracting an opponent's Off Z-score from the team's Def Z-score (Def Advantage). I then add those two advantage scores together to get a combined advantage score.
c. Using data from all games from 2007-2011, I calculated the average MOV and the standard deviation of MOV for each combined advantage score. With this information I used Excel to calculate the probability of a MOV of at least 1 (less than 1 would be a loss). For the Excel nerds, that calculation is 1-NORMDIST(1, ave MOV, sd MOV, TRUE).
d. Because I'm a firm believer in confronting the reality that chance plays a much larger role in things like sports than we choose to admit, my model deals with chance by calculating the probability of winning for both teams. When I run the model, if both teams are predicted to win or both are predicted to lose, I call the game a toss-up. If there is a clear winner, that game counts as a win for that team.
e. I run the model a couple thousand times and record the results of each run.
f. I count the results for each team. For both wins and toss ups I calculate the average, max, min, and median of the results.
g. Finally, I set predicted number of wins this way: Number of Wins + ½ of tossup games rounded to the nearest integer (if ½ of toss up games is 2.5 I count it as 3).
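For readers who want to tinker, steps a through g can be sketched in Python roughly as follows. This is a minimal sketch under several assumptions, not the actual model: the function and key names (`off_z`, `def_z`, `hypothetical_mov_params`, and so on) are made up for illustration, the Z-scores are assumed to be oriented so that higher is better on both offense and defense, the population standard deviation is used, and the historical lookup from step c is replaced by a placeholder with invented numbers. Step d is read as one independent random draw per team against its win probability.

```python
import math
import random
from statistics import NormalDist, mean, median, pstdev

def z_score(value, population):
    """Step a: standardize one PPG figure against all FBS PPG figures."""
    return (value - mean(population)) / pstdev(population)

def combined_advantage(team, opp):
    """Step b: Off Advantage plus Def Advantage, built from Z-scores."""
    off_adv = team["off_z"] - opp["def_z"]
    def_adv = team["def_z"] - opp["off_z"]
    return off_adv + def_adv

def hypothetical_mov_params(adv):
    """Stand-in for the 2007-2011 lookup in step c; these numbers are invented."""
    return 7.0 * adv, 16.0  # (average MOV, standard deviation of MOV)

def win_probability(avg_mov, sd_mov):
    """Step c: P(MOV >= 1), the same quantity as 1-NORMDIST(1, avg, sd, TRUE)."""
    return 1.0 - NormalDist(avg_mov, sd_mov).cdf(1.0)

def simulate_game(p_team, p_opp, rng):
    """Step d (one reading): draw for each side; only a split result is decisive."""
    team_wins = rng.random() < p_team
    opp_wins = rng.random() < p_opp
    if team_wins == opp_wins:
        return "tossup"  # both predicted to win, or both predicted to lose
    return "win" if team_wins else "loss"

def simulate_season(team, opponents, runs=2000, seed=0):
    """Steps e and f: repeat the schedule many times, tallying wins and toss-ups."""
    rng = random.Random(seed)
    probs = []
    for opp in opponents:
        p_team = win_probability(*hypothetical_mov_params(combined_advantage(team, opp)))
        p_opp = win_probability(*hypothetical_mov_params(combined_advantage(opp, team)))
        probs.append((p_team, p_opp))
    wins, tossups = [], []
    for _ in range(runs):
        results = [simulate_game(p, q, rng) for p, q in probs]
        wins.append(results.count("win"))
        tossups.append(results.count("tossup"))
    return wins, tossups

def summarize(counts):
    """Step f: average, max, min, and median across all runs."""
    return {"avg": mean(counts), "max": max(counts),
            "min": min(counts), "median": median(counts)}

def predicted_wins(wins, tossups):
    """Step g: wins plus half the toss-ups, with a trailing .5 rounded up."""
    return math.floor(wins + tossups / 2 + 0.5)
```

The explicit `math.floor(x + 0.5)` in the last function is deliberate: Python's built-in `round` rounds halves to the nearest even number, so `round(2.5)` gives 2 rather than the 3 that step g calls for.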
My very early season predictions for the Big Ten
Again, remember that this season prediction is based on LAST YEAR’s on field performance. It won’t be until around the start of the conference season that I have enough good data for the 2013 season to feel confident in the results. Additionally, at that point teams will have played at least four games, which should help the expected wins accuracy a lot.
First, the average expected wins:
The ‘toss up’ component of the model likely limits the better teams in the model, so a reasonable interpretation of this is that Nebraska, Northwestern, and Wisconsin are 9-win teams and OSU is a 10-win team.
Read this chart as the probability of a team getting AT LEAST a certain number of wins.
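An "at least" table like this can be built directly from the per-run win counts. The snippet below is only a sketch of that idea, assuming the chart comes from tallying simulation runs; `at_least_probabilities` and the sample win counts are hypothetical and not taken from the model's output.

```python
def at_least_probabilities(win_counts, max_wins=12):
    """Share of simulation runs in which the team reached at least k wins."""
    runs = len(win_counts)
    return {k: sum(w >= k for w in win_counts) / runs
            for k in range(max_wins + 1)}

# Made-up win totals from four imaginary runs, for illustration only.
sample = [6, 7, 7, 8]
table = at_least_probabilities(sample, max_wins=8)
```

Because the probabilities are cumulative, each row can only stay level or fall as the win total rises.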
Any way you slice it, Illinois, Iowa, Indiana, Purdue, and Minnesota will have to show major improvement in scoring offense and defense if they hope to reach a bowl game. Nebraska, Ohio State, Wisconsin, and Northwestern are virtual certainties to become bowl eligible, with Penn State, Michigan, and Michigan State also likely bowl teams.
At the risk of sounding pedantic, these predictions are based on the performance of last year’s teams, which are most likely not the same teams that will take the field this year. As the 2013 season progresses and I begin to use data from 2013, I suspect these numbers will change, though probably not by very much. The model may also overestimate the effect of toss-up games for the best teams in the conference.
Sensitivity or What-If Analysis
I think two things are going to happen this year. First, Nebraska’s defense and offense are going to be better. Taylor Martinez will be a 4-year starter, and several young defenders who redshirted or saw limited playing time last year are expected to help the defensive unit improve. Second, Penn State will begin to show the effects of the scholarship limits. Last year they played with an "us against the world" mentality, but that emotion is not sustainable for the long term. I believe they will see some decrease in on-field performance on both offense and defense.
To simulate these changes, I slightly increased Nebraska’s offensive and defensive scores and slightly decreased Penn State’s scores and reran the model.
As you can see, even small changes in average on field performance in two teams noticeably affected the expected win totals of three other teams. To me, this is an example of Chaos Theory or the Butterfly Effect. Small changes in starting conditions of one part of a system can have unforeseen and dramatic effects in other parts of the system. They certainly did in this model.
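One mechanical reason the ripple reaches other teams is that the combined advantage score is always relative to the opponent, so bumping one team's Z-scores shifts the advantage of everyone on its schedule. A tiny sketch of that, with made-up Z-scores and hypothetical helper names (`bump`, `off_z`, `def_z`) rather than the model's real inputs:

```python
def combined_advantage(team, opp):
    """Off Advantage plus Def Advantage, as described in step b."""
    return (team["off_z"] - opp["def_z"]) + (team["def_z"] - opp["off_z"])

def bump(team, off_delta, def_delta):
    """Return a copy of a team's Z-scores with a what-if adjustment applied."""
    return {"off_z": team["off_z"] + off_delta,
            "def_z": team["def_z"] + def_delta}

# Invented numbers for illustration only.
nebraska = {"off_z": 0.9, "def_z": 0.1}
opponent = {"off_z": 0.4, "def_z": 0.5}

before = combined_advantage(opponent, nebraska)
after = combined_advantage(opponent, bump(nebraska, 0.2, 0.2))
shift = after - before  # the opponent's edge moves by minus the sum of the bumps
```

In this toy case the opponent's combined advantage drops by 0.4, which would lower its win probability in that game even though nothing about the opponent itself changed.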
Once again, I’m excited to be part of Football Study Hall and appreciate the opportunity to interact with college football fans from across the country.
Until next time, GBR!