Using the Five Factors as predictors

Here on Football Study Hall, "the Five Factors" are often cited and studied. What are the Five Factors? Explosiveness, Efficiency, Field Position, Finishing Drives, and Turnovers: the factors determined to be most important within a game. As mentioned in Bill Connelly's original article detailing the Five Factors, the idea of distilling game performance down to a handful of statistics came from Dean Oliver's Four Factors in basketball.

As Bill demonstrates in his article, teams that win the Five Factors in a particular game are substantially more likely to win that game, doing so anywhere between 70 and 85% of the time. I like to participate in an office pool or two, so I wanted to try to leverage the Five Factors to predict winners.

The gist of my approach was to take each team's season-cumulative stats for the Five Factors coming into a particular game and see how well those stats correlated with the final outcome. There are multiple appropriate metrics available to measure each of the Five Factors; for this study, I'm using PpP (explosiveness), Success Rate (efficiency), Average Starting Field Position (field position), Points per trip inside the 40 (finishing drives), and Havoc play percentage (turnovers). Havoc plays are plays that end in a tackle for loss (TFL), sack, pass breakup (PBU), hurry, interception, or forced fumble.

The dataset I used for this analysis consisted of all regular-season games occurring in week 7 or later from 2005-2014 (3,612 games). I'm only using games from week 7 on to allow the season-cumulative stats to stabilize a little bit. For every game, we have each team's O and D season stats for each of the Five Factors coming into that game. I wanted a single value for each of the Five Factors instead of four values (Home team O, Home team D, Away team O, Away team D), so I combined them into a single metric. The metric for each factor is built so that positive values imply the home team will be better in the game and, accordingly, negative values imply the away team will be better. Therefore, we add the Home O and Away D stats (higher is better for the home team) and subtract the Home D and Away O stats (lower is better for the home team). Let's look at Kansas @ Kansas State from 2014 Week 14 as an example:

Unit                             | PpP    | NCAA Avg. | Difference
Kansas State Offense             | 0.4284 | 0.345     | +0.0834
Kansas Defense                   | 0.3715 | 0.345     | +0.0265
Kansas State Defense             | 0.2783 | 0.345     | -0.0667
Kansas Offense                   | 0.2137 | 0.345     | -0.1313
Single Metric (sum with + + - -) | 0.3079 |           | +0.3079

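The single metric is positive, indicating that PpP would select Kansas State as the winner. (Because two values are added and two are subtracted, the NCAA average cancels out, which is why summing the raw PpP values and summing the differences both give 0.3079.) This value would also indicate that, according to unadjusted season totals, Kansas State would be expected to post a PpP of 0.345 + 0.3079 = 0.6529 (0.3079 better than average) in this particular game.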
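The arithmetic in the Kansas @ Kansas State example can be sketched in a few lines of Python. This is only an illustration of the + + - - combination described above; the function and variable names (`single_metric`, `pick_side`, `NCAA_AVG_PPP`) are made up for this sketch and are not from the original analysis.

```python
NCAA_AVG_PPP = 0.345  # NCAA average PpP used in the example table


def single_metric(home_off, away_def, home_def, away_off):
    """Home-perspective metric: add home O and away D, subtract home D and away O."""
    return home_off + away_def - home_def - away_off


def pick_side(metric):
    """Positive favors the home team, negative the away team, zero gives no pick."""
    if metric > 0:
        return "home"
    if metric < 0:
        return "away"
    return None


# Season-to-date PpP values from the Kansas @ Kansas State example.
raw = dict(home_off=0.4284, away_def=0.3715, home_def=0.2783, away_off=0.2137)
metric = single_metric(**raw)

# Same calculation on differences from the NCAA average: the average is added
# twice and subtracted twice, so it cancels and the result is unchanged.
diffs = {name: value - NCAA_AVG_PPP for name, value in raw.items()}
metric_from_diffs = single_metric(**diffs)

print(f"metric = {metric:+.4f}, pick = {pick_side(metric)}")
print(f"from differences = {metric_from_diffs:+.4f}")
```

Both print statements report +0.3079, and the pick is the home team (Kansas State), matching the table.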

So what were the results? Let's start with the simple stuff…

Bill gives more weight to Explosiveness and Efficiency than the other factors, and when thinking of the factors in a predictive way, this makes sense. Those two stats are more stable from game to game; Field Position, Finishing Drives, and Turnovers are much more finicky. That's also evident when looking at the individual Five Factors as predictors:

Factor (metric)                                | Win % as predictor
Explosiveness (PpP)                            | 71.1%
Efficiency (Success Rate)                      | 70.0%
Field Position (Avg. Starting Field Position)  | 63.1%
Finishing Drives (Pts. Per Drive Inside 40)    | 53.0%
Turnovers (Havoc Rate)                         | 51.0%

Just by using two teams' O and D PpP you can expect to pick 71% of games correctly. That's not too bad! Algorithms tracked by Prediction Tracker average around 75% and can be much, much more complex.
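For anyone who wants to reproduce this kind of table, here is a minimal sketch of how a "win % as predictor" figure could be computed, assuming each game is stored with its home-perspective single metric per factor and a flag for whether the home team won. The rows below are made-up toy values, not real results, and the field names are hypothetical.

```python
def pick_accuracy(games, factor):
    """Share of games where the sign of the factor's metric matched the winner.

    Games where the metric is exactly zero give no pick and are skipped.
    Returns None if no game yields a pick.
    """
    decided = [g for g in games if g[factor] != 0]
    if not decided:
        return None
    correct = sum((g[factor] > 0) == g["home_won"] for g in decided)
    return correct / len(decided)


# Hypothetical toy rows (NOT real data): positive metric = factor favors home.
games = [
    {"ppp": 0.31, "success_rate": 0.08, "home_won": True},
    {"ppp": -0.12, "success_rate": 0.02, "home_won": False},
    {"ppp": 0.05, "success_rate": 0.03, "home_won": False},
    {"ppp": -0.20, "success_rate": -0.06, "home_won": False},
]

for factor in ("ppp", "success_rate"):
    print(factor, pick_accuracy(games, factor))
```

On the toy rows this gives 0.75 for `ppp` and 0.5 for `success_rate`; the real percentages above come from the full 3,612-game dataset.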

It's also pertinent to look at how many of the factors side with one team. In games where all 5 factors side with the same team, the pick was correct 79.1% of the time. When 4 or 5 factors agreed, the Five Factors' pick was correct 75.4% of the time.
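The agreement breakdown can be sketched the same way: count how many of the five single metrics favor each side, take the majority as the pick, and group accuracy by how many factors agreed. This is an assumed reading of the method, with made-up toy rows and hypothetical field names; the tie handling for zero-valued metrics is also an assumption.

```python
from collections import defaultdict

FACTORS = ["ppp", "success_rate", "field_position", "finishing", "havoc"]


def majority_pick(game):
    """Return (home_is_pick, n_agree) for one game.

    n_agree is the number of factors siding with the majority team.
    Assumption: if zero-valued metrics cause a tie, the away team is the pick.
    """
    home_votes = sum(game[f] > 0 for f in FACTORS)
    away_votes = sum(game[f] < 0 for f in FACTORS)
    return home_votes > away_votes, max(home_votes, away_votes)


def accuracy_by_agreement(games):
    """Map each agreement level (3, 4, 5, ...) to the share of correct picks."""
    tally = defaultdict(lambda: [0, 0])  # n_agree -> [correct, total]
    for g in games:
        home_is_pick, n_agree = majority_pick(g)
        tally[n_agree][0] += home_is_pick == g["home_won"]
        tally[n_agree][1] += 1
    return {n: correct / total for n, (correct, total) in sorted(tally.items())}


def row(values, home_won):
    """Build a toy game row from five metric values in FACTORS order."""
    return dict(zip(FACTORS, values), home_won=home_won)


# Hypothetical toy rows (NOT real data).
games = [
    row([0.3, 0.05, 2.0, 0.4, 0.01], True),       # 5 factors favor home, home won
    row([-0.2, -0.03, -1.5, -0.2, -0.02], True),  # 5 favor away, home won (miss)
    row([0.1, 0.02, 1.0, 0.3, -0.01], True),      # 4 favor home, home won
    row([-0.1, -0.04, -0.5, 0.2, 0.03], False),   # 3 favor away, away won
]

print(accuracy_by_agreement(games))
```

On the toy rows the result is `{3: 1.0, 4: 1.0, 5: 0.5}`.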

What about a more complex approach?

I also attempted to fit a decision tree to the data. The results weren't surprising (the algorithm selected only PpP and Success Rate as variables), but the tree does help isolate scenarios where the Five Factors can be even better at picking winners.

(Image: the fitted decision tree, with some of its nodes circled.)
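The article doesn't say which software or tree algorithm was used, so the following is only a simplified stand-in: a search for the single best split (a depth-1 tree) using Gini impurity, written in plain Python. A full decision tree repeats this search recursively inside each resulting group. The data rows and field names are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of booleans (True = home team won)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, features):
    """Find the (score, feature, threshold) minimizing weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None if no feature has at least two distinct values.
    """
    best = None
    n = len(rows)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r["home_won"] for r in rows if r[f] <= t]
            right = [r["home_won"] for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


# Hypothetical toy rows (NOT real data): home-perspective single metrics.
rows = [
    {"ppp": -0.20, "success_rate": 0.02, "home_won": False},
    {"ppp": -0.10, "success_rate": -0.03, "home_won": False},
    {"ppp": -0.05, "success_rate": 0.01, "home_won": False},
    {"ppp": 0.05, "success_rate": -0.01, "home_won": True},
    {"ppp": 0.10, "success_rate": 0.04, "home_won": True},
    {"ppp": 0.30, "success_rate": 0.05, "home_won": True},
]

score, feature, threshold = best_split(rows, ["ppp", "success_rate"])
print(feature, threshold, score)
```

In the toy rows, splitting `ppp` at 0 separates home wins from home losses perfectly, so that split is chosen with an impurity of 0. Real data is far noisier, which is why the leaves of a fitted tree end up with different hit rates.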

If you stick to games that fall outside the circled decision tree nodes, the win % jumps to 83.3%. Those nodes account for 49.7% of games, so on about half of the games we can expect to pick 83.3% correctly. Again, that's pretty good, if you're not worried about picking every game.
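The trade-off here is between coverage (what share of games you pick) and accuracy (how often those picks hit). A rough sketch of that calculation, assuming each game has been tagged with the tree node it lands in and with whether the pick was correct; the node labels and rows are made-up toy values, not the real tree.

```python
def coverage_and_accuracy(games, skip_nodes):
    """Pick only games whose tree node is not in skip_nodes.

    Returns (coverage, accuracy): the share of games picked and the share of
    those picks that were correct. Accuracy is None if nothing is picked.
    """
    picked = [g for g in games if g["node"] not in skip_nodes]
    coverage = len(picked) / len(games)
    if not picked:
        return coverage, None
    accuracy = sum(g["pick_correct"] for g in picked) / len(picked)
    return coverage, accuracy


# Hypothetical toy rows (NOT real data): node "B" stands in for a circled node.
games = [
    {"node": "A", "pick_correct": True},
    {"node": "A", "pick_correct": True},
    {"node": "B", "pick_correct": False},
    {"node": "B", "pick_correct": True},
    {"node": "B", "pick_correct": False},
    {"node": "B", "pick_correct": True},
    {"node": "C", "pick_correct": True},
    {"node": "C", "pick_correct": False},
]

print(coverage_and_accuracy(games, skip_nodes=set()))  # pick every game
print(coverage_and_accuracy(games, skip_nodes={"B"}))  # skip the circled node
```

In the toy rows, picking every game gives 62.5% accuracy, while skipping node "B" cuts coverage to half the games and raises accuracy to 75%.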

Are the results here surprising or earth-shattering? No. But it again shows the value of identifying important core statistics that describe a team's performance, without complicating things too much. Long live the Five Factors.