I scraped this data during the second week of the CFB season and the first week of the NFL season. Excuse my delay in getting around to the analysis.
Naturally, with every new season come new AP rankings, new potential All-Americans, etc. That is when I decided to start thinking about some trends to look for in College Football. Since style of play (offensively) is one of the hottest topics in the game, why not find a way to look at that?
Figure: Win Distribution by Conference
As we all know, College Football is being taken over by the Spread Offense, although other styles, such as the option and pro-style offenses, are still around. Let's see if we can use stats to identify the various styles of play. Additionally, I am hoping to see whether style of play corresponds to a higher win percentage. Even the spread comes in several flavors. The four main [schools of spread offense](http://www.footballstudyhall.com/2016/5/6/11606684/the-4-main-schools-of-spread-offense-smashmouth-option-air-raid-pro-style) are: (1) Air Raid, (2) Spread Option, (3) Smashmouth Spread and (4) Pro-style Spread.
Traditionally speaking, different styles are defined by the types of sets they come out of. A pro-style offense will traditionally come out in the I-formation or the pistol formation and very rarely line up in shotgun, whereas a traditional spread team will line up in the shotgun and very rarely in anything else. Of course, each coach has their own wrinkles.
The question was how to quantify styles of play mathematically. Luckily enough, each style of play has distinct characteristics, and these can be seen in the stats. For instance, a *successful* spread team usually runs a high number of plays per game. Spread offenses are also extremely quick and go no-huddle when they catch defenses in the wrong set. This shows up as a lower Time of Possession (than Pro-Style offenses) and a very short time per play. It is important to note that, when talking about time and spread offenses, ESPN and other TV broadcasts traditionally highlight the time from the end of the previous play to the snap of the next play. Sadly, we don't have access to such data.
I chose the following statistics, which I judged to be the best available indicators of style, to identify team styles via a clustering analysis:
* Number of Plays Per Game
* Yards Per Play
* Time Per Play
* Total Offensive Yards Per Game
* Ratio: (Passing Yards Per Game/Rushing Yards Per Game)
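To make the list concrete, here is a minimal sketch of how these five statistics could be derived from per-game team averages. It is written in Python purely for illustration and is not the code behind this post. The field names and sample numbers are made up. It also assumes that Time Per Play is time of possession divided by number of plays, and that total offensive yards is simply passing plus rushing yards.

```python
from dataclasses import dataclass


@dataclass
class TeamGameAverages:
    """Hypothetical per-game averages for one team (field names are made up)."""
    plays: float        # offensive plays per game
    top_seconds: float  # time of possession per game, in seconds
    pass_yards: float   # passing yards per game
    rush_yards: float   # rushing yards per game


def style_features(team):
    """Build the five style statistics from the raw per-game averages."""
    # Assumption: total offense is just passing plus rushing yards.
    total_yards = team.pass_yards + team.rush_yards
    return {
        "plays_per_game": team.plays,
        "yards_per_play": total_yards / team.plays,
        # Assumption: time per play = time of possession / number of plays.
        "time_per_play": team.top_seconds / team.plays,
        "total_yards_per_game": total_yards,
        "pass_rush_ratio": team.pass_yards / team.rush_yards,
    }


# Made-up numbers for an up-tempo, pass-heavy team.
example = TeamGameAverages(plays=80.0, top_seconds=1680.0,
                           pass_yards=320.0, rush_yards=160.0)
features = style_features(example)
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

A pass-heavy team ends up with a ratio above 1, and a run-heavy option team with a ratio below 1.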
The next question is how to use stats that have different units. The stats are scaled using the scale function in R. By default, the scale function standardizes a set of values by subtracting the mean from each value and then dividing by the standard deviation, so every statistic ends up with a mean of 0 and a standard deviation of 1. At this point, we should be able to start running our analysis.
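As a rough illustration of what that scaling step does, here is a small Python sketch (the analysis itself was done in R). The sample values are made up. `scale_column` is a hypothetical helper that mirrors the default behavior of R's scale on a single column, including the sample (n - 1) standard deviation.

```python
from statistics import mean, stdev


def scale_column(values):
    """Center and scale one column: (x - mean) / sample standard deviation.

    statistics.stdev uses the n - 1 denominator, the same one R's scale()
    uses by default.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Made-up values for two statistics measured in very different units.
plays_per_game = [62.0, 70.0, 78.0, 86.0]
yards_per_game = [350.0, 400.0, 450.0, 500.0]

scaled_plays = scale_column(plays_per_game)
scaled_yards = scale_column(yards_per_game)

print([round(v, 3) for v in scaled_plays])
print([round(v, 3) for v in scaled_yards])
```

After scaling, both columns are on the same unitless footing, so a statistic measured in hundreds of yards does not dominate the distance calculations used in clustering.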
One of the most popular clustering methods in data science is K-means clustering, which partitions the observations into *k* clusters. K-means requires that you specify the number of clusters up front. There are different methods for estimating that number. Here, both the elbow method and the gap statistic suggest that there are 3 clusters in the data.
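For anyone curious how the elbow method works, here is a toy sketch in plain Python. It is not the code used for this post, which was done in R. The points are made up, and the sketch swaps the usual random starting centroids for a simple deterministic farthest-point start so the result is reproducible. It fits K-means for several values of *k* and records the within-cluster sum of squares (WCSS). The elbow is the value of *k* after which WCSS stops dropping sharply. The gap statistic is not shown.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic start: the first point, then repeatedly the point
    farthest from the centroids chosen so far (a simplification, not the
    random starts most K-means implementations use)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Made-up 2-D points forming three obvious groups.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (0.0, 10.0), (0.0, 11.0), (1.0, 10.0), (1.0, 11.0),
]

# Elbow method: WCSS for k = 1..5.
wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, w in wcss_by_k.items():
    print(f"k={k}: WCSS={w:.2f}")

labels, centroids, wcss = kmeans(points, 3)
```

On this toy data, WCSS falls steeply up to *k* = 3 and barely moves afterwards, which is the kind of bend the elbow method looks for.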
Figure: Time Per Play vs. Total Offensive Yards
In the above plot, the horizontal and vertical lines highlight the mean values for their respective axes. The left corner is the epitome of a fast-paced offense: quick-hitting plays and tons of offensive yards per game. Interestingly enough, the pro-style offenses fall in multiple quadrants. In general, it seems the pro-style offenses take quite a bit of time per play, regardless of the number of yards gained per game.
This is just a preview of the full post, since I can't seem to post tables and such on here. If you want to take a look, here is the link: http://meysubb.github.io/sports%20analytics/2016/10/29/CFB.html