Time To Share Some Data
One of the reasons I thought it would be interesting to start Football Study Hall, in addition to the work I was already doing at Football Outsiders, is that I firmly believe there is no better place to build a community than SBN. And in building a community within the SBN confines, I could also attempt to start a bit of a data-sharing co-op.
The simple fact is, college football data is hard to come by. That's what has made CFB Stats such a revelatory experience over the years. You can find professional baseball data going back to the 1800s, and NFL and NBA data well back into the 1960s, but Marty's work in making sure we have complete data even just from 2004 onward has been incredibly welcome. But let's take it a bit further back.
Think of all the pieces of college football data you cannot easily access. Complete play-by-play data from before the middle portion of the last decade is nearly impossible to compile. For per-game stats before 2004, you basically have to go to a school's official website; for stats before about 1998 ... good luck. Sports Reference has done a solid job of pulling together a decent amount of individual player data, but it is still sparse. And ... scheme data? Figuring out who was running a 3-4 defense in 2002? Who was running the wishbone in the early 1990s? Forget about it.
So it's time to get started. I have uploaded two documents for your perusal/enjoyment/use in writing your own posts. The documents give you historical S&P+ rankings for offense and defense. (For overall S&P+, go here.)
Now let's get some more. I'll soon be posting other things like F/+ history, pace, run-pass ratios, etc., but basically everything I have to provide is from my own database and from the 2005-10 range. I'll need your help to get more. So in the coming days/weeks, I'll be creating a series of editable documents for the following:
- Links to old play-by-plays. This will be a list of games from probably the 1998-2004 range. I want help in figuring out which games can be accounted for on teams' official sites (and, in the absence of official site data, ESPN.com's no-tackles-data box scores will suffice.)
- Game Stats. We probably won't be able to come up with complete play-by-play data for any year in the 1998-2004 range, but we can probably at least come up with box score data for those years. So this file will be a list of games with spots for the basic stats: rushing and passing yards, first downs, sacks, turnovers, etc. This data can be found if you, like me, have a complete backlog of Phil Steele mags. I haven't had the time to enter things from those mags into spreadsheets, however, so the goal is to get some help with that.
- Coaching Staffs. This will be a document listing every team going back for years/decades. There will be columns for head coach, offensive coordinator, defensive coordinator, etc. If a coach has a Wikipedia bio, you can probably follow his career progression, but if you are wondering who was on Bob Stoops' 1999 staff and you aren't an OU fan who can immediately recite this, digging up this type of resource is difficult.
- Schemes. This might meld into the Coaching Staffs document, honestly, but the idea is to record schemes and styles for previous teams, not just the staffs themselves. Obviously we can't go too far into detail here, but having data regarding Flexbone vs I-formation vs spread vs 3-4 vs 4-3 vs 4-2-5 would be wonderful, even if it is incomplete.
- ?. I am very much open to suggestion as to what else to add.
I will also be figuring out the smoothest way for users to upload their own files and share them publicly. Interested in sharing data you've collected regarding the weight of defensive lines? Old recruiting data? Whatever? You can do that too.
I'm assuming that a lot of readers, both of this site and of SBN sites in general, would be interested in using these types of documents and contributing to them (at least a bit), and I hope to get a lot of assistance with it. If you're interested in such a thing, feel free to either leave a comment here or contact me at billconnelly1 at gmail.com. I get jealous of fans of other sports like baseball and pro basketball; they have access to decades (or, in baseball's case, over a century) of data, and we've got next to nothing for college football. It's time to do something about that.
Comments
This is Awesome
Two quick questions:
When you say game stats will be made available, will this be filed under individual games, or will it be whole-season ‘game stats’ for each team?
Will all the data you’re publishing only include non-garbage time?
Formerly known as 'stilts'
Unless it is derived from play-by-play data...
…there will be no way to differentiate garbage time vs non-garbage time. So if, in the absence of pbp data, we just use the complete game stats, then garbage time is not an option.
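To make the point concrete, here is a minimal Python sketch of a garbage-time filter. It only works because each play record carries the quarter and the score at the snap, which a game-level box score total does not. The `Play` fields and the per-quarter margin cutoffs are assumptions for illustration, not necessarily the definition used for S&P+.

```python
from dataclasses import dataclass

# Hypothetical per-quarter cutoffs: a play counts as garbage time when the
# scoring margin exceeds the cutoff for its quarter. These numbers are an
# illustrative assumption, not necessarily the definition behind S&P+.
GARBAGE_TIME_MARGIN = {1: 28, 2: 24, 3: 21, 4: 16}


@dataclass
class Play:
    quarter: int        # 1-4 (overtime is not handled in this sketch)
    offense_score: int  # offense's score before the snap
    defense_score: int  # defense's score before the snap
    yards: int          # yards gained on the play


def is_garbage_time(play: Play) -> bool:
    """True if the margin before the snap exceeds the quarter's cutoff."""
    margin = abs(play.offense_score - play.defense_score)
    return margin > GARBAGE_TIME_MARGIN[play.quarter]


def non_garbage_yards(plays: list[Play]) -> int:
    """Total yards gained on plays that are not in garbage time."""
    return sum(p.yards for p in plays if not is_garbage_time(p))
```

With only a final box score, there is no per-play score to test, so every yard counts the same; that is the limitation described above.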
When you say game stats will be made available, will this be filed under individual games, or will it be whole-season ‘game stats’ for each team?
The idea would be that we compile all the stats from the individual games. If we get all the stats from all the games, then we could obviously compile season totals as well.
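Rolling per-game rows up into season totals is a simple sum once the game-level data exists. Here is a minimal Python sketch, assuming each game row is a dict with hypothetical column names (`team`, `season`, and the basic box score stats listed in the post); the real spreadsheet headers may differ.

```python
from collections import defaultdict

# Hypothetical column names for the per-game sheet; the real document's
# headers may differ.
STAT_FIELDS = ("rush_yards", "pass_yards", "first_downs", "sacks", "turnovers")


def season_totals(games: list[dict]) -> dict[tuple[str, int], dict[str, int]]:
    """Sum per-game stat rows into totals keyed by (team, season)."""
    # Each new (team, season) key starts with every stat at zero.
    totals: dict[tuple[str, int], dict[str, int]] = defaultdict(
        lambda: dict.fromkeys(STAT_FIELDS, 0)
    )
    for row in games:
        key = (row["team"], row["season"])
        for field in STAT_FIELDS:
            totals[key][field] += row[field]
    return dict(totals)


games = [
    {"team": "Team A", "season": 1999, "rush_yards": 150, "pass_yards": 200,
     "first_downs": 18, "sacks": 2, "turnovers": 1},
    {"team": "Team A", "season": 1999, "rush_yards": 100, "pass_yards": 250,
     "first_downs": 20, "sacks": 3, "turnovers": 2},
]
print(season_totals(games)[("Team A", 1999)]["rush_yards"])  # 250
```

The reverse does not work: season totals cannot be split back into individual games, which is why the game-level sheet is the one worth building.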
I'm in
Florida has a lot of its basic game stats and coaching staff lists here, so I’d be willing to help turn it into something useful.
As for getting play-by-play data, maybe reaching out to SIDs would be helpful? I’ve not talked to one, but if anyone has that data, they would.
Team Speed Kills -- SBNation's SEC Blog
If you're so inclined, follow me @Year2
The SIDs should certainly have the play-by-play data.
It would be a massive project collecting it and scanning/transposing it from paper into data we can use. I contacted Notre Dame’s SID recently and received paper copies of play-by-play from every ND game from 1981-2000. I presume the same stuff is available at every major program, if not every single program.
by Brian Fremeau on Jul 8, 2011 10:31 AM EDT
If you could craft a form letter based on the one you sent, that could help others with requests to other SIDs. It’s a huge task to digitize it all, but step one is getting it in the first place.
Yeah, the form letter's probably the way to go...
…or form e-mail, I should say. I’d happily send it along, though I might need help compiling the contact information. Sounds like another google doc, ahem…
Here was my email
Dear X,
I am a college football statistics analyst and writer for Football Outsiders. For my analysis, I collect and process large amounts of college football drive and box score game data, which is easily accessible on the internet for recent college football seasons (about 2001-present). I would appreciate any access you can provide to copies of historical statistical summaries, play-by-play records, box scores, etc., of Notre Dame football games prior to 2001.
Please let me know if you need more information from me at this point, or if it would be more useful to set up an appointment to visit your offices. Thank you,
by Brian Fremeau on Jul 8, 2011 11:04 AM EDT
Per-game stats back to 2000 are available on ncaa.org
You just have to know how to get to it. If you go to the main page you’ll see this in the web address bar:
http://web1.ncaa.org/mfb/natlRank.jsp?year=2011&div=IA&site=org
Just change the year query and you’ll get the data for that season.
http://web1.ncaa.org/mfb/natlRank.jsp?year=2003&div=IA&site=org
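If you want every season at once rather than editing the address bar by hand, the year swap is easy to script. A minimal Python sketch: the base URL and the `year`, `div`, and `site` parameters are copied from the addresses above, the function name is hypothetical, and it only builds the URLs (fetching and parsing the pages is left out, and there is no guarantee the old web1.ncaa.org pages are still served).

```python
from urllib.parse import urlencode

# Base URL and query parameters (year, div, site) copied from the
# addresses quoted in this comment.
BASE_URL = "http://web1.ncaa.org/mfb/natlRank.jsp"


def natl_rank_url(year: int, div: str = "IA", site: str = "org") -> str:
    """Build the national-rankings URL for one season by swapping the year."""
    query = urlencode({"year": year, "div": div, "site": site})
    return f"{BASE_URL}?{query}"


# One URL per season, 2000 through 2011.
urls = [natl_rank_url(y) for y in range(2000, 2012)]
print(urls[3])  # http://web1.ncaa.org/mfb/natlRank.jsp?year=2003&div=IA&site=org
```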
It’s pretty comprehensive. For example, here’s game by game data for Middle Tennessee State in 2000:
http://web1.ncaa.org/d1mfb/2000/Internet/ranking_summary/2000000000419teamoff.html
Bloggin' at JoePasDoghouse.com
Interesting...
…I knew that went back to 2003-04, but I don’t think I realized it went back to 2000. That’s great news, even though this further proves just how incredibly user-unfriendly that site is … I’ve had many arguments with that site over the years.
Box score data back to 2001 as well.
Example: a New Mexico State-Louisville game from 2001
http://web1.ncaa.org/d1mfb/worksheet.jsp?year=2001&game=200100000036720010823.xml
Keep in mind that the NCAA didn’t officially track bowl game stats until the middle part of the last decade, so these would need to be added.
Sheesh.
Is this a new development? Probably not, but … I noticed the “plug a different year into the URL” trick a while back and somehow didn’t end up coming up with this…
Like you said, it's not user friendly.
You have to hit the correct buttons in the right sequence to get stuff you were trying to find.
As for the picture.
It makes sense because:
- 1994 would be a great benchmark since it was the start of the 85-scholarship limit.
- The 1994 Nittany Lions weren’t on your S&P+ Top 100 even though they are anecdotally considered to be one of the top teams of the past 20 years. By factoring out garbage time scores you may get a better idea how this and other recent teams may have stacked up.
I caught a little hell for that one when the countdown came out last summer...
…and yeah, the general idea was that garbage time dinged them quite a bit. Using only points (which is all we have for that year), they were only excellent and not best ever. But hey, they still finished ahead of 1976 Pitt, so they had that going for them… :-)
Unless it's just me
http://espn.go.com/ncf/playbyplay?gameId=222350275
It appears that ESPN.com has play-by-play going back to 2002. I’m a Fresno State fan, so I just ran a quick search and the above link was one of the results.
Please please please help Red Wave Central become the official Fresno State Bulldogs blog of SB Nation! If you're a Bulldog diehard who wants to help, check here for contact information.
I view ESPN's pbp as a last-resort option...
Their special teams data is often odd (lots of minus-15 yard kickoffs, things like that), and there’s no tackler data. It’s obviously better than nothing, but the ‘official’ pbp’s are still the No. 1 choice.
Oh, right
I must’ve overlooked that in the original post. :|
After digging a little further, I found that the Fresno State Bulldogs football site has complete box scores and play-by-play that features the tackler data you’re looking to have. This is one example, the 2002 season opener at Wisconsin. Unfortunately, 2002 is as far back as it appears to go, but it does have tackler data for the 2002, 2003 and 2004 seasons. I don’t know if I’ll have to parse through everything myself, but I’d be happy to pass along the link if someone else wants a crack at it.
Er
That was supposed to say “have time to parse through everything myself.” My mistake.
Thanks for the spreadsheets too!
I probably won’t post anything for a few weeks on this but I have a lot to digest.
One thing, do you have a graph somewhere so people understand point value of every yard line? That would help explain PPP+.