I am wondering about the choice of the 50% and 70% thresholds used to define a successful play in the formula for Success Rate.
A percentage-threshold definition requires scale invariance: a 5-yard gain on 1st-and-10 should be equivalent to a 10-yard gain on 1st-and-20. The problem is that success does not appear to be scale-invariant, at least in college football.
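To make the scale-invariance point concrete, here is a minimal sketch of a percentage-threshold rule. It assumes the thresholds are applied as 50% of yards to go on 1st down, 70% on 2nd down, and 100% on 3rd and 4th down, which is how the convention is usually described; only the 50% and 70% figures come from the discussion above. The names `THRESHOLDS` and `is_successful` are made up for illustration.

```python
from fractions import Fraction

# Share of yards-to-go that must be gained for a play to count as a success.
# ASSUMPTION: 50% on 1st down, 70% on 2nd, 100% on 3rd/4th (usual convention).
# Fractions keep boundary cases exact (e.g. 7 yards on 2nd-and-10).
THRESHOLDS = {
    1: Fraction(1, 2),
    2: Fraction(7, 10),
    3: Fraction(1),
    4: Fraction(1),
}


def is_successful(down, ytg, gained):
    """Return True if `gained` yards meets the threshold share of `ytg`."""
    return gained >= THRESHOLDS[down] * ytg


# The rule only looks at the ratio gained / ytg, so these two are treated alike.
print(is_successful(1, 10, 5))   # 5 of 10 yards on 1st down
print(is_successful(1, 20, 10))  # 10 of 20 yards on 1st down
print(is_successful(1, 20, 5))   # 5 of 20 yards on 1st down
```

Because the rule depends only on the fraction of yards to go that was gained, doubling both numbers can never change the answer. That is the scale-invariance assumption in question.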
I defined "success" as "getting another first down, or a touchdown, before turning the ball over (via int/fumble/downs/safety/etc.), punting, or attempting a field goal." Looking at the past several years of play-by-play data from cfbstats, I found that the plot of P(success) vs. yards to go (YTG) for each of 1st/2nd/3rd/4th down was best fit by a (scale-variant) exponential function, and was not well fit by a (scale-invariant) power function.
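For anyone who wants to check this kind of comparison, here is a rough sketch of one way to compare the two curve shapes. It is not the R code used for the results above. It relies on the fact that an exponential p = a·exp(b·x) is a straight line in (x, log p), while a power function p = a·x^b is a straight line in (log x, log p), so each can be fit by ordinary least squares after a log transform. The function names are hypothetical, and the data is SYNTHETIC, generated from an exponential on purpose; it is not cfbstats data.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


def compare_fits(ytg, p_success):
    """Return log-space R^2 for an exponential and a power fit of p_success vs. ytg."""
    log_p = [math.log(p) for p in p_success]
    # Exponential model: log p is linear in x.
    _, _, r2_exp = linear_fit(ytg, log_p)
    # Power model: log p is linear in log x.
    _, _, r2_pow = linear_fit([math.log(x) for x in ytg], log_p)
    return {"exponential": r2_exp, "power": r2_pow}


# SYNTHETIC placeholder data (an exact exponential), NOT real play-by-play numbers.
ytg = list(range(1, 21))
p_success = [0.9 * math.exp(-0.08 * x) for x in ytg]

result = compare_fits(ytg, p_success)
print(result)
```

On why the power function is the scale-invariant one: for p = a·x^b, the ratio p(kx)/p(x) equals k^b no matter what x is, whereas for an exponential the same ratio is exp(b·(k−1)·x), which changes with x. One caveat on the method: fitting in log space weights the points differently than a nonlinear least-squares fit on the raw probabilities, and it cannot handle bins where P(success) is exactly 0.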
Basically, I am wondering how I could get results that differ from "convention." The possibilities I came up with are:
1) I did not adjust for "garbage time," end-of-half, red zone, and other plays that artificially depress P(success), thus changing the shape of the curves
2) I am using a different definition of success than that used in Success Rate
3) P(success) curves are scale-invariant in NFL but not college football, so one of the assumptions made in converting Success Rate from an NFL stat to a CFB stat is invalid, but it works, so nobody cares
4) Success is inherently scale-variant, but the current Success Rate definition is relatively simple to compute and only slightly sacrifices accuracy, so the trade-off is worth it
5) Success is inherently scale-variant, so one of the key assumptions of Success Rate is invalid, but it works, so nobody cares
6) I screwed up somewhere (in combining the csv files, in the R code, etc.)
Do any of the regulars here (or any other viewers) have any suspicions about which of those (or something I didn't think of) is most likely? I suspect #2, but I have no idea whether mine is a better or worse definition of success than what's out there, or how to actually use this new definition to come up with a new efficiency measure.