Regression to the Mean

Editor’s Note: In this section, I’ll break down some of the key aspects of probability theory that form the basis for this website. To conclude this section, I look at how regression to the mean can be applied to player and team projections. Please note these explanations won’t be 100% up to mathematical textbook standards, simply because they need to be shaped into a sports context. If there are any concerns or criticisms about the process of applying probability theory to a sports context, please contact me at tabmathletics@gmail.com.

For fans who love reading sports news and analysis online but don’t dig too deeply into the mathematical perspectives presented by certain websites, “regression to the mean” may seem like an often-used yet relatively foreign term. However, after reading the previous three chapters of this section, hopefully the pieces will all fall into place as we discuss regression to the mean.

Regression to the mean can be expected when the following previously established conditions are met:

  1. Using independent statistics that have particular relevance to the game, a sample set is created for a particular time frame (a single game, multiple games, a season or multiple seasons).
  2. The sample size is large enough to calculate or reasonably assume a normal distribution, per the Central Limit Theorem, or the sample set passes normal distribution tests.
  3. Outliers are discovered, and these outliers are located on the outer edges of the normal distribution graph. The outliers can be discovered with simple reasoning or by creating the graph itself with sufficient data.
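
As a rough illustration of the third condition, here is a minimal Python sketch that flags outliers by their z-score, meaning how many standard deviations a value sits from the sample mean. The function name find_outliers, the cutoff of 2.0 standard deviations and the touchdown totals are all assumptions made up for this example; they are not real NFL data, and a sample this small would not satisfy the second condition on its own.

```python
from statistics import mean, stdev


def find_outliers(values, z_threshold=2.0):
    """Return (value, z_score) pairs lying more than z_threshold
    sample standard deviations away from the sample mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    scored = [(v, (v - mu) / sigma) for v in values]
    return [(v, z) for v, z in scored if abs(z) > z_threshold]


# Hypothetical season touchdown-pass totals, invented for illustration only.
touchdown_totals = [18, 20, 21, 22, 22, 23, 24, 24, 25, 26, 27, 28, 45]

outliers = find_outliers(touchdown_totals)
for value, z in outliers:
    print(f"{value} touchdown passes, z-score {z:.2f}")
```

With this made-up list, only the 45-touchdown season lands past the cutoff. Keep in mind that an extreme value also inflates the standard deviation it is measured against, so a real study would want far more data than this toy list.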

Once these conditions are met, we can reasonably expect the outlying statistic(s) to regress toward the mean in the next time frame. For example, if it’s determined that Quarterback X’s 45 touchdown passes in a single NFL season is a statistical outlier, then Quarterback X will very likely throw fewer than 45 touchdown passes the next season. Sure, it’s not a lock, but it’s close to one. (Remember, we’re either calculating or reasonably assuming very low odds for throwing 45 touchdown passes, based on evaluating the entire NFL statistical history for touchdown passes. The odds do NOT equal zero percent, so there’s NO “lock” per se.)
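
To see why an outlying season tends to be followed by a more ordinary one, here is a small simulation sketch. It assumes a deliberately simple model that is not taken from any real data: each player's season total is a fixed underlying skill plus random luck, both drawn from normal distributions. The function name simulate_two_seasons and every parameter value are hypothetical choices for illustration.

```python
import random


def simulate_two_seasons(n_players=1000, league_mean=25.0,
                         skill_sd=5.0, luck_sd=5.0, seed=0):
    """Simulate two seasons in which each total = fixed skill + fresh luck."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    skills = [rng.gauss(league_mean, skill_sd) for _ in range(n_players)]
    season1 = [s + rng.gauss(0, luck_sd) for s in skills]
    season2 = [s + rng.gauss(0, luck_sd) for s in skills]
    return season1, season2


season1, season2 = simulate_two_seasons()

# Pick the 50 best season-1 performers (the outer edge of the distribution).
cutoff = sorted(season1)[-50]
top = [i for i, total in enumerate(season1) if total >= cutoff]

top_mean_s1 = sum(season1[i] for i in top) / len(top)
top_mean_s2 = sum(season2[i] for i in top) / len(top)
share_declined = sum(season2[i] < season1[i] for i in top) / len(top)

print(f"Top group, season 1 average: {top_mean_s1:.1f}")
print(f"Top group, season 2 average: {top_mean_s2:.1f}")
print(f"Share of top group that declined: {share_declined:.0%}")
```

Under these assumptions, the top group's average drops in the second season because the good luck that helped push those players to the edge of the distribution is not carried over. Most of the group declines, but not necessarily all of it, which is the "near lock, not a lock" idea in simulated form.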

Regression to the mean can be a very useful tool for projecting how a team or player will fare in the future, as long as the statistics are correctly modeled with a normal distribution and the time frame used for the study is reasonable for analysis. For example, it’s more reasonable to conclude that Quarterback X won’t throw 45 touchdown passes in the next season than to conclude that Quarterback X will never throw 45 touchdown passes again. The conditions for throwing touchdowns may change over time, and the odds of it happening at least once will increase over multiple years if the original study was made to account for only one season.
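
The point about multiple years can be made concrete with a short calculation. The sketch below assumes that each season is independent and that the single-season odds stay constant, which is a simplification since conditions can change over time. The function name prob_at_least_once and the 2% single-season probability are hypothetical, picked only to show the arithmetic.

```python
def prob_at_least_once(p_single_season, n_seasons):
    """Chance an event happens at least once in n_seasons, assuming
    independent seasons with the same single-season probability."""
    if not 0.0 <= p_single_season <= 1.0:
        raise ValueError("p_single_season must be between 0 and 1")
    if n_seasons < 0:
        raise ValueError("n_seasons must not be negative")
    # Complement rule: 1 minus the chance it never happens in any season.
    return 1.0 - (1.0 - p_single_season) ** n_seasons


p = 0.02  # hypothetical single-season odds of a 45-touchdown season
for n in (1, 5, 10):
    print(f"{n} season(s): {prob_at_least_once(p, n):.1%}")
```

Even with low single-season odds, the chance of seeing the feat at least once keeps growing as more seasons are included, which is why a one-season study should not be stretched into a claim about a whole career.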

Throughout this site, you will see numerous conclusions based on regression to the mean. While you’re at it, you can even try some on your own to see how it works! Hopefully, in the near future, the probability theory used in this website will open up new avenues for sports analysis, especially when it comes to making statistical projections.