Normal Distribution, Part 2

Editor’s Note: In this section, I’ll break down some of the key aspects of probability theory that form the basis for this website. Here, I look at a wrinkle in the basic concept behind normal distribution. Please note these explanations won’t be 100% up to mathematical textbook standards, simply because they need to be shaped to fit a sports context. If there are any concerns or criticisms about the process of applying probability theory to a sports context, please contact me at tabmathletics@gmail.com.

Now that I’ve described how normal distribution plays a huge role in statistical analysis, it’s important to note that football occupies a more unusual place in probability theory than other sports. Strangely enough, it has to do with a lack of normal distribution.

Let me explain to avoid confusion. The Central Limit Theorem describes how totals and averages approach a normal distribution as the number of trials grows. Because an NFL regular season is only 16 games, and a college football season at most 14 games, these statistical sample sets are less likely to show a normal distribution. Therefore, when analyzing season statistics, one cannot assume that the stats are nearly as reliable as those in other sports. Consider the graph below, as illustrated by Mathematics Illuminated:

In the case of this website, I hope to use the proper number of rows to help illustrate normal distribution. That’ll come with intuition and experience from analyzing stats for Cold, Hard Football Facts or in my own personal work. However, I cannot influence how many trials (in this case, the total number of games) make up the sample set. That is determined by the season itself.

With a sample size so low, someone would need to run tests to determine whether the data is normally distributed. However, in cases like the one illustrated in the bottom left example, the frequency of events on the left tail of the graph distorts the overall outcome. Therefore, we cannot rely on these totals as much as on the normalized totals. Still, some good comes with this, as we can use it to our advantage in statistical analysis.
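To give a rough feel for why a small sample can look lopsided, here is a minimal Python sketch. It is not one of the formal tests mentioned above, just a simulation. The stat (points per game with a mean of 22 and a standard deviation of 10), the 160-game comparison season, and the function names are all assumptions made up for illustration. Every simulated game is drawn from a truly normal distribution, and the code measures how much the skewness (the lean toward one tail) of a single season's numbers bounces around from season to season.

```python
import random
import statistics


def sample_skewness(values):
    """Return the skewness of a list of numbers (0 means symmetric)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def skewness_spread(games, seasons=2000, seed=1):
    """Simulate many seasons and return how much the skewness varies."""
    rng = random.Random(seed)
    skews = []
    for _ in range(seasons):
        # Hypothetical stat: points per game, drawn from a normal
        # distribution with mean 22 and standard deviation 10.
        season = [rng.gauss(22, 10) for _ in range(games)]
        skews.append(sample_skewness(season))
    return statistics.pstdev(skews)


spread_16 = skewness_spread(16)    # NFL-length season
spread_160 = skewness_spread(160)  # hypothetical much longer season

print(f"16-game seasons:  skewness spread {spread_16:.2f}")
print(f"160-game seasons: skewness spread {spread_160:.2f}")
```

Under these assumptions, the 16-game seasons should show a noticeably wider spread in skewness than the 160-game seasons, even though the underlying process is identical. In other words, a heavy-looking tail in one short season may just be the sample size talking.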

With lower sample sizes, and therefore less normal distribution, outliers are much more common in each football season. (Meanwhile, when multiple seasons are considered, there will be some sort of normal distribution that allows us to see which stat totals project to be outliers. True, we may not know for certain what the outliers are unless we run the actual mathematical tests. However, I’m at the learning stages of this research, so we’ll save the specific numbers for another time and settle for a general, yet reasonably accurate, picture.)
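The single-season versus multi-season point can be sketched with a quick simulation. Again, this is only an illustration under made-up assumptions: a hypothetical team whose true scoring level is 22 points per game with a standard deviation of 10, and an arbitrary cutoff of 5 points away from that true level to count as an "extreme" average. The function name and every number here are mine, not real NFL data.

```python
import random


def outlier_rate(games, threshold=5.0, trials=5000, seed=7):
    """Share of simulated averages more than `threshold` from the true mean."""
    rng = random.Random(seed)
    true_mean, game_sd = 22.0, 10.0  # hypothetical points-per-game figures
    extreme = 0
    for _ in range(trials):
        # Average the stat over the given number of games.
        avg = sum(rng.gauss(true_mean, game_sd) for _ in range(games)) / games
        if abs(avg - true_mean) > threshold:
            extreme += 1
    return extreme / trials


one_season = outlier_rate(16)    # one 16-game season
five_seasons = outlier_rate(80)  # five seasons pooled together

print(f"Extreme averages over 16 games: {one_season:.1%}")
print(f"Extreme averages over 80 games: {five_seasons:.1%}")
```

With these assumptions, extreme averages should turn up far more often over 16 games than over 80, which is the general picture described above: a single season throws off more outliers, and pooling seasons smooths them out.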

Therefore, most of the regression analysis will involve the NFL. I’m not trying to pick favorites, but it’s the sport with the friendliest numbers for this study and the league with the most consistency. In college football, the gimmicky offensive systems can skew most of those stats and make those sample sets unreliable.

Later on, I will explain how certain stats are inherently less reliable, due to sample size and other factors. For now, we move on in this series to explain how probability density plays a key role in situations where normal distribution applies.