Buzzwords & Bullsh!t: Statistical Significance

Isaac Freitas, 11/16/18

What’s the difference between the right amount of ketchup on a burger and too much or too little? One might think they could just eyeball it, but when it comes to large quantities of burgers, a specific, measurable process is needed to define what is and is not abnormal. For those of us not making hamburgers, the same principles apply to figuring out whether a difference between two or more groups is statistically significant. These principles can be applied in any industry, including the adult beverage industry, where the idea of statistical significance took root.

To understand statistical significance, one needs to understand the origins of Student’s t-distribution and the t-statistic. William Sealy Gosset was a young statistician working at the Guinness brewery in Dublin in the early 20th century (link). Gosset figured out that measuring the resin content of a few small samples of hops could be used to infer the resin content of the whole batch. Working from just a few samples, he eventually came up with the t-statistic, which allows a sample mean to be compared with a population mean. The t-statistic accounts for the fact that estimates become more precise as the number of observations in a sample grows. Guinness was worried about giving up proprietary trade secrets, but eventually allowed Gosset to publish his work under the pseudonym “Student,” which is why it is known today as Student’s t-distribution, regularly found in introductory statistics textbooks as tables to be used with the t-test.
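For readers who want to see the arithmetic, here is a minimal Python sketch of the one-sample t-statistic as it is usually taught today: t = (sample mean − population mean) / (s / √n), where s is the sample standard deviation and n is the number of observations. The resin figures and the 8.0 percent target are invented for illustration (they are not Gosset's data), and `one_sample_t` is a hypothetical helper name. Dividing by s / √n, the standard error, is what builds in the payoff from more observations: with the same spread, a larger sample gives a smaller standard error and therefore a larger t for the same difference in means.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """Return (t, degrees_of_freedom) for a one-sample t-test.

    t = (sample mean - population mean) / (s / sqrt(n)),
    where s is the sample standard deviation (n - 1 denominator).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)
    if s == 0:
        raise ValueError("sample has no variation")
    standard_error = s / math.sqrt(n)  # shrinks as the sample grows
    t = (statistics.mean(sample) - population_mean) / standard_error
    return t, n - 1

# Hypothetical resin measurements (percent) from five small hop samples,
# compared with an assumed target of 8.0 percent for the whole batch.
resin = [8.4, 8.1, 8.6, 8.3, 8.5]
t_stat, df = one_sample_t(resin, population_mean=8.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```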

Student’s t-distribution inspired other statisticians, such as R.A. Fisher, who later developed several variations and started the infamous obsession with setting the threshold for significance at “0.05”. Statistical significance is generally reported as a p-value, a decimal between zero and one that represents a proportion of the area under the distribution’s curve: the probability of seeing a result at least as extreme as the one observed if there were no real difference. When comparing two samples, a larger p-value indicates that the data are consistent with there being no real difference between them. A small p-value indicates that the observed difference would be unlikely if the samples came from populations with the same mean, so a real difference is more plausible. Fisher took the t-distribution tables and adapted them: for each number of observations (degrees of freedom), his tables list the t-statistic needed to reach the 0.05 and 0.01 p-value levels. His choices have defined the basic standards for reporting statistical significance for the t-test and for the significance tests developed later. The choice of 0.05 as the minimum level was arbitrary, and it has caused headaches where the standard has been adopted inappropriately in areas where a stricter standard for significance matters.
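To make the link between a t-statistic and a p-value concrete, the sketch below computes a two-sided p-value by numerically integrating the density of Student's t-distribution with Simpson's rule, then compares it with the 0.05 and 0.01 thresholds. It is a teaching sketch that assumes only Python's standard library is available; in practice a library routine such as SciPy's `scipy.stats.t.sf` would normally do this job. The function names are hypothetical, and the example t of 2.5 with 10 degrees of freedom is invented for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma keeps the gamma-function ratio stable for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Probability of a t at least as extreme as |t| in either tail.

    Integrates the density from 0 to |t| with Simpson's rule; by symmetry
    each tail beyond |t| has area 0.5 minus that integral.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    return max(0.0, 2 * (0.5 - central_half))

def verdict(p, alpha):
    """Compare a p-value with a chosen threshold such as 0.05 or 0.01."""
    return "significant" if p < alpha else "not significant"

# Invented example: t = 2.5 with 10 degrees of freedom
p = two_sided_p_value(2.5, df=10)
for alpha in (0.05, 0.01):
    print(f"p = {p:.4f}: {verdict(p, alpha)} at the {alpha} level")
```

Because 2.5 lies between the tabled two-sided critical values of about 2.228 (for 0.05) and 3.169 (for 0.01) at 10 degrees of freedom, it clears the first threshold but not the second, which is exactly the kind of lookup Fisher's tables were built for.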

Modern statisticians have developed numerous additional significance tests for finding p-values, from the venerable t-test to ANOVA and the many forms of regression. Students are usually taught these basic techniques along with the conventional 0.05 and 0.01 thresholds, so it’s no surprise that many of them go into the world applying these standards naively when stricter or looser thresholds may be more appropriate.
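As one illustration of how the family of tests grew beyond the t-test, here is a minimal sketch of the F-statistic behind a one-way ANOVA, written with Python's standard library. It compares the spread between group means with the spread within groups. The group data are invented, and `one_way_anova_f` and `pooled_t` are hypothetical helper names. For exactly two groups, the ANOVA F equals the square of the pooled (equal-variance) two-sample t-statistic, which the final lines print side by side.

```python
import math
import statistics

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need two or more groups and more observations than groups")
    means = [statistics.mean(g) for g in groups]
    grand_mean = statistics.mean([x for g in groups for x in g])
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: scatter of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no variation within groups")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def pooled_t(a, b):
    """Two-sample t-statistic assuming equal variances (pooled variance)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Invented measurements for three groups
low, mid, high = [1, 2, 3], [4, 5, 6], [7, 8, 9]
f_stat, df_b, df_w = one_way_anova_f(low, mid, high)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")

# With two groups, the ANOVA F matches t squared from the pooled t-test
f_two, _, _ = one_way_anova_f(low, mid)
print(f"F = {f_two:.4f}, t^2 = {pooled_t(low, mid) ** 2:.4f}")
```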

While the selection of 0.05 and 0.01 as the common thresholds below which a p-value indicates statistical significance was arbitrary, researchers still need to be confident they are providing good guidance with their statistics. With greater familiarity with their fields, economists, sociologists, health researchers, and other quantitative data analysts can recognize the tradeoff between the greater accuracy of larger samples and the expediency and efficiency of smaller ones. An expert in any given field can be expected to know and implement best practices. So next time you’re at the bar, raise a glass of the Guinness that inspired the revolutionary idea of statistical significance.