That’s a question every psychology student has asked at one time or another! Well, I’ll tell you.
In order to understand Bonferroni, there is some prerequisite knowledge you need to possess. You need to understand what null hypothesis significance testing is, p values, and Type I/Type II errors. If you understand these things, read on. If not, read on also, but this will make less sense to you (I haven’t yet covered these things on this blog, so you’ll have to do some Googling, or buy my book where it is all explained in the absolute best way that is humanly possible. Ahem).
You’re still with me! That’s good. I wonder what percentage of readers have already pressed the back button? Hmm.
So, Bonferroni correction. You know that with alpha set at .05 we’re looking for a less than 5% chance of getting our result (or a more extreme one) by chance, assuming the null hypothesis is true. 5% is an arbitrary significance level (or ‘alpha’): not so high that we’re making too many Type I errors (assuming an effect where there isn’t one), but not so low that we’re making too many Type II errors (assuming there isn’t an effect where there is one).
Imagine that we did 20 studies in which there was really no effect at all, and tested each one at the .05 level. Each study has a 5% chance of throwing up a fluke ‘significant’ result, so over 20 studies it’s odds on (about 64%, if the studies are independent) that at least one of them does. Now think about how many thousands of studies have been done over the years! This demonstrates the importance of replicating studies – fluke findings have definitely happened and will continue to happen.
However, this situation isn’t limited to findings spread over multiple papers. Sometimes in larger papers with several studies and/or analyses rolled into one, you might get a similar predicament. Simply, the more tests you do in a paper, the more chance there is that at least one of your significant results will have come about through pure chance.
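If you'd like to put a number on that, here is a rough Python sketch. It assumes the tests are independent and that the null hypothesis is true in every single one of them, which real papers rarely guarantee, and the function name is just one I made up for illustration.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across num_tests tests.

    Assumes the tests are independent and that the null hypothesis
    is true in all of them.
    """
    # Chance that every test avoids a false positive is (1 - alpha) ** num_tests,
    # so the chance that at least one does not is the complement.
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 20, 100):
    rate = familywise_error_rate(num_tests)
    print(f"{num_tests:>3} tests: {rate:.1%} chance of at least one fluke")
```

With 20 tests this comes out at about 64%, and it only climbs from there as you add more tests.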
This would be a bad thing – a theory that is modified as a result of an incorrect finding would, of course, be a weaker reflection of reality, and any decisions that were made based on that theory (academic or not) would also be weaker.
So, we need a way to play a little safer when doing multiple tests and comparisons, and we do this by changing the alpha – we look for lower p values than we normally would before we’re happy to say that something is statistically significant.
This is what Bonferroni correction does – alters the alpha. You simply divide .05 by the number of tests that you’re doing, and go by that. If you’re doing five tests, you look for .05 / 5 = .01. If you’re doing 24 tests, you look for .05 / 24 ≈ .002.
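The division above is easy enough to do in your head, but here it is as a small Python sketch anyway. The function names and the p values are invented for the example, and I've treated a p value exactly equal to the corrected alpha as significant, which is a convention rather than a law.

```python
def bonferroni_alpha(num_tests, alpha=0.05):
    """Corrected significance level: the original alpha shared across all tests."""
    return alpha / num_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p values survive the correction for len(p_values) tests."""
    threshold = bonferroni_alpha(len(p_values), alpha)
    return [p <= threshold for p in p_values]


# Made-up p values from five hypothetical tests in one paper.
p_values = [0.003, 0.012, 0.04, 0.2, 0.008]

print(f"Corrected alpha for 5 tests: {bonferroni_alpha(5):.3f}")
print(f"Corrected alpha for 24 tests: {bonferroni_alpha(24):.4f}")
print(bonferroni_significant(p_values))
```

Notice that .012 and .04 would both have counted as significant at the usual .05, but neither survives once you correct for five tests.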
Bonferroni correction might strike you as a little conservative – and it is. It is quite a strict measure to take, and although it does quite a good job of protecting you from Type I errors, it leaves you a little more vulnerable to Type II errors. Again, this is yet another reason that studies need to be replicated.
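To see that trade-off in numbers, here is a sketch of how the chance of detecting a real effect (the power) falls as the alpha gets stricter. It assumes a simple two-sided z test and a hypothetical effect that shifts the test statistic by 2.8 standard errors, a value I picked only because it gives roughly 80% power at the usual .05. Your own tests and effects will differ.

```python
from statistics import NormalDist


def z_test_power(alpha, shift=2.8):
    """Power of a two-sided z test at the given alpha.

    `shift` is a hypothetical true effect, expressed as how many
    standard errors it moves the test statistic away from zero.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Chance the statistic lands beyond either critical value
    # when it is really centred on `shift` rather than zero.
    upper_tail = 1 - std_normal.cdf(critical - shift)
    lower_tail = std_normal.cdf(-critical - shift)
    return upper_tail + lower_tail


for num_tests in (1, 5, 20):
    alpha = 0.05 / num_tests
    print(f"{num_tests:>2} tests, alpha = {alpha:.4f}: power = {z_test_power(alpha):.2f}")
```

Under these assumptions, correcting for 20 tests drops the power from roughly 80% to somewhere around 40%, so a real effect is now more likely to be missed than found.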
There you go! An answer to an age-old question. Up next: does the light in the fridge stay on when the door is closed??