That’s a question every psychology student has asked at one time or another! Well, I’ll tell you.

In order to understand Bonferroni, there is some prerequisite knowledge you need to possess. You need to understand what null hypothesis significance testing is, p values, and Type I/Type II errors. If you understand these things, read on. If not, read on also, but this will make less sense to you (I haven’t yet covered these things on this blog, so you’ll have to do some Googling, or buy my book where it is all explained in the absolute best way that is humanly possible. Ahem).

You’re still with me! That’s good. I wonder what percentage of readers have already pressed the back button? Hmm.

So, Bonferroni correction. You know that with a p value set at .05 we’re looking for a less than 5% chance of getting our result (or a more extreme one) by chance, assuming the null hypothesis is true. 5% is an arbitrary significance level (or ‘alpha’); not so high that we’re making too many Type I errors (concluding there’s an effect where there isn’t one), but not so low that we’re making too many Type II errors (concluding there isn’t an effect where there is one).

Imagine that we did 20 studies, and in each one we got a p value of exactly .05. A 5% chance of a fluke result over 20 studies means it’s odds on that one of these results really was a fluke. Now think about how many thousands of studies have been done over the years! This demonstrates the importance of replicating studies – fluke findings have definitely happened and will continue to happen.
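To make that concrete, here’s a quick sketch (in Python, my choice purely for illustration; the function name is made up) of the chance of at least one fluke across independent tests, each run at alpha = .05:

```python
# Probability of at least one Type I error ("fluke") across n
# independent tests, each run at the same alpha.
def familywise_error_rate(alpha, n_tests):
    return 1 - (1 - alpha) ** n_tests

print(familywise_error_rate(0.05, 1))   # roughly 0.05 for a single test
print(familywise_error_rate(0.05, 20))  # roughly 0.64 -- odds on!
```

So over 20 tests the chance of at least one false positive climbs from 5% to about 64%.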

However, this situation isn’t limited to findings spread over multiple papers. Sometimes in larger papers with several studies and/or analyses rolled into one, you might get a similar predicament. Simply, the more tests you do in a paper, the more chance there is that one of them will have come about through pure chance.

This would be a bad thing – a theory that is modified as a result of an incorrect finding would, of course, be a weaker reflection of reality, and any decisions that were made based on that theory (academic or not) would also be weaker.

So, we need a way to play a little safer when doing multiple tests and comparisons, and we do this by changing the alpha – we look for lower p values than we normally would before we’re happy to say that something is statistically significant.

This is what Bonferroni correction does – alters the alpha. You simply divide .05 by the number of tests that you’re doing, and go by that. If you’re doing five tests, you look for .05 / 5 = .01. If you’re doing 24 tests, you look for .05 / 24 ≈ .002.
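In code form, the whole correction is one division (again a Python sketch of my own, not a library function):

```python
def bonferroni_alpha(alpha, n_tests):
    """Bonferroni-corrected significance threshold: divide alpha
    by the number of tests, and compare each p value to that."""
    return alpha / n_tests

print(bonferroni_alpha(0.05, 5))    # 0.01
print(bonferroni_alpha(0.05, 24))   # roughly 0.002
```

A p value then counts as significant only if it falls below the corrected threshold.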

Bonferroni correction might strike you as a little conservative – and it is. It is quite a strict measure to take, and although it does quite a good job of protecting you from Type I errors, it leaves you a little more vulnerable to Type II errors. Again, this is yet another reason that studies need to be replicated.

There you go! An answer to an age-old question. Up next; does the light in the fridge stay on when the door is closed??

Dear Warren

Your post is commendable to say the least. You are my hero when it comes to Bonferroni corrections. Very easy, very lucid, and very accessible explanation.

Thank you very much.

Best and quirkiest explanation ever. I love it! Thanks for helping me with my exam prep!

Thank you- by far, the simplest explanation and the easiest to understand. Love statistics!

“5% is an arbitrary significance level (or ‘alpha’); not so high that we’re making too many Type I errors (concluding there’s an effect where there isn’t one), but not so low that we’re making too many Type II errors (concluding there isn’t an effect where there is one)”

Concerning type 1 and 2 errors, it’s the opposite, no?

Nope. If alpha is higher, you’ll tend to make more type ones, if alpha is lower, you’ll tend to make more type twos.

Great post!

One thing that I’ve been unclear on with Bonferronis is what counts as “multiple tests”. If I am running 20 tests in my study, but 10 of those tests are to do with one explanatory variable (weight, levels 1-3 for example), and 10 are to do with another explanatory variable, where do I apply the correction? Do I apply a correction with n=20 to all values? or a correction of n=10 to each explanatory variable? or some other correction taking into account the 3 levels of each test?

This was a really good explanation of WHAT a bonferroni correction is. It would be really great to see an expansion in another post (or in this one) about how the correction can be applied 🙂 and really helpful!

Dean

That’s a really good question, and statistically speaking it shouldn’t matter. If you do 20 studies of different topics, doing one test per paper, and get p = .05 for each of them, statistically it’s odds on that one of them is a false positive. However, you’ll note that research reports don’t tend to correct across papers, and that’s because there are arguments against it.

I have never come across a hard and fast rule for this and I don’t believe there is one, so I can’t answer your question, but I can ramble on for a bit.

In my opinion, certain tests in a paper aren’t a problem. For example, researchers might do a t test on the ages between groups, or a few bivariate correlations of demographics against the DV, but not include these when correcting p values. That’s fair enough in my book, since with these tests you’re just looking for large differences that might upset the results, rather than trying to protect yourself against false positives. So when I have my devil hat on and I’m looking for flaws in a paper, I don’t worry about that.

Note also that Bonferroni also increases your chance of making a Type II error, so it has its own pitfalls too.

I’ve noticed that in many papers the researchers have adjusted for multiple comparisons on each separate test. So say you have 2 ANOVAs with 10 follow up tests each, what you’d probably see is an adjustment of 0.05 / 10 for each set of tests, rather than 0.05 / 20 for all.

Some argue that planned comparisons don’t need to be adjusted, some say adjustments aren’t needed when the previous research strongly points to there being an effect, some say a different method of adjusting would be preferred in cases like that, some say in certain types of research adjustment is less necessary than others, and on and on and on.

It’s more important, I think, to be aware of the consequences of adjusting and not adjusting, be cautious about drawing conclusions from results that haven’t been corrected – especially when there have been no replications – and keep your skeptical hat on.

Hope this helps even a little bit. 🙂

http://www.jerrydallal.com/LHSP/mc.htm

http://www.ncbi.nlm.nih.gov/pubmed/2081237

Just double checking,

If I am studying the performance of the same group of people using 4 different procedures (A,B,C,D), and I have to compare the performance, what’s the number of tests and final p value?

Tests: AB, AC, AD, BC, BD and CD (6) tests?

therefore, p after Bonferroni would be 0.05/6=0.0087?

Hamilton.

I make it 0.0083, but yep, that’s right.
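To check that arithmetic, here’s a sketch using Python’s standard library (`math.comb` counts the pairs):

```python
import math

groups = 4                    # procedures A, B, C, D
pairs = math.comb(groups, 2)  # number of pairwise comparisons: AB, AC, AD, BC, BD, CD

print(pairs)          # 6
print(0.05 / pairs)   # roughly 0.0083
```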

If you’re doing multiple pairwise comparisons, you should be using ANOVA and Tukey’s test, rather than many individual tests with Bonferroni’s correction.

hmm. so i’m doing a gene-gene vs. gene-environment analysis. i have 3 environments and 10 genes to look at with only one dependent variable (y) and i plan to do a regression of each on Y (i think that what i’m supposed to be doing), does that mean my bonferroni calculation would be .05/13 ?

If you’re doing 13 separate regressions, yes.

I’m measuring 16 different items in 3 groups of people using ANOVA. Will the Bonferroni correction be 0.003? How do I report the analysis done? Can I use the term ‘Bonferroni-corrected ANOVA test’?

If you’re doing 16 separate tests, yes, 0.003125.

But are you doing follow up comparisons? Or just 16 ANOVAs?

Personally I would say “x tests were conducted with alpha adjusted to x for multiple comparisons.”

Yes, I am doing a follow-up comparison for the same items, so is the Bonferroni correction 0.05/16 or 0.05/32?

0.05/16

One quick question. I have used a chi-square test to compare use of different habitats (9 habitats). Is the correction made to the alpha value for the original test or to each pair-wise comparison? Or both? I assume that the correction would be 0.05/9?

Thanks

I’ll be honest with you Rachel, I haven’t used a chi-square since I was sat in Research Methods classes 6-7 years ago, but yes, you’d divide your alpha by the number of pairwise comparisons you were doing.

Hi, this is probably a stupid question but I’m still lost at this Bonferroni deal.

Let’s say I have two treatments A and B and then I do tests on 8 variables between A and B. Is the Bonferroni adjustment then 0,05/8?

And then let’s say I take A and look within this group in the same eight + 4 new variables. Is the adjustment then 0,05/12 or is it now 0,05/(12+8)? :S

And if I then divide the A into a subgroup of AI and BI and run X tests – do I then have to divide the alpha by (8+12+X)? :S

Would so much love a reply because Altman still confuses the hell out of me! 🙁

Sincerely,

Anna

Great article – have you got one on odds ratios? Apologies if I have missed it.

Hi,

Just a critical question for me, please. I ran a repeated-measures ANOVA on one variable with 3 factors, each of which has 2 levels. Does it mean that my alpha should drop to 0.05/6 = 0.008?

I found a significant p value for one factor. What is the next step? Should I run 6 paired t tests? Or more ANOVA tests? How should the ANOVA tests be conducted?

Your reply will save me from the hell of ANOVA, please.

Thanks

Marzi