TY - JOUR
T1 - Ophthalmic statistics note 7: multiple hypothesis testing—to adjust or not to adjust
JF - British Journal of Ophthalmology
JO - Br J Ophthalmol
SP - 1155
LP - 1157
DO - 10.1136/bjophthalmol-2015-306784
VL - 99
IS - 9
AU - Cipriani, Valentina
AU - Quartilho, Ana
AU - Bunce, Catey
AU - Freemantle, Nick
AU - Doré, Caroline J
Y1 - 2015/09/01
UR - http://bjo.bmj.com/content/99/9/1155.abstract
N2 - Investigating multiple research questions, or hypotheses, within one study is a common scenario in biomedical research with many examples in ophthalmology. As the number of statistical tests increases, the overall chance that we draw an erroneous conclusion in our study gets higher in a predictable manner. Each statistical test conducted at the conventional 5% significance level (α) has a one in 20 chance (or 0.05 probability) of appearing significant simply due to chance (a type I error) and a 1−0.05=0.95 probability of being non-significant. If we test two independent true null hypotheses, the probability that neither test will be significant is 0.95×0.95=0.90. Likewise, if we test 14 independent hypotheses, the probability that none will be significant is 0.95^14=0.49, and the probability that at least one will be significant is 1−0.49=0.51, that is, we are more likely than not to find at least one test significant. In other words, if we go on carrying out tests of significance we are very likely to find a spurious significant result. In the field of statistics, this phenomenon is known as the problem of multiple testing or the multiplicity problem.1 Consider the ABC study which compared bevacizumab for neovascular age-related macular degeneration (nAMD) with standard National Health Service (NHS) care.2 This study was conducted on 131 patients and found that 21 (32%) of patients treated with bevacizumab gained ≥15 letters compared with two (3%) of those in the standard care group with an OR of 18.1 (95% CI 3.6 to 91.2; p<0.001). The primary objective of this study was to determine whether bevacizumab was superior to standard NHS care and this single test of significance provided strong evidence. Closer inspection of the study reveals however that a variety of different treatments were used within the NHS standard care arm …
ER -