Ophthalmic statistics note 7: multiple hypothesis testing—to adjust or not to adjust
  1. Valentina Cipriani (1,2),
  2. Ana Quartilho (1),
  3. Catey Bunce (1,3),
  4. Nick Freemantle (4),
  5. Caroline J Doré (5)
  6. on behalf of the Ophthalmic Statistics Group
    1. NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK
    2. UCL Genetics Institute, London, UK
    3. London School of Hygiene & Tropical Medicine, London, UK
    4. Department of Primary Care and Population Health, University College London, London, UK
    5. UCL Comprehensive Clinical Trials Unit, University College London, London, UK
    Correspondence to Dr Valentina Cipriani, NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, EC1V 9EL, UK; v.cipriani{at}ucl.ac.uk

    Defining the problem

    Investigating multiple research questions, or hypotheses, within one study is a common scenario in biomedical research with many examples in ophthalmology. As the number of statistical tests increases, the overall chance that we draw an erroneous conclusion in our study gets higher in a predictable manner. Each statistical test conducted at the conventional 5% significance level (α) has a one in 20 chance (or 0.05 probability) of appearing significant simply due to chance (a type I error) and a 1−0.05=0.95 probability of being non-significant. If we test two independent true null hypotheses, the probability that neither test will be significant is 0.95×0.95=0.90. Likewise, if we test 14 independent hypotheses, the probability that none will be significant is 0.95^14=0.49, and the probability that at least one will be significant is 1−0.49=0.51, that is, we are more likely than not to find at least one test significant. In other words, if we go on carrying out tests of significance we are very likely to find a spurious significant result. In the field of statistics, this phenomenon is known as the problem of multiple testing or the multiplicity problem.1
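
The arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch, not part of the original note: the function name `familywise_error_rate` is a hypothetical helper, and the calculation assumes, as the text does, that all tests are independent and that every null hypothesis is true.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one significant result among n_tests
    independent tests when every null hypothesis is true.

    Hypothetical helper for illustration; assumes independence."""
    prob_none_significant = (1 - alpha) ** n_tests
    return 1 - prob_none_significant


# The three cases discussed in the text: 1, 2 and 14 tests at alpha = 0.05.
for n in (1, 2, 14):
    print(n, f"{familywise_error_rate(n):.4f}")
# 1 0.0500
# 2 0.0975
# 14 0.5123
```

With 14 tests the probability of at least one spurious significant result is already just over one half, matching the 0.51 quoted above.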

    Consider the ABC study which compared bevacizumab for neovascular age-related macular degeneration (nAMD) with standard National Health Service (NHS) care.2 This study was conducted on 131 patients and found that 21 (32%) of patients treated with bevacizumab gained ≥15 letters compared with two (3%) of those in the standard care group with an OR of 18.1 (95% CI 3.6 to 91.2; p<0.001). The primary objective of this study was to determine whether bevacizumab was superior to standard NHS care and this single test of significance provided strong evidence. Closer inspection of the study reveals however that a variety of different treatments were used within the NHS standard care arm …
