Continuous variables, such as intraocular pressure (IOP), visual acuity and contrast sensitivity, are commonly measured in clinical ophthalmology and vision research. In clinical practice, a ‘status’ (category) can sometimes be assigned to an individual patient using a cutpoint on a continuous variable; for example, a diagnosis of glaucoma might be confirmed by an elevated IOP measurement (eg, IOP >21 mm Hg). Indeed, much of medicine revolves around an implicit classification of individuals into diseased and non-diseased. In clinical research, continuous variables may likewise be converted to categorical variables, assigning individuals to one of two groups. Although this may be appropriate in some specific studies where the underlying distribution of the variable shows a clear grouping, such dichotomisation has several drawbacks.

Dichotomisation may be driven by the research question, for example, in a study investigating the health service needs of people with low vision, where dichotomisation uses the WHO visual acuity threshold for low vision.^{2} Dichotomisation may also be used to simplify the statistical analysis and the presentation and interpretation of data. However, this simplification comes at a cost in terms of loss of information.

Dichotomisation results, first, in the loss of descriptive information about the study population. For example, the nature and extent of differences between individuals with low vision are lost when visual acuity is dichotomised as having/not having low vision. Second, dichotomisation discards information on between-subject variability in the study population: subjects with similar values but on either side of the threshold will be described and analysed as different, whereas two subjects on the same side of the threshold, one near it and the other a long way from it, will be treated as if they were the same. In addition, it is not possible to quantify linear relationships after dichotomising a variable (eg, the change in IOP in mm Hg per 1 mm Hg increase in systolic blood pressure (SBP) cannot be estimated if IOP has been dichotomised), and any non-linear relationship would be masked by dichotomisation.
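As a minimal illustration of the threshold problem, the sketch below dichotomises three hypothetical IOP readings (invented values, not study data) at the 21 mm Hg cutpoint used as an example earlier. The function name `is_elevated` and the patient labels are our own, for illustration only.

```python
# Hypothetical IOP readings (mm Hg); values are illustrative, not study data.
CUTPOINT_MM_HG = 21.0  # example threshold from the text: IOP > 21 mm Hg


def is_elevated(iop_mm_hg: float, cutpoint: float = CUTPOINT_MM_HG) -> bool:
    """Dichotomise a continuous IOP value: True if it lies above the cutpoint."""
    return iop_mm_hg > cutpoint


patients = {"A": 20.9, "B": 21.1, "C": 35.0}
labels = {name: is_elevated(iop) for name, iop in patients.items()}

# A and B are almost identical but fall into different categories,
# while B and C are far apart but share the same category.
gap_ab = abs(patients["A"] - patients["B"])
gap_bc = abs(patients["B"] - patients["C"])
print(labels)
print(f"A-B gap: {gap_ab:.1f} mm Hg, B-C gap: {gap_bc:.1f} mm Hg")
```

Once only `labels` is kept, the 0.2 mm Hg difference between A and B is exaggerated into a category change, and the 13.9 mm Hg difference between B and C disappears.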

There may also be a loss of statistical power (the probability of detecting a true effect of a particular size, should it exist) associated with dichotomisation. To maintain statistical power equivalent to that for continuous data, dichotomised data require a larger sample size. The table below shows the sample size required to detect an association between IOP and SBP in two scenarios: (a) IOP analysed as a continuous variable (sample size denoted as n_{o}) and (b) IOP dichotomised using three different cutpoints (ie, different values for the threshold of IOP that defines the two IOP categories; sample size denoted as n_{d}). Sample size values were calculated using the sample size formulae available for the correlation coefficient. The required sample size n_{d} and the reduction in power for scenario (b) are shown in the table.
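The text does not spell out the formula used, so the following is a sketch under stated assumptions. It uses the common Fisher z approximation for the total sample size needed to detect a correlation r with a two-sided test, n = ((z_{1-α/2} + z_{1-β}) / atanh(r))² + 3. The table footnote does not give r; here we assume r = 0.035 × 20 / 2.4 ≈ 0.29, combining the slope of 0.035 mm Hg of IOP per mm Hg of SBP from the confounding simulation described later with the SDs in the table footnote. With α = 0.05 this gives about 119.4 and 90.0 (unrounded) for 90% and 80% power, close to the n_{o} values in the table.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, power: float, alpha: float = 0.05) -> float:
    """Approximate total sample size to detect correlation r with a two-sided
    test, via Fisher's z transformation (unrounded)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, eg 1.96
    z_beta = std_normal.inv_cdf(power)           # eg 1.28 for 90% power
    return ((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3


# Assumed correlation: slope 0.035 mm Hg IOP per mm Hg SBP,
# SD(SBP) = 20 mm Hg, SD(IOP) = 2.4 mm Hg.
r = 0.035 * 20 / 2.4
n90 = n_for_correlation(r, 0.90)
n80 = n_for_correlation(r, 0.80)
print(f"90% power: n_o ~ {n90:.1f}; 80% power: n_o ~ {n80:.1f}")
```

Rounding conventions differ (rounding up would give 120 rather than 119), so small discrepancies with the table are expected.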

Impact upon power and required sample size due to dichotomisation

Power to detect association (%) | IOP as a continuous variable: n_{o} | IOP as a binary variable: cutpoint (mm Hg) | IOP as a binary variable: n_{d} | IOP as a binary variable: power if n=n_{o}* (%) |
---|---|---|---|---|
90 | 119 | 14.5 | 175 | 73 |
90 | 119 | 16 | 207 | 67 |
90 | 119 | 13 | 212 | 67 |
80 | 90 | 14.5 | 133 | 61 |
80 | 90 | 16 | 161 | 55 |
80 | 90 | 13 | 162 | 54 |

Assumptions of the model: both IOP and SBP follow a normal distribution with means equal to 14.5 and 135 mm Hg, respectively, and SDs equal to 2.4 and 20 mm Hg, respectively.

*Statistical power if sample size=n_{o} (as for IOP as a continuous variable) and IOP is dichotomised and analysed accordingly.

IOP, intraocular pressure; SBP, systolic blood pressure.

When IOP is dichotomised, a larger sample size (n_{d}) is needed to detect a significant association with the same power as an analysis with sample size n_{o} that treats IOP as a continuous variable. For example, when IOP is analysed as continuous, 119 individuals are required for a power of 90%. If IOP is dichotomised using the mean (14.5 mm Hg) as the cutpoint, the sample size required to maintain 90% power increases to 175 individuals, that is, 56 additional patients. If the condition of interest is rare, this increase in the required number of patients might render a study infeasible. Alternatively, if the sample size remains at n_{o}=119 and IOP is dichotomised, power falls from 90% to between 67% and 73% depending on the cutpoint, a reduction of at least 17 percentage points.
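Why power drops can be sketched analytically. If SBP and IOP are bivariate normal with correlation r, dichotomising IOP at a standardised cutpoint c reduces the correlation with SBP to the point-biserial value r·φ(c)/√(p(1−p)), where p is the proportion above the cutpoint. Plugging this smaller correlation into the Fisher z power approximation gives, under the same assumed r = 0.035 × 20 / 2.4 and n = 119, roughly 72% power at the 14.5 mm Hg cutpoint and 66% at 13 or 16 mm Hg, of the same order as the table. This is not necessarily the method the authors used, and its numbers need not match the table exactly.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def power_for_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of zero correlation (Fisher z)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def dichotomised_correlation(r: float, cutpoint: float, mean: float, sd: float) -> float:
    """Point-biserial correlation between SBP and the indicator (IOP > cutpoint),
    assuming SBP and IOP are bivariate normal with correlation r."""
    c = (cutpoint - mean) / sd       # standardised cutpoint
    p = 1 - STD_NORMAL.cdf(c)        # proportion classified as 'high IOP'
    return r * STD_NORMAL.pdf(c) / math.sqrt(p * (1 - p))


# Assumptions (not stated in the table): r = 0.035 * 20 / 2.4; IOP mean 14.5, SD 2.4 mm Hg.
R = 0.035 * 20 / 2.4
N_O = 119
power_continuous = power_for_correlation(R, N_O)
power_binary = {
    cut: power_for_correlation(dichotomised_correlation(R, cut, 14.5, 2.4), N_O)
    for cut in (14.5, 16.0, 13.0)
}
print(f"IOP continuous: {power_continuous:.0%}")
for cut, pw in power_binary.items():
    print(f"IOP dichotomised at {cut} mm Hg: {pw:.0%}")
```

The attenuation is smallest when the cutpoint sits at the mean (p = 0.5); cutpoints further from the mean lose more information, which matches the pattern in the table.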

In clinical research, the association observed between a risk factor and an outcome can be affected by background factors (such as age) that are associated with the risk factor while also having an influence on the outcome. These background factors are known as confounders. If confounders are present, the estimation of the association of interest between the risk factor and outcome can be biased. Clinical trials are designed to minimise the effect of confounding, with subjects being randomised to intervention or control groups to ensure the groups are balanced with regard to the background factors. However, in epidemiological and other clinical studies, estimates may be biased if the effect of the confounding variable is not properly accounted for in the analysis. If a confounder is taken into account but is dichotomised, this may remove some but not all of the effects of confounding and hence still result in bias.

For example, suppose we wish to investigate whether diabetes affects IOP. The existing evidence suggests that SBP is related to both IOP and diabetes and is therefore a potential confounder of the relationship between IOP and whether or not a patient has diabetes.

Systolic blood pressure (SBP) as a potential confounder of the relationship between diabetes and intraocular pressure (IOP).

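We can fit a linear regression model for IOP with SBP as a continuous covariate and diabetes as a factor with two levels (Yes/No). Let us assume that, for a given SBP, mean IOP is the same in those with and without diabetes, so that the correct estimate of the effect of diabetes is zero. If SBP is instead dichotomised before being entered into the model, the estimated effect of diabetes on IOP becomes biased.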

Biased estimates of the effect of diabetes on intraocular pressure (IOP) when the confounder systolic blood pressure (SBP) is dichotomised. This simulation assumes that IOP follows a normal distribution and increases on average by 0.035 mm Hg per 1 mm Hg increase in SBP. We also assume that SBP follows a normal distribution with means of 135 and 145 mm Hg for the non-diabetic and diabetic groups, respectively (A), and 135 and 155 mm Hg for the non-diabetic and diabetic groups, respectively (B). Finally, we assume that mean IOP is the same in those with and without diabetes. The estimated effect of diabetes on IOP is erroneously positively biased if SBP is dichotomised (see solid curves). To illustrate the average bias, this simulation is based on a large number of individuals: 50 000 diabetic and 50 000 non-diabetic.
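A simulation of this kind can be sketched in Python. Beyond the caption, we assume SD(SBP) = 20 mm Hg (taken from the table footnote), a residual IOP SD of 2.3 mm Hg, an intercept chosen so that mean IOP is 14.5 mm Hg at an SBP of 135 mm Hg, and a single hypothetical SBP cutpoint of 140 mm Hg; all of these are our choices, not the authors'. To stay dependency-free, the diabetes coefficient is computed with the Frisch-Waugh-Lovell theorem rather than a regression library. With SBP entered as continuous, the estimate should be close to zero; with SBP dichotomised, it should be positive, and larger in scenario B, where the SBP gap between the groups is wider.

```python
import random


def diabetes_effect(iop, diabetes, sbp_term):
    """OLS coefficient of `diabetes` in the model iop ~ 1 + diabetes + sbp_term,
    via the Frisch-Waugh-Lovell theorem: residualise diabetes on sbp_term
    (with intercept), then regress iop on those residuals."""
    n = len(iop)
    mean_d = sum(diabetes) / n
    mean_s = sum(sbp_term) / n
    s_ss = sum((s - mean_s) ** 2 for s in sbp_term)
    s_sd = sum((s - mean_s) * (d - mean_d) for s, d in zip(sbp_term, diabetes))
    slope = s_sd / s_ss
    resid = [(d - mean_d) - slope * (s - mean_s) for d, s in zip(diabetes, sbp_term)]
    return sum(e * y for e, y in zip(resid, iop)) / sum(e * e for e in resid)


def simulate(mean_sbp_diabetic, n_per_group=50_000, sbp_cutpoint=140.0, seed=1):
    """Return (estimate adjusting for continuous SBP, estimate adjusting for
    dichotomised SBP). The true effect of diabetes on IOP is zero."""
    rng = random.Random(seed)
    iop, diabetes, sbp = [], [], []
    for d, mean_sbp in ((0, 135.0), (1, mean_sbp_diabetic)):
        for _ in range(n_per_group):
            s = rng.gauss(mean_sbp, 20.0)  # assumed SD(SBP) = 20 mm Hg
            # IOP rises 0.035 mm Hg per mm Hg of SBP; diabetes has no direct effect.
            # Assumed intercept 9.775 gives mean IOP 14.5 mm Hg at SBP 135 mm Hg.
            iop.append(9.775 + 0.035 * s + rng.gauss(0.0, 2.3))
            diabetes.append(d)
            sbp.append(s)
    sbp_high = [1 if s > sbp_cutpoint else 0 for s in sbp]  # hypothetical cutpoint
    return diabetes_effect(iop, diabetes, sbp), diabetes_effect(iop, diabetes, sbp_high)


# Scenario A: diabetic mean SBP 145 mm Hg; scenario B: 155 mm Hg (non-diabetic: 135).
results = {"A": simulate(145.0), "B": simulate(155.0)}
for label, (est_continuous, est_dichotomised) in results.items():
    print(f"Scenario {label}: continuous SBP {est_continuous:+.3f} mm Hg, "
          f"dichotomised SBP {est_dichotomised:+.3f} mm Hg")
```

Within each SBP category, people with diabetes still have higher SBP on average, and the binary covariate cannot absorb that residual difference, so part of the SBP effect is attributed to diabetes.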

It is not good practice to power a study, obtain data from a number of individuals and then, after completing data collection, underpower the analysis by dichotomisation, thereby discarding a substantial amount of the data and information.

In summary, researchers and clinicians need to be aware of and consider the potential loss of information, decrease in statistical power and the bias that may be introduced by dichotomisation of continuous data.

The following members of the Ophthalmic Statistics Group contributed valuable comments and suggestions: Jonathan Cook, Abdel Douiri, Chris Rogers and Irene Stratton.

The Ophthalmic Statistics Group.

PMC, GC and MG-F designed and drafted the paper. CB, CJD and NF conducted an internal peer review of the paper.

None of the authors have any conflicts of interest to declare.

PMC is supported by the Ulverscroft Foundation. This work was undertaken at UCL Institute of Child Health/Great Ormond Street Hospital for Children, which receives a proportion of funding from the Department of Health's NIHR Biomedical Research Centres funding scheme. Gabriela Czanner and Marta García-Fiñana are grateful to the Clinical Eye Research Centre, St. Paul's Eye Unit, Royal Liverpool and Broadgreen University Hospitals NHS Trust for supporting this work.

Commissioned; externally peer reviewed.