Previous notes in this series have been concerned with the common situation in ophthalmic and other clinical fields of describing relationships between one or more ‘predictors’ (explanatory variables) and, usually, one outcome measure (response variable). A classic method for deriving relationships between outcomes and predictors is linear regression analysis. Linear regression is a member of a family of techniques known as general linear models, which also includes analysis of variance and analysis of covariance, the latter of which was covered in a previous Ophthalmic Statistics Note.

A key feature of all these models is that the outcome measure—for example, postoperative refractive prediction error or intraocular pressure—is continuous. While other notes in the series have focused on such continuous outcomes, in this note we consider the equally common situation in which the outcome is binary (dichotomous), taking one of only two values such as yes/no, as in the following two examples.

Example 1: A study was conducted on 137 patients to identify risk factors for intraoperative retinal breaks caused by induction of a posterior hyaloid face separation during 23-gauge pars plana vitrectomy.

Example 2: A study was conducted on 58 patients undergoing surgery for idiopathic macular hole, the outcome of interest being whether or not a patient develops an outer foveal defect (OFD).

In both examples, our objective is to examine relationships between a single outcome variable and several predictors. Typically, when faced with this challenge, we would use linear regression. Linear regression, however, requires a continuous outcome, and thus if we were to use this method we would be violating a statistical assumption. In our last statistical note, we introduced the concept of transforming data in order to conduct valid statistical analyses. The focus in that note was on transformations of the explanatory (independent) variables. It is, however, also possible to transform the outcome so that, while the outcome itself is not continuous, a transformation based upon that outcome is. We can then apply regression in the manner we are accustomed to and identify associations between outcomes and risk factors, acknowledging that the associations actually relate to the transformation. As was the case in our previous note, the challenge, therefore, lies in the interpretation of results after application of the transformation.

The technique that we use to achieve this is called logistic regression. We assign our outcome variable numerical values of 1 and 0, representing yes and no, respectively. If we had 10 subjects and 5 had breaks and 5 did not, we would say intuitively that the probability of an event (p) was 5/10—the proportion of our group who had the event of interest. In logistic regression, our outcome of interest is based on this probability. However, probabilities are bounded by 0 and 1, where 0 indicates impossibility and 1 indicates certainty. Probability, just like our original outcome, is therefore not normally distributed. A transformation of probability, known as the logit transformation, is not, however, constrained by the bounds of 0 and 1, and logistic regression may then be used to explore associations between the covariates of interest and our logit transformation, where

logit(p) = ln(p / (1 − p))

and ln denotes the natural logarithm.

While this transformation may appear unintuitive, it should be noted that the quantity p / (1 − p) is simply the odds of the event (a concept familiar from betting), so that the logit is just the natural logarithm of the odds, often called the log-odds.
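The behaviour of the logit can be illustrated in a few lines of code. This is a minimal sketch (the function name `logit` is our own, not from the original study) showing that, although probabilities are confined to the interval 0 to 1, their logits are unbounded and symmetric about p = 0.5:

```python
from math import log

def logit(p):
    """Logit (log-odds) transformation: ln(p / (1 - p))."""
    return log(p / (1 - p))

print(logit(0.5))   # 0.0 -- even odds (1 to 1)
print(logit(0.9))   # ~2.197 -- odds of 9 to 1 in favour
print(logit(0.1))   # ~-2.197 -- odds of 9 to 1 against
```

As p approaches 0 the logit tends to minus infinity, and as p approaches 1 it tends to plus infinity, which is why the transformed outcome is no longer constrained.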

Logistic regression was used in a study of anatomical success following surgery for idiopathic macular hole, with macular hole inner opening as a predictor.

The estimated probability of anatomical success can then be calculated, so that for a patient with a macular hole inner opening of 650 μm (6.50 in units of 100 μm), the logit of p is given by

logit(p) = 10.890 − (1.637 × 6.50) = 0.25

Logits have no direct interpretation, and so to interpret this equation in a useful predictive sense, we need to ‘undo’ the logistic transformation. This can be achieved in two steps. First, the odds of the event are calculated by exponentiating or ‘antilogging’ the regression function:

odds = e^0.25 = 1.28

Next, a bit of simple algebra is used to convert these odds to a probability:

p = odds / (1 + odds) = 1.28 / (1 + 1.28) = 0.56

So, preoperatively, our patient is predicted to have a 56% chance of anatomical success. This procedure (exponentiation and algebra) would not normally be the responsibility of the researcher: most statistical packages will routinely perform these transformations as part of their logistic regression function. In fact, unlike simple linear regression, in which parameters may be estimated using the least-squares method, it is not generally practical to conduct logistic regression by hand, as its parameters are generally estimated by other means (maximum likelihood): computer software is usually required.
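The two back-transformation steps can be reproduced directly from the fitted coefficients. This is an illustrative sketch, not part of the original study, using the constant (10.890) and slope (−1.637 per 100 μm) from the computer output reported below; the variable names are our own:

```python
from math import exp

# Coefficients from the fitted logistic regression model
constant = 10.890
b = -1.637            # per 100 micrometres of macular hole inner opening

hole_size = 6.50      # a 650 micrometre hole, in units of 100 micrometres

# Linear predictor on the logit (log-odds) scale
logit_p = constant + b * hole_size       # ~0.25

# Step 1: exponentiate ('antilog') to obtain the odds
odds = exp(logit_p)                      # ~1.28

# Step 2: simple algebra converts odds to a probability
p = odds / (1 + odds)                    # ~0.56

print(f"logit = {logit_p:.2f}, odds = {odds:.2f}, p = {p:.2f}")
```

In practice, as noted above, statistical packages report these predicted probabilities automatically; the sketch simply makes the arithmetic explicit.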

Assessing the effect of a covariate also requires us to undo the logistic transformation. The computer output (slightly edited) summarising the model above is shown in the table below.

Computer output from macular hole study (edited)

| | B | SE | Wald | df | p Value | OR | 95% CI for OR (lower) | 95% CI for OR (upper) |
|---|---|---|---|---|---|---|---|---|
| Macular hole inner opening | −1.637 | 0.539 | 9.214 | 1 | 0.002 | 0.195 | 0.068 | 0.560 |
| Constant | 10.890 | 3.293 | 10.938 | 1 | 0.001 | 53647.735 | | |

The OR for a particular parameter is not the same as the risk ratio (relative risk), although for rare events it is a reasonable approximation. Although it is not as intuitive as the risk ratio, it possesses certain advantages; for instance, unlike the risk ratio, it is not constrained when the baseline risk is large. The relationship between odds and risk ratios, and other quantities such as prevalence and exposure rates, may be found in many standard texts.

The estimation of the OR may be considered to be the back-transformation of the results into the original data units. In this example, we see that an increase of 100 μm in macular hole inner opening leads to a significant reduction (p=0.002) in the odds of anatomical success of 80.5% (calculated as (1 − 0.195) × 100). The associated CI for the OR (0.068 to 0.560) confirms that this reduction is statistically significant, as it excludes the value 1.00, which corresponds to no effect. We can ignore the line of the output for the constant: these statistics have little practical value.
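The relationship between the coefficient B, its standard error and the reported OR with its 95% CI can be checked directly: the OR is e^B, and the confidence limits come from exponentiating B ± 1.96 × SE. A short sketch using the values from the table (variable names are our own):

```python
from math import exp

b, se = -1.637, 0.539    # coefficient and standard error from the output table

odds_ratio = exp(b)               # OR per 100 micrometre increase, ~0.195
lower = exp(b - 1.96 * se)        # lower 95% confidence limit, ~0.068
upper = exp(b + 1.96 * se)        # upper 95% confidence limit, ~0.560
reduction = (1 - odds_ratio) * 100  # percentage reduction in odds, ~80.5%

print(f"OR = {odds_ratio:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(f"Reduction in odds per 100 micrometres: {reduction:.1f}%")
```

Note that the confidence limits are symmetric about B on the logit scale but not about the OR itself, which is why the reported interval (0.068 to 0.560) is not centred on 0.195.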

Mathematical functions (transformations) may be applied to outcome variables as well as to explanatory variables.

Studies exploring relationships between one or several predictor variables and a dichotomous outcome typically make use of one such transformation, the logit, in a technique known as logistic regression.

Logistic regression typically yields ORs with 95% CIs. An OR of 1 corresponds to no association with the predictor variable and so a CI excluding 1 is evidence of association.

JS drafted the paper. CB, CJD and NF critically reviewed and revised the paper. JS and CB redrafted the paper after review. JS, CB and CJD critically reviewed the redraft.

CB is partly funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology.

None declared.

Not commissioned; externally peer reviewed.