In clinical trials, continuous outcomes, such as intraocular pressure and visual acuity, are often measured both before treatment (ie, at baseline) and after treatment. Having the baseline measurement allows us to account for the initial differences between patients, which may well have arisen by chance, when comparing the outcomes of alternative treatments. While randomisation provides a good basis for comparisons between treatments and lowers the probability of baseline imbalance, imbalances can occur simply due to the play of chance, particularly when modest numbers of patients are randomised.

Whether or not baseline measurements are accounted for in the analysis may have an impact on the results of a trial. Imagine a randomised controlled trial with participants randomised to receive either treatment A or treatment B. The primary outcome for the trial is Best Corrected Visual Acuity (BCVA) in the eye with the poorer vision at recruitment, as measured by the number of letters read on an Early Treatment Diabetic Retinopathy Study (ETDRS) chart at a distance of 4 m. For the purpose of this scenario, we are assuming that only data from this eye contribute to the analysis. The mean BCVA after treatment is higher in the group allocated treatment B than in the group allocated treatment A. However, by chance, the mean BCVA before the start of treatment was also higher for those allocated treatment B. Therefore, although the mean values after treatment suggest that treatment B is better, there may have been as much, or indeed more, improvement with treatment A. This is something that needs to be considered when deciding how the data will be analysed. Decisions about how the data will be analysed should be made before looking at the data (so that they are not influenced by the results) and documented in a Statistical Analysis Plan.

This leaves us with the question: what is the preferred way to analyse data of this nature?

There are three common approaches to the analysis of clinical trial data when we have both baseline and post-treatment values, namely:

(1) Using a linear regression model, fitting baseline measurement as a patient-level explanatory variable. This method of analysis is known as analysis of covariance (ANCOVA).

(2) Analysing post-treatment values only (ie, ignoring baseline measurements).

(3) Analysing change scores (ie, the difference between the post-treatment measurement and baseline measurement for each participant).

The merits of ANCOVA and its advantages over the other two approaches are discussed in detail by Vickers and Altman.

When carrying out ANCOVA, a regression model is fitted to the data of the form:

post-treatment measurement = a + b_{1} × treatment B + b_{2} × baseline measurement + e (1)

The variable ‘treatment B’ takes the value of 0 if the participant was allocated treatment A, and 1 if the participant was allocated treatment B. Using a statistical package, we can obtain estimates of a, b_{1}, b_{2} and e. The estimate a is a constant, and b_{1} quantifies the size of the treatment effect (ie, the mean difference between treatments A and B). The baseline measurement is termed a covariate, and e is an error term. The value of the estimate b_{1} depends on the baseline measurements (and the coefficient b_{2}), and hence we say that our estimate, b_{1}, is ‘adjusted for’ baseline.

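The post-treatment measurements or change scores (methods (2) and (3)) can be analysed using regression models of the same kind. For post-treatment values only, the model would be:

post-treatment measurement = a + b_{1} × treatment B + e (2)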

The estimate, b_{1}, corresponds to the difference in mean BCVA after treatment between the groups. The baseline measurement is not included, which is equivalent to setting the estimate b_{2} in model (1) to zero.

Looking then at change scores, the model would be:

post-treatment measurement − baseline measurement = a + b_{1} × treatment B + e (3)

An analysis of change scores is equivalent to setting the estimate b_{2} in model (1) to one.
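The relationship between the three models can be checked numerically. The following is a minimal sketch on simulated data, not an analysis of any real trial: the sample size, the effect sizes, the random seed and the helper name `fit` are all assumptions made for illustration. It fits the ANCOVA, post-treatment-only and change-score models by ordinary least squares and compares each treatment estimate with the corresponding difference in group means.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Simulated (not real) trial data; every number here is an assumption.
n = 200
treatment_b = rng.integers(0, 2, size=n)   # 0 = treatment A, 1 = treatment B
baseline = rng.normal(60, 15, size=n)      # baseline letters read
post = 20 + 0.7 * baseline + 2.0 * treatment_b + rng.normal(0, 10, size=n)


def fit(y, *columns):
    """Ordinary least squares with an intercept; returns the coefficients."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# ANCOVA: post = a + b1 * treatment B + b2 * baseline + e
a, b1_ancova, b2 = fit(post, treatment_b, baseline)
# Post-treatment values only: post = a + b1 * treatment B + e
_, b1_post = fit(post, treatment_b)
# Change scores: post - baseline = a + b1 * treatment B + e
_, b1_change = fit(post - baseline, treatment_b)

# Raw differences in group means (B minus A).
is_b = treatment_b == 1
diff_post = post[is_b].mean() - post[~is_b].mean()
diff_base = baseline[is_b].mean() - baseline[~is_b].mean()

print("post-treatment only:", b1_post, "vs", diff_post)
print("change score:       ", b1_change, "vs", diff_post - diff_base)
print("ANCOVA:             ", b1_ancova, "vs", diff_post - b2 * diff_base)
```

In each case the treatment estimate equals the difference in post-treatment means minus a multiple of the baseline imbalance: the multiple is 0 for post-treatment values only, 1 for change scores, and the fitted slope b_{2} for ANCOVA.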

Data from the Inhibition of VEGF in Age-related choroidal Neovascularisation (IVAN) randomised controlled trial comparing ranibizumab (Lucentis) and bevacizumab (Avastin) for the treatment of age-related choroidal neovascularisation are used here to illustrate the three methods.

Mean values at baseline and post-treatment (24 months) (number of letters on ETDRS chart)

| Variable | Ranibizumab (n=268) | Bevacizumab (n=249) |
|---|---|---|
| BCVA at baseline | 62.9 (14.6) | 62.0 (15.3) |
| BCVA at 24 months | 67.8 (17.0) | 66.1 (18.4) |
| Change from baseline | 4.9 (15.0) | 4.1 (13.5) |

BCVA, Best Corrected Visual Acuity.

Analysis results

| Model | Number of patients included in the analysis* | Treatment difference†‡ (ETDRS letters) | 95% CI (ETDRS letters) | SE (ETDRS letters) |
|---|---|---|---|---|
| Method (2), model 2: Post-treatment values only | 517 | −1.7 | (−4.8 to 1.4) | 1.55 |
| Method (3), model 3: Change from baseline | 517 | −0.8 | (−3.3 to 1.6) | 1.25 |
| Method (1), model 1: Post-treatment values with baseline value as a covariate | 517 | −1.1 | (−3.4 to 1.3) | 1.21 |

*Although 525 patients reached the 24-month visit in the IVAN trial, BCVA data are missing for 8 of these patients.

†Difference in mean BCVA (bevacizumab − ranibizumab).

‡The results presented differ from the published results.

BCVA, Best Corrected Visual Acuity.
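As a rough check on how the intervals in the analysis results table relate to the estimates and SEs, a 95% CI can be approximated as the estimate plus or minus 1.96 SEs. The sketch below uses the rounded values from the table; the 1.96 multiplier (a normal approximation) and the function name `approx_ci` are assumptions, and the trial analysis may have used a slightly different critical value, so agreement is expected only to within rounding.

```python
# Estimate, SE and reported 95% CI for each model, copied from the
# analysis results table (all in ETDRS letters, rounded as published).
results = {
    "post-treatment only": (-1.7, 1.55, (-4.8, 1.4)),
    "change from baseline": (-0.8, 1.25, (-3.3, 1.6)),
    "ANCOVA": (-1.1, 1.21, (-3.4, 1.3)),
}

Z = 1.96  # assumed normal-approximation multiplier for a 95% interval


def approx_ci(estimate, se, z=Z):
    """Approximate CI: estimate minus/plus z standard errors."""
    return estimate - z * se, estimate + z * se


approx = {name: approx_ci(est, se) for name, (est, se, _) in results.items()}

for name, (est, se, reported) in results.items():
    low, high = approx[name]
    print(f"{name}: approx {low:.2f} to {high:.2f}, "
          f"reported {reported[0]} to {reported[1]}")
```

Small discrepancies in the final digit are to be expected because the inputs are themselves rounded.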

Looking at post-treatment values only, we do not take into account the higher baseline values in the ranibizumab group, and we therefore potentially overestimate the differences between the treatments (or had the chance imbalance been in the opposite direction, we would potentially underestimate the differences). From the results of this analysis, we would say that at the end of the trial, the average BCVA was 1.7 letters higher in the ranibizumab group compared with the bevacizumab group. The 95% CI tells us that we are 95% confident that the difference in mean BCVA is somewhere between 4.8 letters in favour of ranibizumab and 1.4 letters in favour of bevacizumab. As this CI includes zero, we would not infer a statistically significant difference in mean BCVA at the end of the trial between the two drugs.

The analysis of change scores provides us with an estimate of the difference in the mean change from baseline between the two treatment groups. Here, we are taking account of how good the vision was at the start of the trial (through the calculation of the change score), but not of differences in starting BCVA between the two groups. From the results, we find that the mean increase in BCVA was 0.8 letters larger in the ranibizumab group, with a 95% CI from 3.3 letters in favour of ranibizumab to 1.6 letters in favour of bevacizumab. Again, the CI includes zero, so a difference between the groups is not indicated.

ANCOVA (method (1)) is the preferred method of analysis. Here, we are estimating the difference in the mean BCVA between the two groups, again taking account of how good the participant's vision was at the start of the trial, but relaxing the restriction on the relationship between baseline and post-treatment measurements. From the results, we would conclude that BCVA improved by an estimated 1.1 letters more, on average, in the ranibizumab group than in the bevacizumab group, with a 95% CI from 3.4 letters in favour of ranibizumab to 1.3 letters in favour of bevacizumab. As with the other two models, there is no suggestion of a difference between the groups because the CI includes zero. The estimated relationship between the baseline and post-treatment measurements for each drug is illustrated in the accompanying figure.

Baseline and post-treatment Best Corrected Visual Acuity (BCVA) for the subset of patients with a post-treatment BCVA of more than 50 letters (n=425 patients). The estimated difference in mean BCVA between the two drug groups from the analysis of covariance is the vertical distance between the two regression lines shown on the plot.

In this example, all three methods led to the same conclusion, namely that mean BCVA at 24 months was similar between the two drugs. However, the change score analysis which made use of both baseline and post-treatment measurements gave more precise estimates (as shown by smaller SE and narrower CI) than the analysis which just considered the post-treatment measures. Further efficiency was then gained using the more flexible ANCOVA model compared to the analysis of change scores (SE 1.21 vs 1.25 letters, 95% CI (−3.4 to 1.3) vs (−3.3 to 1.6)).
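One way to see where the gain in precision comes from is to work back from the summary statistics. The sketch below assumes that the bracketed figures in the table of mean values are standard deviations (this is not stated in the table) and that the simple two-sample formula for the SE of a difference in means applies. It derives the baseline to post-treatment correlation implied in each arm, then approximates the SE for each method. The ANCOVA figure in particular is only an approximation, since it uses arm-specific correlations rather than the single common slope that the ANCOVA model fits.

```python
import math

# Per arm: (n, baseline SD, 24-month SD, change SD), from the table of means.
# Assumption: the bracketed table values are standard deviations.
arms = {
    "ranibizumab": (268, 14.6, 17.0, 15.0),
    "bevacizumab": (249, 15.3, 18.4, 13.5),
}


def implied_correlation(sd_base, sd_post, sd_change):
    """Solve Var(post - base) = Var(post) + Var(base) - 2*rho*SD(post)*SD(base) for rho."""
    return (sd_base**2 + sd_post**2 - sd_change**2) / (2 * sd_base * sd_post)


rho = {name: implied_correlation(b, p, c) for name, (_, b, p, c) in arms.items()}

# SE of a difference in means: sqrt(sd1^2 / n1 + sd2^2 / n2).
se_post = math.sqrt(sum(p**2 / n for n, _, p, _ in arms.values()))
se_change = math.sqrt(sum(c**2 / n for n, _, _, c in arms.values()))
# Adjusting for baseline removes roughly a fraction rho^2 of the outcome variance.
se_ancova = math.sqrt(
    sum(p**2 * (1 - rho[name] ** 2) / n for name, (n, _, p, _) in arms.items())
)

print("implied correlations:", rho)
print("approx SE, post-treatment only:", round(se_post, 2))
print("approx SE, change score:", round(se_change, 2))
print("approx SE, ANCOVA:", round(se_ancova, 2))
```

Under these assumptions the approximate SEs fall in the same order as those reported (post-treatment only largest, ANCOVA smallest), which is the pattern expected when baseline and post-treatment measurements are moderately correlated.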

While in this example all three methods led to the same conclusion, it is possible for different models to yield estimates that might lead to different conclusions. If, for example, the more precise estimate had had a CI which excluded zero, while the less precise estimates did not, we might infer evidence of a treatment effect from one model only. Some statisticians feel so strongly about the use of ANCOVA that they describe other methods as a hallmark of second-rate analysis!
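To illustrate how the choice of model could change the conclusion, the following simulation sketch generates many hypothetical trials with a true treatment effect and counts how often each method gives a 95% CI that excludes zero. Every setting (sample size, SD, correlation of 0.6, effect of 5 letters, number of simulations, seed) is an assumption chosen purely for illustration, as is the helper name `ci_excludes_zero`; a normal-approximation multiplier of 1.96 is used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

# All settings are assumptions chosen for illustration only.
N_PER_ARM, N_SIMS = 100, 1000
MEAN, SD, RHO, EFFECT = 60.0, 15.0, 0.6, 5.0


def ci_excludes_zero(y, treat, covariate=None):
    """Fit y = a + b1*treat (+ b2*covariate) by OLS and report whether
    the approximate 95% CI for b1 excludes zero."""
    cols = [np.ones(len(y)), treat]
    if covariate is not None:
        cols.append(covariate)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    se_b1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return abs(coef[1]) > 1.96 * se_b1


treat = np.repeat([0.0, 1.0], N_PER_ARM)  # equal allocation to A and B
hits = {"post-treatment only": 0, "change score": 0, "ANCOVA": 0}
for _ in range(N_SIMS):
    baseline = rng.normal(MEAN, SD, size=2 * N_PER_ARM)
    noise = rng.normal(0.0, SD * np.sqrt(1 - RHO**2), size=2 * N_PER_ARM)
    post = MEAN + RHO * (baseline - MEAN) + EFFECT * treat + noise
    hits["post-treatment only"] += ci_excludes_zero(post, treat)
    hits["change score"] += ci_excludes_zero(post - baseline, treat)
    hits["ANCOVA"] += ci_excludes_zero(post, treat, baseline)

# Proportion of simulated trials in which each method detects the effect.
power = {name: count / N_SIMS for name, count in hits.items()}
print(power)
```

In a setting like this, the simulated trials in which ANCOVA detects the effect but the post-treatment-only analysis does not are exactly the cases where the two models would lead to different conclusions.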

Another approach to the analysis which we would

An additional consideration that has not been explored here is how to handle missing data. The methods described would exclude any participant with missing data for any of the measurements included in the analysis. While every effort should be made to prevent missing data by study design and management, missing values can and do occur. For example, in the IVAN trial, participants underwent optical coherence tomography to assess retinal thickness and other lesion morphology, but on occasions, the machine was not working and the measurements were not taken. There are different ways in which missing data can be handled and you should consult a statistician on the best way to proceed. Approaches include omitting the cases with missing data, which is an inefficient use of the data, reducing precision and power; imputing the missing values, which must be done with care

In summary, while there are different methods available for analysing trial data with baseline and post-treatment measurements, the recommended approach is ANCOVA for the reasons outlined. The choice of analysis method can impact the results of a trial, and therefore, it is important to choose the most appropriate method in advance to ensure precise and unbiased conclusions.

The Ophthalmic Statistics Group: David Cairns, Valentina Cipriani, Jonathan Cook, David Crabb, Phillippa Cumberland, Gabriela Czanner, Paul Donachie, Andrew Elders, Marta Garcia Finana, Neil O'Leary, Krishna Patel, Toby Prevost, Ana Quartilho, Luke Saunders, Selvaraj Sivasubramaniam, Simon Skene, Irene Stratton, Joana Vasconcelos, Wen Xing, Haogang Zhu.

RN and CAR designed and drafted the paper. RN, CAR, CB, NF and CJD reviewed and revised the paper.

RN is funded by a National Institute for Health Research (NIHR) Research Methods Fellowship. The post of CAR is funded by the British Heart Foundation (BHF). The post of CB is partly funded by the NIHR Biomedical Research Centre based at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology. The views expressed in this article are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

Commissioned; internally peer reviewed.