Variation of clinical outcomes used in glaucoma randomised controlled trials: a systematic review
Rehab Ismail (1), Augusto Azuara-Blanco (2), Craig R Ramsay (1)
  1. Health Services Research Unit, University of Aberdeen, Aberdeen, UK
  2. Centre for Vision and Vascular Science, Institute of Clinical Science, Queen's University Belfast, Belfast, UK
Correspondence to Dr Rehab Ismail, Health Services Research Unit, University of Aberdeen, Health Sciences Building, Foresterhill, Aberdeen AB25 2ZD, UK; r01rai11{at}abdn.ac.uk

Abstract

Purpose In randomised controlled trials (RCTs) the selection of appropriate outcomes is crucial to the assessment of whether one intervention is better than another. The purpose of this review is to identify the different clinical outcomes reported in glaucoma trials.

Methods We conducted a systematic review of glaucoma RCTs. A sample of glaucoma trials published within a fixed time frame (between 2006 and March 2012) was included. Only studies published in English were considered. All measured and reported clinical outcomes were included. The possible variations of clinical outcomes were defined prior to data analysis. Information on reported clinical outcomes was tabulated and analysed using descriptive statistics. Other data recorded included type of intervention and glaucoma, duration of the study, defined primary outcomes, and outcomes used for sample size calculation, if nominated.

Results The search strategy identified 4323 potentially relevant abstracts. There were 315 publications retrieved, of which 233 RCTs were included. A total of 967 clinical measures were reported. There were large variations in the definitions used to describe different outcomes and their measures. Intraocular pressure was the most commonly reported outcome (used in 201 RCTs, 86%) with a total of 422 measures (44%). Safety outcomes were commonly reported in 145 RCTs (62%) whereas visual field outcomes were used in 38 RCTs (16%).

Conclusions There is a large variation in the reporting of clinical outcomes in glaucoma RCTs. This lack of standardisation may impair the ability to evaluate the evidence of glaucoma interventions.

  • Clinical Trial
  • Glaucoma

Introduction

Randomised controlled trials (RCTs) represent the gold standard for gathering evidence about treatment effectiveness. Clinical trials seek to evaluate whether an intervention is effective and safe by comparing the effects of different alternatives on outcomes that are chosen to identify its benefits and harms. There is recognition that insufficient attention has been paid to the selection of proper outcomes to be used in clinical trials. Outcomes need to be relevant to patients, clinicians and policymakers if the research is to influence practice.1 A variety of outcomes can be selected for clinical trials, and these can be measured and reported in different ways.

Problems arise if the selection, measurement and reporting of outcomes are inconsistent across clinical trials. First, important outcomes can be overlooked. If trialists are not aware of the importance of outcomes, it is likely that factors relating to the conduct of the trial, such as sample size, will determine which outcomes are measured.2 A lack of standardisation may also lead to inefficient trials that involve multiple outcomes, larger than required sample sizes and multiple interpretations of results.3 Second, meta-analysis will be difficult if different outcomes have been used. Even if the same outcomes are selected, they may be measured and analysed in different ways, and this too can impair the ability to synthesise their results. Third, selective outcome reporting bias (SORB), defined as the results-based selection for publication of a subset of the original measured outcome variables,4 can arise: in the absence of a set of core outcomes, trialists may decide to omit certain results from the final report.2 Missing outcome data can affect systematic reviews in two ways. Publication bias, when a study is not published on the basis of its results, can lead to bias in the analysis of a particular outcome. In a published study that has been identified by the reviewer, however, SORB can arise if the outcome has been measured and analysed but not reported.5

The selection of appropriate outcomes in a trial, therefore, is crucial to the assessment of whether one intervention is better than another. One way to address these issues is the development of agreed standardised sets of outcomes, known as ‘core outcome sets’.6 Some work has been undertaken to identify and validate the different patient-reported outcome measures used in glaucoma trials. This previous effort provided a framework (and critique of existing tools) for selecting the core patient-reported outcomes in glaucoma studies.7 However, no attempt has been made to standardise the clinical outcome measures in glaucoma RCTs. Therefore, as a first step towards this goal, we aim to identify the breadth of clinical outcomes reported in glaucoma RCTs.

Methods

We conducted a systematic review of RCTs in glaucoma published between January 2006 and March 2012. The time frame was chosen because we aimed to retrieve approximately 200 RCTs and chose to start our search from the most recent RCTs. The target of 200 was an arbitrary number that we judged sufficient to identify any variation in outcomes reported in glaucoma RCTs. Owing to resource constraints, we searched only for RCTs published in English; however, there were no restrictions on the population studied, applied interventions, phases of the trials, or types of glaucoma.

The following electronic databases were searched: Ovid MEDLINE (R) In-Process & Other Non-Indexed Citations and Ovid MEDLINE (R) from January 2006 to the first week of March 2012. A sensitive search strategy with controlled subject headings and text terms relating to glaucoma and randomised controlled trials was run. The abstracts were graded as ‘included’, ‘possible’ or ‘excluded’. Abstracts were excluded if the study did not evaluate patients with glaucoma, was not an RCT, or was not published in English. The investigators planned to identify further relevant RCTs (original studies) from references of the retrieved articles. All abstracts were reviewed independently by one reviewer (RI). Another senior investigator (AAB) reviewed approximately 10% of all abstracts (400 consecutive abstracts) to evaluate agreement. This was a pragmatic approach: a larger sample would have been graded in duplicate had the agreement been suboptimal. To assess the eligibility of RCTs, full text copies were assessed by one reviewer (RI). A second masked senior reviewer (AAB) reviewed 10 consecutive papers.

All clinical outcomes and variations of reporting were included in the analysis. We collected information on population, duration of follow-up (months), sample size, type of intervention (medical, surgical, laser, others) and outcomes. We recorded which outcomes were considered to be primary or secondary and documented methods used to measure them, by whom, and when they were measured and analysed. Other types of outcomes such as patient-reported outcome measures (PROMs), pharmacokinetic evaluation, and economic outcomes were excluded. We agreed to contact trial authors as needed to clarify any details necessary to make a complete assessment of the study.

The data were initially abstracted into a Microsoft Office Access 2007 database and then tabulated and analysed in an Excel sheet (Microsoft Corporation, Redmond, Washington, USA). Comprehensive pilot testing with two investigators (a clinician (RI) and a glaucoma expert (AAB)) was undertaken. As a result, a taxonomy of glaucoma outcomes was developed. Outcomes were grouped into one of eight main outcomes: intraocular pressure (IOP), visual field (VF), safety, haemodynamics, anatomical, aqueous humour (AH) dynamics, surgical related, and persistence/adherence to therapy.

Outcomes were further classified into different measures (specific measurement, specific metric or methods of aggregation). For example, for IOP, the predefined measures were as follows: IOP level, which was further subclassified into three possible measurements (table 1); IOP change from baseline, which was also subclassified according to the methodology of measurement; composite definition of success/failure using IOP as part of the definition, for which three different measures were identified; and other descriptions of IOP outcomes as in correlations/associations using IOP (table 1).

Table 1

Clinical outcome measures identified in randomised controlled trials between January 2006 and March 2012

The predefined measures for the VF were as follows: evidence of progression of the VF, which was further subdivided into different metrics according to the methodology of measurement; rate of VF progression/slope; and time to VF progression, which was also classified according to the methodology used. A similar classification of measures was used for the other clinical outcomes (see supplementary table S1 for classification). Designated primary and secondary outcomes were also identified. Descriptive statistical analysis was undertaken.
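The two-level classification described above (main outcome, then measure) can be represented as a simple lookup. The following Python sketch is illustrative only: the names `OUTCOME_GROUPS`, `MEASURES` and `tally`, and the record layout, are assumptions made for this example and are not taken from the review's database. Only the eight group names and the IOP and VF measure labels come from the Methods.

```python
from collections import Counter

# The eight main outcome groups named in the Methods.
OUTCOME_GROUPS = (
    "intraocular pressure",
    "visual field",
    "safety",
    "haemodynamics",
    "anatomical",
    "aqueous humour dynamics",
    "surgical related",
    "persistence/adherence to therapy",
)

# Predefined measures for the two groups whose measures are listed in the text.
# Groups without an entry here are not checked at the measure level.
MEASURES = {
    "intraocular pressure": (
        "IOP level",
        "IOP change from baseline",
        "composite success/failure using IOP",
        "other (correlations/associations using IOP)",
    ),
    "visual field": (
        "evidence of VF progression",
        "rate of VF progression/slope",
        "time to VF progression",
    ),
}


def tally(records):
    """Count reported measures per outcome group.

    `records` is an iterable of (trial_id, outcome_group, measure) tuples.
    This record layout is hypothetical.
    """
    counts = Counter({group: 0 for group in OUTCOME_GROUPS})
    for _trial_id, group, measure in records:
        if group not in OUTCOME_GROUPS:
            raise ValueError(f"unknown outcome group: {group!r}")
        allowed = MEASURES.get(group)
        if allowed is not None and measure not in allowed:
            raise ValueError(f"unknown measure for {group}: {measure!r}")
        counts[group] += 1
    return counts


# Made-up records, for illustration only (not data from the review).
example_records = [
    ("trial-A", "intraocular pressure", "IOP level"),
    ("trial-A", "safety", "visual acuity"),
    ("trial-B", "intraocular pressure", "IOP change from baseline"),
]
example_counts = tally(example_records)
```

Rejecting unknown groups, rather than silently adding them, reflects the statement that the possible variations were defined before data analysis.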

Results

The search identified 4323 potentially relevant abstracts. Of these, 238 abstracts were graded as ‘included’ and 56 as ‘possible’. From these abstracts 215 articles were included whereas 79 articles were excluded because either the study was not an RCT or it was an RCT but not for a glaucoma intervention. An additional 24 potential RCTs were identified from the references; 3 of these were excluded for not being an RCT. These RCTs covered the period between 1988 and 2005 for recruiting participants; for example, the Advanced Glaucoma Intervention Study (AGIS) recruiting period ranged from 1988 to 1992.8 Two hundred and thirty-three full articles were finally included for data synthesis (figure 1, modified Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) flow chart).9 Regarding abstract selection, there was one disagreement between the two reviewers (0.25%): one abstract that had been excluded by the first reviewer (RI) was graded as ‘possible’ by the second reviewer (AAB). Following full text review, the two reviewers disagreed on two articles, which the first reviewer (RI) had included and the senior reviewer (AAB) had excluded. This disagreement was resolved by discussion and the decision was to exclude them for not being RCTs.
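The abstract-stage counts and the agreement figure can be checked with simple bookkeeping. The Python sketch below uses only figures stated in the Methods and in the paragraph above; the variable names are assumptions introduced for illustration.

```python
# Abstract grading, as reported in the Results.
graded_included = 238
graded_possible = 56

# Fate of the corresponding articles, as reported in the Results.
articles_included = 215
articles_excluded = 79

# Abstracts carried forward from screening.
abstracts_retained = graded_included + graded_possible

# Every retained abstract should be accounted for as included or excluded.
counts_balance = abstracts_retained == articles_included + articles_excluded

# Duplicate grading: 400 consecutive abstracts, one disagreement.
double_graded = 400
disagreements = 1
disagreement_rate_pct = 100 * disagreements / double_graded
```

Here `counts_balance` evaluates to true and `disagreement_rate_pct` to 0.25, matching the 0.25% reported for abstract selection.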

Figure 1

Modified Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) flow diagram. RCT, randomised controlled trial.

A total of 967 clinical measures were reported. There were large differences in the definitions used to describe different outcomes and their measures. IOP was the most commonly reported outcome (422 times, 44% of all measures) (table 1). IOP was reported as an outcome in 201 of the 233 RCTs (86%). Among the IOP-related measures (table 1), the most commonly used was mean IOP (n=143, 15% of all measures), while composite definition of success/failure using IOP as part of the definition (eg, achieving a percentage of IOP reduction or target IOP with or without medication) was the second most commonly used (n=135, 14% of all measures) (table 1). AH dynamics was reported in only 5 of the 233 RCTs (2%). By contrast, safety outcomes were reported in 145 RCTs (62%). Safety measures were described a total of 346 times (36% of all measures) (table 1). Visual acuity was the commonest among the safety measures (n=75, 8% of all measures) (table 1). VF was used in 38 RCTs (16%; reported 50 times, 5% of all measures) (table 1), with VF progression measured with global indices/score the most common (n=30, 3% of all measures) (table 1). Ocular and systemic haemodynamics (reported in 32 RCTs; n=91, 9% of all measures), anatomical outcomes (reported in 25 RCTs; n=34, 4% of all measures) and surgical outcomes (reported in 19 RCTs; n=19, 2% of all measures) were also measured and reported. Optic nerve head morphology was the commonest among the anatomical-related measures (n=19, 2% of all measures) (table 1).

Systemic haemodynamics (blood pressure and pulse rate) was the most commonly reported measure among the haemodynamics (n=44, 5% of all measures) (table 1).
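Two denominators are used in the percentages above: the 967 reported measures and the 233 included RCTs. The Python sketch below recomputes both kinds of share from the counts given in the text. The dictionary and function names are assumptions for this example, and rounding to the nearest whole per cent is assumed to match the reported figures.

```python
TOTAL_MEASURES = 967  # all clinical measures reported
TOTAL_RCTS = 233      # all included trials

# Number of times each outcome's measures were reported (from the text).
MEASURES_BY_OUTCOME = {
    "IOP": 422,
    "safety": 346,
    "haemodynamics": 91,
    "VF": 50,
    "anatomical": 34,
    "surgical": 19,
}

# Number of RCTs reporting each outcome (from the text).
RCTS_BY_OUTCOME = {
    "IOP": 201,
    "safety": 145,
    "VF": 38,
    "AH dynamics": 5,
}


def share(count, total):
    """Return count/total as a percentage rounded to the nearest integer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


# Share of all reported measures, per outcome.
measure_share = {
    name: share(n, TOTAL_MEASURES) for name, n in MEASURES_BY_OUTCOME.items()
}

# Share of all included RCTs, per outcome.
rct_share = {name: share(n, TOTAL_RCTS) for name, n in RCTS_BY_OUTCOME.items()}
```

Keeping the two denominators in separate tables avoids mixing "per cent of measures" with "per cent of trials", which are easy to confuse when a single trial reports several measures of the same outcome.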

Primary outcomes were defined in 143 of 233 RCTs (61%). Thirty-one of the 143 studies (21.7%) specified more than one primary outcome; the total number of primary measures identified was 186, with IOP-related measures being the commonest (n=135, 73% of primary measures) (table 1). Mean IOP/mean diurnal IOP/24 h IOP was the most common primary measure (n=51, 27% of all primary measures) (table 1).

Discussion

To the best of our knowledge, this is the first study to describe the selection of clinical outcomes in glaucoma RCTs. The study demonstrated a large variation in the way clinical outcomes were reported and a total lack of standardisation which may impair the ability to robustly evaluate the evidence of glaucoma interventions.

This variation was greatest for IOP and safety measures, the most commonly used clinical outcomes. Even though VF tests were uncommonly used as an outcome, a diversity of measurements was also apparent (with 12 different ways of evaluating VFs), whereas persistence/adherence to therapy was not reported as an outcome in any of the included trials.

While the review was conducted in a rigorous, systematic manner, it is possible that some studies may have been missed by not searching the ‘grey’ literature. The review had good external validity: the studies we identified included different glaucoma populations and different interventions. We excluded observational studies, but we were confident that these reports would be unlikely to add to the breadth of outcomes observed in this review. Although the search for RCTs covered the period between January 2006 and March 2012, original reports of the included RCTs had covered the period between 1988 and 2005 as regards study design and recruiting participants.8,10–13 However, we acknowledge the potential limitations of time frame and language restrictions. We also acknowledge that different clinical trials may have different objectives that lead to variation in primary outcomes. Phase I and II trials might be most concerned with effects on IOP and later phase trials might be most concerned with effects on VF. However, our aim was to determine the breadth of used and reported clinical outcomes as a first step in the development of core clinical outcome sets for glaucoma interventions. A Delphi survey is ongoing to reach consensus on clinical outcomes and identify further clinical measures for implementation in glaucoma trials.

The selection of appropriate outcomes is crucial when designing clinical trials to directly compare the effects of different interventions in a way that maximises evidence synthesis. There is a growing recognition that insufficient attention has been paid to the selection of outcomes when conducting a clinical trial.1 Clinicians often assess glaucoma treatment in terms of the effect on IOP. However, being a surrogate outcome, it may be inappropriate to use IOP as the sole consideration for the evaluation of a glaucoma intervention14; moreover, there are different ways of evaluating IOP changes. In the published literature over a 5-year period (between 2001 and 2005), Rotchford and King15 identified variation in reporting of IOP success rates in glaucoma surgical trials. The authors concluded that there were nearly as many different IOP-related definitions of success after glaucoma surgery as there were articles on the subject, and more than 70% of definitions appeared in only a single article. The authors recommended that standardisation of published outcome parameters is essential to allow meaningful comparisons between different study reports.15

The outcomes reported by studies are key for clinicians when making decisions about healthcare; however, there is a general lack of consensus regarding the selection of outcomes in clinical settings, which affects trial design, analysis and reporting.16 Measuring outcomes that will not change healthcare decisions leads to a waste of resources and a failure to capitalise on the potential power of research to improve healthcare.4

The difficulties caused by differences in outcome measurement are well known to systematic reviewers. For instance, the five most accessed and top cited Cochrane Reviews in 2009 all reported problems related to outcomes in eligible trials due to inconsistencies in the outcomes reported in the primary reports.4 There is still great uncertainty on how best to select the most appropriate outcomes for use in a clinical trial.17 Some of the factors underlying this uncertainty may include ambiguity regarding which outcomes are of relevance to patients, unknown performance characteristics of potential outcomes (eg, reliability) or improvements in general care of patients with emergence of new technologies, with the result that previously used outcomes are no longer relevant.18

The Core Outcome Measures in Effectiveness Trials (COMET) Initiative19 was launched in January 2010 to address this lack of standardised outcomes in clinical trials and to develop a minimum set of measures named ‘core outcome sets’, which include endpoints to be reported as a minimum. There is an expectation that the core outcomes will be reported to allow the results of studies to be combined as appropriate; and that researchers will continue to collect other outcomes as well.6 If the outcomes in the core outcome sets are reported then reviewers or people looking at studies in isolation will always have access to the same set of key data.6

To solve the issue of outcome heterogeneity, we aim to reach consensus on important clinical outcomes and how to measure the outcomes in glaucoma intervention trials by taking into account the views of clinicians and trialists through a Delphi survey of expert opinion and by implementing the COSMIN (Consensus-based Standards for the Selection of Health Measurement Instruments) approach20 as a final step.

In summary, the selection of outcomes and their measures was inconsistent among glaucoma RCTs. In addition, there was a wide variety of description within individual clinical outcomes. There is an obvious need to define a core clinical outcome dataset for glaucoma trials.

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors All authors declare that they have participated sufficiently in the conception and design of this work or the analysis and interpretation of the data, and the writing of the manuscript to take public responsibility for it. Neither this manuscript nor one with substantially similar content under our authorship has been published or is being considered for publication elsewhere.

  • Funding RI is funded by the James Mearns Trust for PhD Studentship. The Health Services Research Unit is funded by the Chief Scientist Office of the Scottish Government Health Directorate.

  • Competing interests RI had financial support from the James Mearns Trust for PhD studentship for the submitted work.

  • Provenance and peer review Not commissioned; externally peer reviewed.
