Surrogate endpoints are often used as replacements for true clinically relevant endpoints in several areas of medicine, as they enable faster and less expensive clinical trials. However, without proper validation, the use of surrogates may lead to incorrect conclusions about the efficacy and safety of treatments. This article reviews the general requirements for validating surrogate endpoints and provides a critical assessment of the use of intraocular pressure (IOP), visual fields, and structural measurements of the optic nerve as surrogate endpoints in glaucoma clinical trials. A valid surrogate endpoint must be able to predict the clinically relevant endpoint and fully capture the effect of an intervention on that endpoint. Despite its widespread use in clinical trials, no proper validation of IOP as a surrogate endpoint has ever been conducted for any class of IOP-lowering treatments. Evidence has accumulated with regard to the role of imaging measurements of optic nerve damage as surrogate endpoints in glaucoma. These measurements are predictive of functional losses in the disease and may explain, at least in part, treatment effects on clinically relevant endpoints. The use of composite endpoints in glaucoma trials may overcome the weaknesses of using structural or functional endpoints in isolation. Unless research is dedicated to fully developing and validating suitable endpoints that can be used in glaucoma clinical trials, we run the risk of inappropriate judgments about the value of new therapies.
- Clinical Trial
- Intraocular Pressure
Definitions of biomarkers and surrogate endpoints
Clinical trials are the standard scientific method for assessing the benefits and risks of new therapeutic interventions. For phase III clinical trials, the primary endpoint should be a clinical event that is relevant to the patient, that is, an event which the patient is aware of and wants to avoid. These endpoints are usually referred to as ‘hard’ or ‘true’ endpoints. For example, for an anticancer drug, the true endpoint would be survival, whereas for antihypertensive or cholesterol-lowering drugs, it would be reduction in the incidence of myocardial infarction, stroke or death.
When the true endpoints are infrequent, or only observed after long periods of follow-up, clinical trials may become impractical and very expensive. In this situation, an attractive solution is to replace the true endpoint by a biomarker that can be measured earlier, more conveniently or more frequently.1 Biomarkers are measurements that indicate biological processes, including physiological measurements, blood tests, genetic or metabolic data, or measurements from images.2 Examples of biomarkers include cholesterol level, blood pressure, and measurements of tumour size from MRI. However, although many biomarkers are associated with a disease and have a wide array of uses, only a few qualify as potential surrogate endpoints. To qualify as a surrogate endpoint, a biomarker needs to demonstrate a significant ability to predict the clinically relevant outcome as well as the effect of treatment on this outcome.3–5
The use of validated surrogates in clinical trials may offer several advantages. They enable shorter and less expensive trials, because the effect of an intervention on the surrogate can generally be observed sooner and at lower cost than its effect on the ‘hard’ clinical endpoint. For example, studying short-term changes in blood pressure is far easier than following thousands of subjects for several years to assess mortality rates from cardiovascular disease. From a practical standpoint, shortening the duration of a clinical trial also limits problems with non-compliance and missing data, which are more likely in longer studies, thereby increasing the efficiency and reliability of research. The use of surrogates may also allow a greater number of endpoints to be observed during follow-up than would be achieved with ‘hard’ endpoints, further reducing sample size requirements.
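The sample-size argument can be made concrete with a rough calculation. The sketch below assumes a two-arm, 1:1 trial analysed with a log-rank test and uses Schoenfeld's approximation for the number of events required. It also assumes, hypothetically, that the treatment has the same hazard ratio on the surrogate event as on the hard endpoint; the hazard ratio and event probabilities are illustrative and not taken from any study.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio: float, alpha: float = 0.05, power: float = 0.80,
                    allocation: float = 0.5) -> float:
    """Schoenfeld's approximation: events needed for a two-sided log-rank test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (
        allocation * (1 - allocation) * math.log(hazard_ratio) ** 2
    )


def required_subjects(hazard_ratio: float, event_probability: float, **kwargs) -> int:
    """Enrolment needed if each subject has `event_probability` of an event during follow-up."""
    return math.ceil(required_events(hazard_ratio, **kwargs) / event_probability)


# Hypothetical inputs: hazard ratio 0.7 in both cases; the surrogate event is
# observed in 40% of subjects during follow-up versus 10% for the hard endpoint.
n_hard = required_subjects(0.7, event_probability=0.10)
n_surrogate = required_subjects(0.7, event_probability=0.40)
print(f"events needed: {required_events(0.7):.0f}")
print(f"subjects needed, hard endpoint: {n_hard}; surrogate endpoint: {n_surrogate}")
```

Because the required number of events depends only on the effect size and error rates, enrolment scales inversely with the probability of observing an event. The saving is only real if the hazard ratio on the surrogate faithfully reflects the effect on the hard endpoint, which is exactly what validation has to establish.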
Despite their attractiveness, the use of surrogate endpoints has the potential to cause harm.6–9 Unless fully validated, surrogates may waste resources, provide ambiguous evidence, and not measure what one really wants to study.10 The main potential disadvantage of surrogates is that favourable effects on surrogates do not automatically translate into benefits to health.11
Validation of surrogate endpoints
It is a common misconception that a biomarker correlated with the true clinically relevant outcome can automatically be used as a surrogate endpoint. However, correlation is a necessary, but not sufficient, condition for surrogacy.6 In a landmark study, Prentice4 formulated a set of operational criteria for validating a surrogate endpoint. These criteria can be succinctly summarised in two parts: correlation and capture. Under correlation, the surrogate endpoint must be statistically correlated with the clinical endpoint. Under capture, an intervention's ‘net effect’ on the clinical endpoint should be fully captured by the intervention's effect on the surrogate endpoint. The net effect is the aggregate effect accounting for all mechanisms of action of the intervention. Although the first criterion is usually easy to verify, the second is not. In fact, failure to check the second condition properly in early attempts to use surrogates led to erroneous and even harmful conclusions for some disease conditions.6–9
An ideal surrogate endpoint is one through which all of the intervention's effects on the true endpoint are mediated, as shown in figure 1.6 Specifically, the surrogate lies on the only causal pathway in the disease process, and the intervention's entire effect on the true endpoint is mediated through its effect on the surrogate. Such ideal surrogate endpoints, however, are not known at present. Even widely accepted surrogates, such as blood pressure, HIV viral load or CD4 counts, do not explain the full effect of treatments on the true endpoints. In practice, successful surrogates have been shown to explain only part of the treatment effect, and several statistical methodologies have been developed to quantify the proportion explained.12
How could a treatment significantly affect a biomarker that is correlated with a clinically relevant endpoint, but at the same time not provide a meaningful effect on that endpoint? Figure 2A illustrates such a situation, where a disease causally influences a biomarker as well as the true clinical endpoint. As a result, the biomarker is correlated with the clinical endpoint. However, if this biomarker does not lie in the biological pathway by which the disease process actually influences the occurrence of the clinical endpoint, then affecting the biomarker might not affect the clinical endpoint.
Figures 2B, C show other examples of invalid surrogacy. In these cases, there are different pathways through which the disease process influences the risk of the true endpoints. If the proposed surrogate endpoint lies in only one of these pathways, and if the intervention does not actually affect all pathways, then the effect of treatment on the true endpoints could be overestimated (figure 2B) or underestimated (figure 2C) by the effect on the candidate surrogate. Finally, the intervention might actually affect the true endpoint by unintended mechanisms of action that are independent of the disease process (figure 2D).
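The over- and underestimation patterns can be expressed with a deliberately simple, hypothetical linear path model. It assumes the disease acts on the true endpoint through the surrogate pathway (coefficient `a`) and a second pathway (coefficient `b`), and that in untreated patients both pathways track disease severity one-for-one, so the natural-history slope of the true endpoint on the surrogate is a + b. The "implied" effect extrapolates that slope from the treatment's effect on the surrogate; the "net" effect adds up what the treatment actually does. The scenario labels only loosely mirror figures 2B, 2C and 2D, and all numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PathModel:
    """Hypothetical linear causal model; negative changes in the true endpoint mean benefit."""
    a: float = 1.0  # effect of the surrogate pathway on the true endpoint
    b: float = 1.0  # effect of the second disease pathway on the true endpoint

    def implied_effect(self, d_surrogate: float) -> float:
        """Effect predicted by extrapolating the natural-history association."""
        return (self.a + self.b) * d_surrogate

    def net_effect(self, d_surrogate: float, d_other: float, d_direct: float) -> float:
        """Actual effect: both disease pathways plus any off-target mechanism."""
        return self.a * d_surrogate + self.b * d_other + d_direct


model = PathModel()
scenarios = {
    "overestimation, acts only on the surrogate pathway (cf. 2B)": (-1.0, 0.0, 0.0),
    "underestimation, acts mainly on the other pathway (cf. 2C)": (-0.2, -1.0, 0.0),
    "overestimation, harmful off-target mechanism (cf. 2D)": (-1.0, -1.0, 1.5),
}
for label, (d_s, d_p, d_d) in scenarios.items():
    print(f"{label}: implied {model.implied_effect(d_s):+.1f}, "
          f"net {model.net_effect(d_s, d_p, d_d):+.1f}")
```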
Validation of a surrogate should be based on sound research into biological plausibility as well as on in-depth clinical insight and empirical evidence. Ideally, one should have a comprehensive understanding of the causal pathways of the disease process and of the intervention's mechanisms of action. However, achieving such understanding may be challenging. The proper development of surrogate endpoints requires conducting a trial with a given treatment and analysing both the true and surrogate endpoints. Ironically, to fully validate a surrogate, investigators may actually have to end up performing the very trial that they wanted to avoid. Once a surrogate is validated for a specific treatment or a given class of agents, it is tempting to assume that it can also be used as a replacement endpoint when evaluating other classes of agents. However, it is uncertain whether the surrogacy relationship demonstrated for previous treatments applies to new ones.13
Clinically relevant endpoints in glaucoma
Glaucoma is the main cause of irreversible blindness in the world. It is a disease associated with progressive retinal ganglion cell (RGC) loss leading to characteristic changes in the appearance of the optic nerve head and retinal nerve fibre layer (RNFL).14 The damage to RGCs can lead to functional deficits that, if severe enough, may result in loss of vision and decreased vision-related quality of life. The fundamental goal of glaucoma treatment is to prevent patients from developing visual impairment that is sufficient to produce disability in their daily lives and impair their health-related quality of life.15 Therefore, in the context of glaucoma, the true endpoints would be significant loss of vision with decrease in quality of vision or quality of life, or development of functional disability.
As glaucoma is generally a slowly progressive disease, direct observation of disability endpoints is difficult. A randomised clinical trial using these endpoints would be lengthy and difficult to perform. Visual field changes as measured by standard automated perimetry (SAP) have been accepted as representing clinically relevant endpoints. However, it should be noted that from a patient perspective what really matters is a change that affects his or her life. Unfortunately, no longitudinal studies have been conducted evaluating the relationship between longitudinal change in quality of life and progressive visual field loss in glaucoma. In cross-sectional studies, the relationship between SAP results and measures of quality of life or disability in the disease has proven weak at best.16 This may reflect an inability of SAP to capture how functional losses impair quality of life or daily activities, but may also be related to weaknesses of currently available methods for assessing functional impairment. Additionally, the large variability in patients’ perceptions of quality of life may weaken the associations seen in cross-sectional data. Therefore, there is a compelling need to better characterise measures of functional impairment and quality of life in glaucoma over time, and to understand how they relate to conventional clinical tests.
Surrogates in glaucoma: the case of intraocular pressure
Intraocular pressure (IOP) has traditionally been used as a surrogate endpoint in clinical trials. Use of IOP as a surrogate endpoint is based on epidemiologic evidence relating IOP to the risk of development and progression of glaucoma. However, even though IOP is the most important known risk factor for glaucoma, it is clearly an imperfect correlate for the clinically relevant outcomes of the disease. Many glaucoma patients can progress to visual loss despite low IOP levels.17 On the other hand, many subjects with high IOP never develop any significant functional changes indicative of glaucoma.18
It is striking that, even though IOP has been used as a surrogate endpoint and as the basis for regulatory approval of new treatments, no proper validation of this surrogate has ever been conducted for any class of IOP-lowering medications. To properly validate IOP as a surrogate endpoint, studies would need to demonstrate that the effect of the drug on IOP is a reliable indicator of its effect on a clinically relevant endpoint such as visual field loss.
Could a treatment provide significant IOP-lowering effect but at the same time not provide a meaningful effect in preventing visual loss from glaucoma? As indicated in figure 2D, this situation is possible. A drug could successfully lower IOP, but at the same time have unintended detrimental effects on the clinically relevant outcome. These detrimental effects could offset the benefits caused by IOP lowering resulting in no net benefit or even patient harm.
Timolol is a topical β-adrenergic antagonist that has been used for several decades for IOP lowering. Until recently, timolol was the most commonly prescribed drug for IOP lowering, and it is still considered the ‘gold standard’ against which new proposed treatments need to be compared. However, as surprising as it may seem, there are no randomised clinical trials in the literature demonstrating that timolol significantly reduces vision loss in glaucoma compared with placebo.19 A recent randomised study, the Low-pressure Glaucoma Treatment Study (LoGTS), compared timolol maleate 0.5% versus brimonidine tartrate 0.2% in preserving visual function in patients with low-pressure glaucoma.20 The results showed that, despite similar mean treated IOP in both groups, patients using timolol had a much higher incidence of visual field progression (39.2%) than those using brimonidine 0.2% (9.1%). The higher incidence of visual field progression in the timolol group, despite nearly identical IOP-lowering effects, could indicate a relatively harmful effect of timolol on visual field progression, a relatively beneficial effect of brimonidine, or both. Timolol is known to decrease heart rate and arterial blood pressure, which could reduce ocular perfusion pressure, a potential risk factor for glaucoma progression.21–23 Reduction of ocular perfusion pressure could then be an unintended effect of timolol that would reduce its benefit in preventing disease progression. In this situation, timolol would behave like the treatment depicted in figure 2D, and mean IOP would not be a proper surrogate endpoint, as it would overestimate the effect of the drug on the clinically significant endpoint. As an alternative explanation for the LoGTS results, the investigators suggested that brimonidine could be offering a neuroprotective effect, reducing the incidence of visual field progression more than would be expected from its effect on IOP.
In this case, brimonidine would be acting partly like the treatment depicted in figure 2C. In this situation, IOP would also not be a proper surrogate endpoint, as it would underestimate the beneficial effect of the treatment on the true endpoint.
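The size of the LoGTS discrepancy can be expressed with simple arithmetic on the reported percentages. This is a crude comparison that treats the two incidences as plain proportions and ignores differences in follow-up and censoring; it is not the study's own analysis. If mean treated IOP fully captured the treatment effect, nearly identical IOP in both arms would imply a risk ratio close to 1.

```python
def risk_comparison(risk_a: float, risk_b: float) -> tuple[float, float]:
    """Return (risk difference, risk ratio) of group A relative to group B."""
    return risk_a - risk_b, risk_a / risk_b


# Incidence of visual field progression reported for LoGTS, as proportions.
timolol, brimonidine = 0.392, 0.091
difference, ratio = risk_comparison(timolol, brimonidine)
print(f"risk difference: {difference:.3f}; risk ratio: {ratio:.1f}")
# prints: risk difference: 0.301; risk ratio: 4.3
```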
Even though randomised studies such as the Ocular Hypertension Treatment Study (OHTS)17 and the Early Manifest Glaucoma Trial (EMGT)24 have provided important evidence for the role of mean IOP as a risk factor for glaucoma development and progression, their reported results cannot be used to fully support mean IOP as a surrogate endpoint in clinical trials. OHTS and EMGT used multiple different classes of medications in the treatment arm. As pointed out above, surrogacy needs to be evaluated in the context of a particular class of treatment regimens. When multiple drugs are used in the treatment arm, possible unintended effects of different classes of medications may confound the assessment of IOP surrogacy.
The UK Glaucoma Treatment Study (UKGTS)25 was a randomised placebo-controlled trial designed to investigate whether latanoprost reduces visual field deterioration in glaucoma. The study was recently concluded after a 2-year observation period, and its results should soon be reported in the literature. As IOP and visual field endpoints were both assessed as part of the randomised trial, the UKGTS will have the opportunity to evaluate whether IOP served as a proper surrogate endpoint for assessing the efficacy of prostaglandin analogues. It should be noted, however, that a single randomised study might not provide sufficient evidence for validating a surrogate. A meta-analytic approach combining data from various trials may be the most promising route to surrogate validation, because it avoids the need for strong assumptions regarding confounding.26,27
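A common meta-analytic summary of trial-level surrogacy is the coefficient of determination between the treatment effects on the surrogate and on the true endpoint across trials. Full methods typically use bivariate random-effects models; the sketch below is a simplified, hypothetical version using a sample-size-weighted straight-line fit on synthetic trial-level data. The generating relationship (log hazard ratio for visual field progression falling by 0.1 per mmHg of IOP reduction) and the noise levels are invented for illustration; the larger noise term stands in for effects not mediated by IOP, such as class-specific off-target actions.

```python
import random


def weighted_r2(x, y, w):
    """Weighted R^2 of a straight-line fit of y on x (weights w)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy ** 2 / (sxx * syy)


def synthetic_trials(n_trials: int, extra_sd: float, seed: int = 2):
    """Synthetic trial-level effects (all values hypothetical).

    Returns per-trial mean IOP reduction (mmHg), log hazard ratio for visual
    field progression, and sample size. `extra_sd` is trial-level variation in
    the log hazard ratio that IOP does not explain.
    """
    rng = random.Random(seed)
    iop_effect, log_hr, size = [], [], []
    for _ in range(n_trials):
        d_iop = rng.uniform(1.0, 6.0)
        iop_effect.append(d_iop)
        log_hr.append(-0.1 * d_iop + rng.gauss(0.0, extra_sd))
        size.append(rng.randint(100, 1000))  # used as the regression weight
    return iop_effect, log_hr, size


for extra_sd in (0.03, 0.3):
    r2 = weighted_r2(*synthetic_trials(30, extra_sd))
    print(f"extra trial-level noise {extra_sd}: R^2_trial = {r2:.2f}")
```

A high trial-level R² suggests that the effect on the surrogate predicts the effect on the true endpoint across trials; a low value signals that other mechanisms matter. Pooling trials of a single drug class keeps the assessment class-specific, as the text recommends.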
Surrogates in glaucoma: the case of imaging
IOP is clearly an inappropriate surrogate endpoint for clinical trials evaluating potentially neuroprotective agents. This illustrates why surrogacy should be evaluated in the context of specific classes of treatments. Given the inadequacy of IOP as a surrogate in this setting, clinical trials evaluating neuroprotective drugs need to use other endpoints. The use of visual fields as the sole endpoint in glaucoma trials is potentially limited by the need for large samples, long-term follow-up, and the variability of results.28 In the past two decades, evidence has accumulated with regard to the role of structural measurements of optic disc topography and the RNFL for diagnosing glaucoma and detecting its progression. The use of structural measurements as surrogate endpoints in glaucoma clinical trials could potentially have a number of advantages, including faster acquisition of a sufficient number of endpoints with a potential reduction in sample size requirements, enabling shorter, more efficient, and less expensive trials.
An analysis of the potential use of structural measurements of the optic nerve and RNFL as surrogate endpoints should be made in terms of biological plausibility, prognostic value, and whether treatment effects on the surrogate correspond to effects on the clinically relevant outcomes.
The biological plausibility is clear. The hallmark of glaucoma is progressive RGC loss, which results in loss of the RNFL and change in optic disc topography. This is supported by strong clinical, epidemiologic and experimental data.14 In fact, the evidence linking structural damage of the optic nerve to visual field loss in the disease is actually stronger than that for IOP.29
There is also evidence for the prognostic value of structural optic disc assessment.30,31 These structural changes can be objectively quantified in a reproducible way with imaging technologies, such as optical coherence tomography (OCT), scanning laser polarimetry and confocal scanning laser ophthalmoscopy (CSLO).32–34 In several studies, changes in neuroretinal rim area, as measured by CSLO, were shown to predict future visual field losses.35–40 More recently, longitudinal studies using spectral domain OCT have shown that changes in RNFL thickness were also highly predictive of future functional losses in glaucoma suspects.41,42 However, to show that structural measurements can be used as reliable surrogate endpoints, one must also demonstrate that the effect of treatment on changes in structure is a reliable predictor of the effect of treatment on changes in function. A recent study attempted to verify whether CSLO neuroretinal rim area measurements could satisfy Prentice's criteria for surrogacy.36 The study demonstrated that, even though the effect of treatment on rim area did not fully explain the effect of treatment in preventing visual field loss, it explained a considerable part of it. Using a measure called the proportion of treatment effect (PTE), the authors showed that rim area measurements were able to explain 65% of the effect of treatment on the risk of development of visual field loss. Although this effect can be considered only moderate, it should be noted that a PTE of 100% has not been demonstrated for any surrogate endpoint in medicine. Further, it is possible that stronger effects could be demonstrated for other imaging measurements potentially more sensitive than rim area, such as spectral domain OCT-measured RNFL thickness.
A caveat needs to be mentioned with regard to the above studies on the predictive value of imaging measurements. These studies have only linked changes in structural measures to changes in automated perimetry. They have not directly shown a prognostic relationship between structural measurements and endpoints directly representing functional impairment or disability. However, this limitation applies to any other potential surrogate used in glaucoma trials, such as IOP or SAP measurements. Additionally, as structural changes are predictive of visual field losses, it is expected that, if persistent, these losses would eventually result in a significant decline in measures of quality of vision, patient performance, and quality of life.
The use of structural measurements as sole endpoints in clinical trials is limited by the known relationship between disease severity and the ability of these measurements to detect change.43 Imaging assessment of the RNFL and optic disc topography seems to perform relatively poorly for detecting progression in advanced stages of the disease.43 Therefore, changes in visual function may be seen in the absence of detectable structural losses. However, this limitation could be addressed by the use of composite endpoints1 that include structural measurements as well as functional endpoints.44–46 Another potential solution is the use of combined metrics of structure and function. Recent studies have shown that a combined metric estimating ganglion cell loss in glaucoma from structural and functional tests performed significantly better than isolated structural or functional tests for diagnosis, staging and detecting disease progression.47,48
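A composite endpoint of this kind is typically defined as the time to the first of structural or functional progression. The sketch below assumes that each eye's structural and functional progression times have already been determined by some criterion not specified here; the record type `EyeFollowUp` and the example values are hypothetical. It shows how an advanced eye whose imaging has reached its floor still contributes an event through function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EyeFollowUp:
    """Hypothetical per-eye record: years until progression was first confirmed by
    each modality, or None if not observed within follow-up."""
    structural: Optional[float]
    functional: Optional[float]
    follow_up: float


def composite_endpoint(eye: EyeFollowUp) -> tuple[float, bool]:
    """Time to the first of structural or functional progression.

    Returns (time, event); eyes with no event are censored at end of follow-up.
    """
    times = [x for x in (eye.structural, eye.functional) if x is not None]
    if times:
        return min(times), True
    return eye.follow_up, False


# Illustrative eyes: early disease (structure changes first), advanced disease
# (imaging at its floor, so only function changes), and a stable eye.
eyes = [
    EyeFollowUp(structural=1.5, functional=3.0, follow_up=4.0),
    EyeFollowUp(structural=None, functional=2.0, follow_up=4.0),
    EyeFollowUp(structural=None, functional=None, follow_up=4.0),
]
for eye in eyes:
    print(composite_endpoint(eye))
```

The resulting (time, event) pairs can then be analysed with standard survival methods, such as a log-rank test or a Cox model.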
It is important to emphasise that validation of surrogacy of structural measurements has not yet been made in the context of neuroprotective therapies. Extrapolation of surrogacy from studies evaluating IOP-lowering therapy could be inappropriate. It is possible that a candidate neuroprotective drug could have a beneficial effect on the structural surrogate while not showing net beneficial effects in the functional clinically relevant outcome. For example, a drug could preserve tissue anatomy without really preserving function. If only structural measurements are used as surrogate endpoints in this situation, they would tend to overestimate the benefit of the treatment. This highlights the importance of a comprehensive understanding of the mechanisms of action of the proposed therapy.
To fully engage our patients in treatment decisions and to practise truly patient-centred medicine, we must understand how therapies affect outcomes that are important to them. Surrogate endpoints may be viable alternatives when obtaining the true endpoints would result in unfeasible studies. However, these surrogates need to be properly validated before widespread use in practice. Unless research is dedicated to fully developing and validating suitable endpoints that can be used in glaucoma clinical trials, we run the risk of inappropriate judgments about the value of new therapies, or worse, we run the risk of harming those whom we want to protect.
Funding National Eye Institute grant EY021818.
Competing interests FAM: research support from Allergan, Alcon, Merck, Carl-Zeiss Meditec, Heidelberg Engineering, Topcon, Reichert Inc.
Provenance and peer review Not commissioned; externally peer reviewed.