Photodynamic therapy with verteporfin is effective, but how big is its effect? Results of a systematic review
C Meads, C Hyde
West Midlands Health Technology Assessment Group, Department of Public Health and Epidemiology, University of Birmingham, UK
Correspondence to: Dr C Meads, West Midlands Health Technology Assessment Group, Department of Public Health and Epidemiology, University of Birmingham, Edgbaston, Birmingham, UK; c.a.meads@bham.ac.uk

Abstract

Background: In 2001 the National Institute for Clinical Excellence (NICE) was asked to issue guidance for England and Wales on the use of photodynamic therapy (PDT). This process has been protracted, partly because of a dispute over the magnitude of beneficial effect. This article examines the origins of the debate about the true treatment effect size for PDT with verteporfin.

Methods: A systematic review of the clinical effectiveness of PDT compared with current practice was undertaken. Searches in Medline, Embase, the Cochrane Library, and the Internet, updated to January 2003, revealed two fully published and four ongoing randomised controlled trials.

Results: The results of the two published trials (TAP and VIP) consistently showed that overall, PDT with verteporfin is more effective than placebo in slowing the rate of vision loss. In the TAP trial, 12 or more subgroup analyses were undertaken on the primary outcome measure and in VIP, 10 subgroup analyses but only on a subset of the trial participants. Subgroup analysis results were found to be inconsistent between the two trials, with VIP suggesting that verteporfin was equally effective in occult as in mixed lesions and TAP suggesting that verteporfin was more effective in the predominantly classic subgroup.

Discussion: For several reasons it was considered that the most likely estimate of the predominantly classic subgroup effect size was the whole trial result. This has implications for the relationship between cost and benefit, the subject of intense debate. Results of the ongoing trials should help to clarify this subgroup effect size issue.

  • photodynamic therapy
  • systematic review
  • age related macular degeneration
  • AMD, age related macular degeneration
  • CNV, choroidal neovascularisation
  • NICE, National Institute for Clinical Excellence
  • PDT, photodynamic therapy
  • RCT, randomised controlled trial

Age related macular degeneration (AMD), particularly the wet form, is an important cause of blindness and a serious public health challenge in older people. Until recently little could be done either to prevent or treat the condition, beyond laser photocoagulation for extrafoveal lesions. The development of photodynamic therapy (PDT) with verteporfin to treat subfoveal wet AMD consequently attracted much attention and it was licensed for use in predominantly classic wet AMD in 2000. This licence has recently been extended to occult wet AMD but not to minimally classic wet AMD.

In 2001 the National Institute for Clinical Excellence (NICE) was asked to issue guidance for England and Wales on the use of PDT with verteporfin, for its then main licensed indication (predominantly classic wet AMD). This process has been protracted (and indeed is still ongoing at the time of writing this article). Although there is clear randomised controlled trial (RCT) evidence on the effects of PDT with verteporfin on visual acuity and contrast sensitivity, debate about its cost effectiveness has been intense.1–3 In assessing the balance between costs and benefits, the magnitude of any beneficial effect is at least as important as the presence of benefit alone. Contention about the size of effect has been an important component of the uncertainty about the cost effectiveness of PDT with verteporfin.

In the context of a systematic review on the effectiveness of PDT, undertaken as part of the health technology assessment to inform the decision of the NICE appraisals committee, we reconsider the evidence on the effectiveness of PDT. The purpose is to expose to wider scrutiny the origins of the debate about the true effect size of PDT on visual acuity in those with predominantly classic wet AMD.

METHOD

A systematic review was undertaken with reference to a predefined protocol lodged with the National Co-ordinating Centre of Health Technology Assessment and published online.4 There were no major departures from this protocol. The review question was the effectiveness of PDT in wet AMD relative to current practice. Randomised controlled trials were included where PDT using any photosensitive drug was compared with no specific treatment, placebo, or laser photocoagulation in adults with wet AMD. There was no restriction on outcomes, but information on visual acuity, contrast sensitivity, quality of life, and side effects of treatment was particularly sought. The primary search, conducted up to September 2001, included major databases (Cochrane Library, Medline, and Embase), Internet sites, conference abstracts (ARVO), and checking of the reference lists of included studies. The searches on Cochrane Library, Medline, Embase, and the Internet were updated to January 2003. Data extraction and quality assessment of included studies using the Jadad checklist5 were done in duplicate. Analysis was principally qualitative, although meta-analysis was used to explore some aspects of the data, using Metaview 4.1 in Cochrane Collaboration Review Manager 4.1 software (copyright 1999, Update Software). Fixed effects models were used where there was no statistical heterogeneity, and random effects models where it was present. Statistical heterogeneity was judged to be present when the χ² value exceeded its degrees of freedom in the forest plots.6

RESULTS

Six RCTs were identified: four ongoing and two completed. The additional searches up to January 2003 found three papers with additional material on one of the RCTs already identified7–9 but no new RCTs. All RCTs compared PDT with placebo. Five were industry sponsored10–15 and the single independent ongoing RCT was small.16 In all but one RCT the photosensitive dye was verteporfin, manufactured by Novartis Ophthalmics AG. The other, large, ongoing RCT used tin ethyl etiopurpurin, manufactured by Miravant Medical Technologies.13 None of the studies addressed the treatment of wet AMD lesions outside the subfoveal location. Regarding the expanding evidence base of PDT with verteporfin, two of the ongoing RCTs addressed effectiveness in patients with minimally classic (VIM)14 and occult only (VIO)15 wet AMD. The ongoing RCT using tin ethyl etiopurpurin completed the planned two year follow up but has not been published to date (January 2003). It appears that it “did not meet the primary efficacy endpoint”17 but did have “statistically significant treatment effects in select patient populations”.18

Only the two completed and published RCTs were analysed in the systematic review. The Treatment of Age-related Macular Degeneration with Photodynamic Therapy (TAP) study10,11 included 609 people (402 PDT; 207 placebo) with classic only or classic plus occult wet AMD, whose initial visual acuity in the affected eye was 73 to 34 letters (20/40 to 20/200; 6/12 to 6/60). The Verteporfin in Photodynamic Therapy (VIP) study12 included 339 people (225 PDT; 114 placebo) mainly with occult only wet AMD, whose initial visual acuity was greater than 50 letters (20/100 or 6/30). VIP also included patients with mixed classic and occult if visual acuity was greater than 70 letters, although the numbers in this category were small (59 of 225 and 22 of 114 in PDT and placebo respectively). In both RCTs participants were white, with equal numbers of men and women. The mean age in both was around 75 years. Participants were examined at three monthly intervals and either treated or not. The trials extended for two years, so patients could potentially be treated eight times. The outcomes, also measured at three monthly intervals, were visual acuity, contrast sensitivity in the study eye, and side effects. The primary outcome was loss of 15 or more letters of visual acuity in the eye under investigation. In the text of the TAP trial report, this was reported as loss of 15 or more letters in some tables and as loss of less than 15 letters in others, with their relevant summary statistics and p values. Subgroups for analysis were apparently pre-specified, 14 being listed in the protocol for TAP. Both RCTs were well conducted as judged by maximum Jadad scores of 5. In particular, allocation was truly random and appeared to be concealed. Double blinding is claimed, but must have been difficult considering that verteporfin is green and the 5% dextrose solution used as placebo was uncoloured.

The key results for the included studies at two years are summarised in table 1.

Table 1

Results of included studies

This shows that at two years PDT with verteporfin is effective in slowing the reduction in visual acuity as measured by loss of 15 or more letters, in both TAP and VIP. It should be emphasised that patients in both PDT and placebo arms suffer mean loss of visual acuity—in TAP the mean visual acuity in the PDT arm fell from 52.8 letters to 39.4 letters and from 52.6 to 32.9 letters in the placebo arm. The relative risk (RR) of 0.75 (95% CI 0.65 to 0.88) in TAP is equivalent to a number needed to treat (NNT), to avoid one person losing 15 or more letters of vision over two years, of 7 (95% CI 4 to 17). For VIP, the RR of 0.81 (95% CI 0.68 to 0.96) is equivalent to an NNT of 8 (95% CI 4 to 49). The differences in the study populations seem to make little difference to the result, as is emphasised in the Forest plot, shown in figure 1.
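The link between RR and NNT can be made concrete with a small calculation. The sketch below is illustrative only: it uses the rounded two year VIP proportions reported in this review (54% of the PDT arm and 67% of the placebo arm losing 15 or more letters), and the function name and the convention of rounding the NNT up to the next whole person are assumptions, not details taken from the trial reports.

```python
import math

def rr_and_nnt(risk_treated, risk_control):
    """Return relative risk, absolute risk reduction, and NNT.

    The NNT is rounded up to a whole person, which is a common
    convention and an assumption here.
    """
    rr = risk_treated / risk_control
    arr = risk_control - risk_treated   # absolute risk reduction
    nnt = math.ceil(1 / arr)            # patients treated per event avoided
    return rr, arr, nnt

# Rounded two year VIP proportions quoted in the text (PDT 54%, placebo 67%).
rr, arr, nnt = rr_and_nnt(0.54, 0.67)
print(f"RR = {rr:.2f}, ARR = {arr:.2f}, NNT = {nnt}")
```

With these rounded inputs the calculation reproduces the RR of 0.81 and the NNT of 8 quoted above for VIP. The confidence intervals cannot be reproduced this way because they depend on the underlying counts.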

Figure 1

TAP and VIP number of people losing 15 or more letters at two years.

The test for heterogeneity confirms that the observed difference in results between TAP and VIP at two years is no greater than would be expected by chance alone, if they were both measuring the same treatment effect.
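The heterogeneity check described in the methods (comparing χ² with its degrees of freedom) can be sketched from the published summary figures. This is a hedged approximation rather than the analysis actually run in Review Manager: it assumes an inverse variance fixed effect model on the log RR scale, and it back-calculates each standard error from the rounded 95% confidence intervals quoted above. The function names are hypothetical.

```python
import math

def log_rr_and_se(rr, lower, upper, z=1.96):
    """Log relative risk and its standard error, recovered from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.log(rr), se

def fixed_effect_pool(trials):
    """Inverse variance pooled RR plus Cochran's Q and its degrees of freedom."""
    estimates = [log_rr_and_se(*values) for values in trials.values()]
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    return math.exp(pooled), q, len(estimates) - 1

# Two year RR (95% CI) for loss of 15 or more letters, as quoted in the text.
trials = {"TAP": (0.75, 0.65, 0.88), "VIP": (0.81, 0.68, 0.96)}
pooled_rr, q, df = fixed_effect_pool(trials)
print(f"pooled RR = {pooled_rr:.2f}, Q = {q:.2f}, df = {df}")
print("heterogeneity flagged" if q > df else "no heterogeneity flagged")
```

Under these assumptions Q falls below its single degree of freedom, which is consistent with the statement that the two trials do not differ by more than chance would allow. The pooled value is only a by-product of the sketch and should not be read as a result of the review.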

For the primary outcome, loss of 15 letters or more, the TAP results are similar at one year to the two year results. The one year RR of 0.72 indicated that the advantage of PDT over placebo established itself quickly. The results for VIP were not as similar between first and second years (RR 0.93 and 0.81 respectively) although the difference is accounted for by increased numbers of people attaining this outcome in the placebo arm by the second year (54% to 67%), the number with the outcome in the PDT arm remaining relatively constant from first to second year (51% to 54%).

Concerning other outcomes, table 1 indicates that the effectiveness of PDT in terms of prevention of visual acuity loss is augmented by a reduction in loss of contrast sensitivity in the TAP trial (mean letter loss of 1.3 in intervention group v 5.2 in control group). Curiously this outcome was not reported for VIP even though it was measured. In both RCTs significantly more side effects were reported in the PDT group compared with placebo, but the majority of these were minor, with the noted exception of >20 letter visual loss within one week of treatment. Fortunately, severe loss of visual acuity is uncommon with three people affected in the PDT arm of TAP and 10 in VIP. Assuming no other major impacts on patients, particularly those affecting function and quality of life, the net effect of PDT appears to be beneficial. However, without direct measures of impact on quality of life, it is difficult to gauge the degree to which these beneficial effects might be offset by increased adverse events, particularly during the procedure itself.

The results for the subgroup analyses in the TAP trial at two years are summarised in table 2. Three of the subgroup analyses listed in the RCT statistical analysis plan19 (iris colour, ethnicity, and new v recurrent lesion) are not reported subsequently in the RCT journal article.10 Subgroup analysis 10 in table 2, evidence of prior laser photocoagulation, is not mentioned in the statistical plan but may be an alternative statement of new v recurrent lesions. Of the subgroup analyses planned and reported, two (% lesion composed of classic choroidal neovascularisation (CNV) and evidence of occult CNV) emerged as being statistically significant by test of interaction (p<0.05). This is the best method to test for the presence of subgroup effects, but still needs to be considered in light of the number of subgroups examined. Thus, if 20 tests for interaction were performed on 20 different subgroups, one would expect on average one p value of <0.05 by chance alone. Subgroup analyses were also conducted and reported for the VIP trial, but were restricted to participants with no classic CNV (166 of 225 receiving PDT; 92 of 114 randomised to control). Thus there is no direct information on whether the VIP subgroup analyses reinforce those in TAP. By deduction, however, one can calculate the two year RR for loss of 15 letters or more of visual acuity for the subgroups of occult only and more than 0% classic (mostly minimally classic); see table 3. In contrast to the TAP trial, the results for these two subgroups appear to be similar in the VIP trial, although the numbers are small and the confidence intervals are wide and overlap.
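The multiple testing point can be illustrated numerically. The sketch below assumes that the interaction tests are independent and that no true subgroup effect exists. Both are simplifying assumptions, and independence in particular does not hold for overlapping subgroup definitions. The function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests with p < alpha when no true effect exists."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of at least one p < alpha, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

# 20 is the worked figure in the text; 14 is the number of subgroups
# listed in the TAP protocol.
for n in (20, 14):
    print(f"{n} tests: expected {expected_false_positives(n):.1f} false positives, "
          f"P(at least one) = {prob_at_least_one(n):.2f}")
```

For 20 tests the expected count is one, as stated above. For the 14 subgroups in the TAP protocol it is a little under one, with roughly an even chance of seeing at least one spuriously significant interaction.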

Table 2

TAP study reported subgroup analyses at 24 months by treatment group and baseline characteristics*

Table 3

VIP trial results by absence or presence of classic neovascular lesions

Of the subgroup analyses reported in VIP on the subset of patients without classic CNV, two again emerged as statistically significant using an interaction test: initial number of letters read in study eye (⩾65 v <65) p = 0.004 and lesion size in MPS disc areas (⩽4 disc areas v >4 disc areas) p = 0.04.

DISCUSSION

The key result of the TAP and VIP trials indicates that PDT with verteporfin is more effective than placebo in terms of the primary outcome (loss of 15 letters or more of visual acuity) and it is very unlikely that this result is a chance finding. Furthermore, considering information on the other outcomes measured such as contrast sensitivity and side effects, the benefits seem to outweigh the harms so that PDT with verteporfin is effective overall in slowing the rate of vision loss. However, without information on effects of treatment on quality of life it is difficult to gauge the impact of treatment on patients. This set of findings is consistent with previous reviews and systematic reviews on this topic.20–22

For the two RCTs there is consistency between the results, particularly on relative effects such as RR of the primary outcome measure at two years. Because of the focus of publicity and licensing on the predominantly classic subgroup, this similarity of main effect across patient groups has not been highlighted. A strong view seems to have become established that PDT shows a differential effect in different patient groups but this was not supported in this review. The larger effect size suggested by the subgroup analysis of the TAP trial was not mirrored by findings from VIP. Presumably the TAP subgroup findings reported here were the basis for licensing the predominantly classic patient group and the VIP trial results the basis of being recently granted licence for use in occult wet AMD. It is unknown whether further applications will be made to extend the licence to treat minimally classic lesions, particularly small lesions (less than four disc areas) in people with initially good visual acuity.

The key issue we set out to explore in the systematic review is not what the direction of effect is and whether this is statistically significant, but what the true size of effect is based on the published evidence, particularly for participants with predominantly classic wet AMD. The answer to this is apparently straightforward: the results obtained for those participants in the TAP and VIP trials who had predominantly classic wet AMD. The numbers of these in VIP are not clear, but undeniably small, so the estimate of effect can reasonably be based on the results of those in the predominantly classic subgroup of the TAP trial. The quoted RR is 0.60 (95% CI 0.47 to 0.65) or an NNT of 4 (95% CI 2 to 7). This estimate is however very different from the whole trial estimate, and using it has implications for the assessment of cost effectiveness. Using one estimate in preference to the other will influence cost effectiveness by an approximate factor of 2, so it is important to assess which alternative is the more appropriate. The critical question is whether the subgroup estimate for predominantly classic is or is not a chance finding. If it is not, then using the subgroup estimate is appropriate. If it is, then even though this is counterintuitive, the whole trial estimate will give the most accurate estimate of effect for those with predominantly classic wet AMD. There are several arguments for and against this subgroup finding being a statistical artefact. The arguments against this being a chance finding are firstly that these subgroups were prespecified in the RCT protocol and statistical plan. Also the analysis adhered to statistical guidelines, particularly in the use of the test for interaction. It is claimed also that the first subgroup effect (predominantly classic being more responsive to PDT) has a strong biological plausibility23 and this has been widely accepted, particularly by licensing authorities.24

We shall now set out a number of arguments for the subgroup effect being a chance finding.

(1) The prespecification argument above is undermined by the number of groups prespecified; ideally the number should be small in order to reduce the difficulties of multiple significance testing. Of the 14 groups originally prespecified, it would be expected that about one statistically significant test of interaction would occur by chance alone. In fact two were obtained, but % classic and presence of occult are interdependent in the context of TAP: if one is statistically significant, the other automatically will be. They are essentially the same data expressed in a slightly different way, so effectively there was only one statistically significant result.

(2) The statistical plan, in describing subgroup analysis a priori, makes no reference to biological plausibility, suggesting that this may have been arrived at post hoc. Furthermore, the description of the rationale for the subgroup analysis is in terms of assessing general consistency across the subgroup levels, rather than a targeted investigation to a very limited number of factors for which there was a high level of initial suspicion about presence of a subgroup effect. Also, before the TAP trial was published, the groups normally mentioned were classic only, mixed classic and occult, and occult only. After the TAP trial was published, predominantly classic as a subgroup appeared; there is no mention of the term in Medline or Embase before this time.

(3) Data across TAP and VIP are not consistent with the subgroup effect identified in TAP. At trial level (the level at which the studies were designed and carried out) there is relatively little difference in effect between TAP and VIP despite their very different patient populations.

(4) The pattern of results in subgroups is odd (see fig 2). If predominantly classic AMD is more aggressive and sight threatening and so more susceptible to PDT treatment, a gradient of effect between 100% classic and 100% occult would be expected. The TAP subgroup analysis however suggests that occult has a similar effect size to predominantly classic, with minimally classic having a worse outcome than both. The VIP trial suggests a similar effect size in minimally classic and occult. For there to be a biologically plausible gradation of effect from pure classic to pure occult, the RR for occult would need to be near to or even greater than 1. This is incompatible with the actual result of the VIP trial in which the RR for occult only is 0.8.

Figure 2

Forest plot of TAP trial subgroup analysis of lesion composition at two years.

Influenced by the considerable volume of methodological literature suggesting that accepting subgroup effects which do not exist is more likely than rejecting important subgroup effects which are truly present,25–27 we felt that it was most likely that the subgroup effect in the TAP trial was a chance finding. Such a view is also consistent with comments in the most recent update of the Cochrane Review on PDT.28

The observations about the true size of effect are of particular practical importance in assessing cost effectiveness. If the whole trial estimate is used this may give a relatively pessimistic result. In contrast, use of the subgroup estimate will approximately halve the cost per unit of benefit relative to that obtained from the whole trial estimate.
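The approximate factor of 2 can be seen from the NNT figures alone. The sketch below is a deliberate simplification and not the economic model used in the assessment: it assumes the cost of treating one patient is the same whichever estimate is used, so that the cost per person spared a loss of 15 or more letters scales directly with the NNT. The unit cost and the function name are hypothetical placeholders.

```python
def cost_per_event_avoided(cost_per_patient, nnt):
    """Cost of sparing one person the outcome, given a cost per patient treated."""
    return cost_per_patient * nnt

COST_PER_PATIENT = 1.0  # hypothetical unit cost; only the ratio matters

# NNTs quoted in the text: 7 for the whole TAP trial,
# 4 for the predominantly classic subgroup.
whole_trial = cost_per_event_avoided(COST_PER_PATIENT, nnt=7)
subgroup = cost_per_event_avoided(COST_PER_PATIENT, nnt=4)

ratio = whole_trial / subgroup
print(f"whole trial cost per event avoided is {ratio:.2f} times the subgroup cost")
```

On this crude basis the whole trial estimate makes each avoided event about 1.75 times as expensive as the subgroup estimate does, in line with the approximate factor of 2 discussed above.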

This area should be kept under close review. The results of ongoing trials (especially the VIM and VIO RCTs) should provide further data to create a more complete picture of the relationship between the nature of the lesion and effect size. This picture would be greatly enhanced if the data could be analysed at an individual patient level, something that we would suggest the principal investigators of the published and ongoing trials urgently consider.

REFERENCES
