An interinstitutional comparative study and validation of computer aided drusen quantification
  1. V Sivagnanavel1,
  2. R T Smith2,
  3. G B Lau1,
  4. J Chan2,
  5. C Donaldson3,
  6. N V Chong1
  1. 1Retinal Research Unit, King’s College Hospital, University of London, UK
  2. 2Department of Ophthalmology, Columbia University, New York, NY, USA
  3. 3Department of Biostatistics, Research and Development, Kings College Hospital, University of London, UK
  1. Correspondence to: Miss V Sivagnanavel Retinal Research Unit, King’s College Hospital, University of London, UK; vasuki_siva1@yahoo.co.uk

Abstract

Aims: To assess the portability and clinical applicability of a software program based on Photoshop (Adobe Systems Inc, San Jose, CA, USA) for digital drusen quantification.

Methods: Independent graders from the Digital Fundus Photo Reading Center of Columbia University and King’s College Hospital used macular background levelling software to quantify the percentage of drusen in the central and middle Wisconsin subfields. 100 images of consecutive patients with choroidal neovascularisation in one eye and significant drusen in the other eye were analysed to determine suitability, and 10 were chosen for assessment by this software.

Results: Of the 10 images used in the interinstitutional validation, the random effects ANOVA for the central and middle subfields showed a high degree of interobserver agreement. The ICC for interobserver reliability was 0.83 (95% CI: 0.67 to 0.95) for the central subfield and 0.84 (95% CI: 0.69 to 0.99) for the middle subfield. Overall agreement with the manual grading results was good and the within patient coefficient of variation was about 20% for all the pairwise comparisons between observers and the manual stereo gradings. Of the 100 images used to assess practical applicability of the software, 79 were suitable for semiautomated analysis. Thirteen had extensive mixed retinal pigment epithelial (RPE) changes limiting drusen identification, five had a significant number of reticular drusen, which are poorly identified by the software, and three had multiple small areas of RPE atrophy, which are difficult to distinguish from drusen.

Conclusions: The software was successfully used by two institutions demonstrating portability, with good correlation between graders and to the manual stereo grading. Digital drusen quantification was possible in 79% of the images analysed.

  • AMD, age related macular degeneration
  • ARM, age related maculopathy
  • ICC, intraclass correlation coefficient
  • RPE, retinal pigment epithelium
  • drusen
  • macula

Age related macular degeneration (AMD) is the leading cause of blindness in the developed world.1–7 A hallmark feature is the presence of drusen, and increases in drusen load have been correlated with advanced stages of AMD.1–4 Several studies have attempted drusen reduction as a means of preventing visual loss.5–10 Manual drusen quantification is laborious and costly.11,12 Automated drusen quantification has value in furthering our understanding of the natural history of AMD and in trials of drusen reduction. We used a Photoshop (Photoshop 5.5, Adobe Systems Inc, San Jose, CA, USA) based semiautomatic drusen quantification software developed at Columbia University,13 and evaluated it for interinstitutional portability and clinical applicability.

METHODS

Colour fundus images (Topcon TRC 50IX retinal camera) from 10 patients with stage 2 or 3 age related maculopathy (ARM) (as defined by the international grading system) were selected at random from the digital database at King’s College Hospital (KCH). Images with extensive hyperpigmentary or hypopigmentary abnormalities were excluded. Images were analysed by the methods described previously.13 Briefly, the images used had minimum resolutions of 2700 pixels/inch. The images were saved as 24 bit RGB TIFF files, with 256 levels of intensity value for each colour channel. Images were then resized so that the distance from the centre of the macula to the temporal disc edge was 490 pixels, allowing uniformity of processing. The regions studied were the central and middle subfields defined by the Wisconsin grading template: the central 1000 μm diameter circular subfield and the 1000–3000 μm diameter annular subfield, both centred on the fovea. Drusen area was measured as a percentage of the 1000 μm and 3000 μm subfield, and was unaffected by variable image size. The variation in brightness found in most fundus photographs was normalised using the red, green, and blue channels to create a standardised image in Photoshop, with nearly identical mean background colours, establishing a uniform basis for drusen segmentation. Contrast enhanced versions of the images (Photoshop/autolevels) were created for ease of visual recognition of drusen. Drusen analysis was carried out on the green channel of the standardised image using a digital template.13
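The geometry of the two subfields can be illustrated with a minimal sketch. It is not the Photoshop template used in the study; the function name `wisconsin_subfield_masks`, the fovea coordinates, and in particular the `um_per_pixel` scale are hypothetical, since the text gives the resized disc edge to fovea distance in pixels but not the corresponding scale in micrometres.

```python
import numpy as np

def wisconsin_subfield_masks(shape, fovea_rc, um_per_pixel):
    """Return boolean masks for the central and middle subfields.

    shape        -- (rows, cols) of the image
    fovea_rc     -- (row, col) of the foveal centre (assumed known)
    um_per_pixel -- hypothetical scale factor, not given in the article
    """
    rows, cols = np.indices(shape)
    # Distance of every pixel from the fovea, converted to micrometres.
    dist_um = np.hypot(rows - fovea_rc[0], cols - fovea_rc[1]) * um_per_pixel
    central = dist_um <= 500.0                        # 1000 um diameter disc
    middle = (dist_um > 500.0) & (dist_um <= 1500.0)  # 1000-3000 um annulus
    return central, middle

# Illustrative values only: a 400 x 400 image at 10 um per pixel.
central, middle = wisconsin_subfield_masks((400, 400), (200, 200), um_per_pixel=10.0)
```

Because drusen load is reported as a percentage of each subfield, only the ratio of labelled pixels to mask pixels matters, which is why the measurement does not depend on the absolute image size.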

After background levelling,13 the optimum threshold level for drusen segmentation in the selected subfield is chosen by flicker comparison with the contrast enhanced image. For a given threshold, the drusen image is segmented such that pixels with brightness intensities above the threshold are coloured green, to label as drusen, and the rest darkened. Each such drusen image is superimposed on the contrast enhanced image. The optimised threshold is selected by visually inspecting the correspondence of the boundaries of the segmented drusen objects to those of the contrast enhanced objects. The threshold is then adjusted so that this visual fit is optimum in the aggregate as judged by the user (fig 1A–C). The total drusen area as a percentage of the selected subfield is then read directly (Photoshop/Histogram).
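The thresholding step described above can be sketched as follows, assuming the levelled green channel is available as an 8 bit array and the subfield as a boolean mask. The function name `drusen_area_percent` and the pixel values are hypothetical; in the study the threshold was chosen interactively by the grader and the area was read from the Photoshop histogram.

```python
import numpy as np

def drusen_area_percent(green, subfield_mask, threshold):
    """Percentage of the subfield whose green intensity exceeds the threshold."""
    # Pixels brighter than the threshold and inside the subfield count as drusen.
    drusen = (green > threshold) & subfield_mask
    return 100.0 * drusen.sum() / subfield_mask.sum()

# Made-up 3 x 3 example: the bottom row lies outside the subfield.
green = np.array([[90, 140, 200],
                  [80, 160, 70],
                  [60, 50, 255]], dtype=np.uint8)
mask = np.array([[True, True, True],
                 [True, True, True],
                 [False, False, False]])
percent = drusen_area_percent(green, mask, threshold=130)
```

Raising the threshold can only shrink the labelled area and lowering it can only enlarge it, which is why the grader's choice of threshold translates directly into the reported drusen load.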

Figure 1

 Images illustrating drusen segmentation. Contrast enhanced layer for drusen identification (A), selection of best fit threshold (B), alternative threshold over representing drusen load (C).

As part of the interinstitutional study, one expert and one non-expert grader from each institution (Eye Institute, Columbia University, USA and King’s College Hospital, London (KCH)) independently performed drusen quantification on the 10 images. A random effects ANOVA was used to assess the interobserver agreement in terms of the intraclass correlation coefficient (ICC). The interobserver and interinstitution effects were fitted in a random intercept (mixed) linear model to determine whether institution was associated with any disagreement in the measurements. The automated measurements were also compared against the stereo manual grading and the difference was estimated using the method suggested by Bland and Altman.14
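As an illustration of how an ICC is obtained from a random effects ANOVA, the sketch below computes the single-measure, one-way random effects ICC from a table of gradings. The article does not state which ICC form was used, so the one-way form is an assumption; the function name `icc_one_way` and the data are made up for the example.

```python
from statistics import mean

def icc_one_way(ratings):
    """Single-measure ICC from a one-way random effects ANOVA.

    ratings -- list of rows, one row per image, one value per grader.
    """
    n = len(ratings)       # number of images
    k = len(ratings[0])    # number of graders per image
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    # Between-image and within-image mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((v - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for v in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Made-up drusen percentages: three images, two graders.
example_icc = icc_one_way([[10, 12], [20, 18], [30, 33]])
```

An ICC close to 1 means that most of the variance comes from differences between images rather than from differences between graders looking at the same image.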

Secondly, as part of a pretrial assessment for a potential drusen reduction randomised controlled trial, 100 consecutive fluorescein angiograms taken at KCH between April 1999 and November 2002 were reviewed. Patients included had choroidal neovascularisation as a result of AMD in one eye and significant drusen in the fellow eye (defined as five large drusen or more than 20 small drusen in the macular area). Colour images of the fellow eye were analysed to determine whether they were suitable to be assessed by this software based on its previously determined limitations.13

RESULTS

Interinstitutional validation

The most labour intensive process in our method was in background levelling. For simple images this took about 1 minute and for more complicated images about 7 minutes. The total time taken for complete image evaluation and drusen segmentation varied from 4–10 minutes compared to 20–30 minutes per image with manual tracing.

There was good consensus between graders in the selection of the final threshold for drusen quantification. The random effects ANOVA showed a high degree of interobserver agreement, as most of the variability was due to interpatient variation (F (9,30) = 20; p = 0.00001 and F (9,30) = 22; p = 0.00001 for the central and middle subfields, respectively). Although the results were similar for the middle and central subfields, the middle subfield showed better agreement in general. The ICC for interobserver reliability was 0.83 (95% CI: 0.67 to 0.95) for the central subfield and 0.84 (95% CI: 0.69 to 0.99) for the middle subfield. The random effects linear mixed model confirmed good interobserver agreement (mean difference of 4.7; 95% CI: −7 to 17.6; p = 0.44 and 3.6; 95% CI: −2.4 to 9.6; p = 0.24, for the central and middle subfields, respectively). In addition, it showed a non-significant disagreement between the two countries.
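The reported F statistics and ICCs are mutually consistent under one assumption that the article does not state explicitly: that the ICC is the single-measure, one-way random effects form, with k = 4 graders rating each of the 10 images (which gives the 9 and 30 degrees of freedom quoted above). Writing MS_B and MS_W for the between-image and within-image mean squares:

```latex
\mathrm{ICC} = \frac{MS_B - MS_W}{MS_B + (k-1)\,MS_W} = \frac{F - 1}{F + k - 1},
\qquad F = \frac{MS_B}{MS_W}
\\[6pt]
\text{central subfield: } \frac{20 - 1}{20 + 3} = \frac{19}{23} \approx 0.83,
\qquad
\text{middle subfield: } \frac{22 - 1}{22 + 3} = \frac{21}{25} = 0.84
```

Both values match the ICCs given in the text, which suggests, without confirming, that this is the form that was used.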

When the automated grading results were compared to the manual stereo grading results, we found that the automated measures tended to underestimate for large drusen values in both subfields. In addition, in the central subfield, the automated measures tended to overestimate for smaller drusen values. Optimum agreement with manual grading was obtained when the percentage of drusen in the measured area was 25%. Overall agreement with the manual grading results remained good and the within patient coefficient of variation was about 20% for all the pairwise comparisons. Figure 2A shows the plot of the automated versus manual measurements for each observer for the middle subfield, with the line of equality for comparison. The Bland and Altman plots of the difference versus mean of the automated and manual measurements for each observer are presented in figure 2B. The estimates of the disagreements between automated and manual gradings for each observer are shown in table 1, together with the test of whether the disagreement was significantly different from zero, either overestimating or underestimating the true drusen value. There was no significant deviation from the manual gradings for the central subfields for all four graders. Underestimation of drusen levels in the middle subfield reached significance for grader RTS.
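The comparison with manual grading can be sketched as below. This assumes the usual Bland and Altman quantities (mean difference, 95% limits of agreement, and a one-sample t statistic for the mean difference) and one common definition of the within-subject coefficient of variation for paired measurements; the article does not give its exact formulas, and the function names and data are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

def bland_altman(automated, manual):
    """Bias, 95% limits of agreement and t statistic for paired gradings."""
    diffs = [a - m for a, m in zip(automated, manual)]
    bias = mean(diffs)              # negative bias = automated underestimates
    sd = stdev(diffs)               # sample SD of the differences
    t_stat = bias / (sd / sqrt(len(diffs)))
    return {"bias": bias,
            "lower": bias - 1.96 * sd,
            "upper": bias + 1.96 * sd,
            "t": t_stat}

def within_subject_cv(automated, manual):
    """Within-subject CV (%) for paired measurements: s_w / overall mean."""
    diffs = [a - m for a, m in zip(automated, manual)]
    s_w = sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))
    return 100.0 * s_w / mean(list(automated) + list(manual))

# Made-up drusen percentages for three images.
result = bland_altman([10, 20, 30], [12, 18, 33])
cv = within_subject_cv([10, 20, 30], [12, 18, 33])
```

The t statistic would be compared against a t distribution with n − 1 degrees of freedom to test whether the mean disagreement differs from zero.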

Table 1

 Mean deviation for each observer from manual grading (graders JC and RTS from Columbia University, USA and graders VS and BL from Kings College Hospital, UK)

Figure 2

 (A) Plots of the automated v manual measurements in relation to the line of equality for each observer (middle subfield). (B) Difference v mean of automated and manual measurements for each observer (middle subfield).

Practical applicability

Seventy nine images were found to be suitable for analysis by the software. Of the 21 considered unsuitable, 13 had extensive mixed retinal pigment epithelial (RPE) changes limiting drusen identification (fig 3A). Five had a significant number of reticular drusen, which are poorly identified (fig 3B), and three had multiple small areas of RPE atrophy, which are difficult to distinguish from drusen (fig 3C). Significant thinning of RPE with baring of choroidal vessels can also make drusen recognition difficult (fig 3D).

Figure 3

 Images illustrating limitations of the practical application of software. Extensive RPE changes (A), reticular drusen (B), mixed drusen and RPE atrophy (C), baring of choroidal vessels (D).

CONCLUSION

Previous attempts at automated quantification have had limitations.15–17 Shin et al described a method of computer assisted, interactive image processing which afforded higher accuracy.18 They achieved an ICC of 0.92 and 0.93 for comparison of expert manual grading with automated supervised grading by two observers. However, major problems included identification of soft drusen with indistinct borders, large size drusen, and contrast confusion from darker blood vessels.

Comparison of the results of our semiautomated method with stereo manual grading and intraobserver reproducibility have been reported previously.13 Good interobserver reproducibility has been demonstrated in the present study by graders from two institutions. Our digital method requires two supervised steps which are potential sources of interobserver variation: firstly, background levelling for uniformity of drusen analysis and, secondly, selection of the threshold for drusen quantification. Levelling of the macular background is an approximation that may make a given section too bright or too dim. Consequently, drusen would be over-represented or under-represented. This was not a significant source of variation in this study. Disagreements between graders were predominantly the result of the subjective choice of final threshold. Large amounts of soft drusen with indistinct borders were more likely to be underestimated. Also, drusen underlying mixed RPE changes could potentially be excluded. Poor image quality and lack of stereo caused a tendency to include RPE atrophy as drusen. These confounding factors would have to be removed manually or by additional software and are a source of potential interinstitutional and interobserver variation.

Although a semiautomated method requires greater time from the grader than a fully automatic system, it is an acceptable compromise for improved accuracy and reproducibility in relation to some published fully automated methods.19 Rapantzikos et al have had greater success using histogram based adaptive local thresholding.20 However, the limitations imposed by confounding lesions have not been explored. Our semiautomated software has the potential to assess the change of drusen area in the majority of high risk patients with AMD. It has value in trials of drusen dynamics and reduction.

REFERENCES

Footnotes

  • Support: NY Community Trust, King’s Ophthalmic Fund.

  • A part of these results was presented as posters at the Association for Research in Vision and Ophthalmology, Fort Lauderdale, 2003 and in abstract form in

    and
