
Sensitivity and specificity of a new scoring system for diabetic macular oedema detection using a confocal laser imaging system
L Tong, A Ang, S A Vernon, H J Zambarakji, A Bhan, V Sung and S Page
Departments of Ophthalmology and Endocrinology, Queen's Medical Centre, University Hospital, Nottingham NG7 2UH, UK
Mr S A Vernon, stephen_vernon{at}


AIM To assess the use of the Heidelberg retina tomograph (HRT) in screening for sight threatening diabetic macular oedema in a hospital diabetic clinic, using a new subjective analysis system (SCORE).

METHODS 200 eyes of 100 consecutive diabetic patients attending a diabetologist's clinic were studied. All eyes had an acuity of 6/9 or better. All patients underwent clinical examination by an ophthalmologist. Using the HRT, one good scan centred on the fovea was obtained for each eye. A System for Classification and Ordering of Retinal Edema (SCORE) was developed using subjective assessment of the colour map and the reflectivity image. Interobserver agreement in using this method to detect macular oedema was assessed by two observers (ophthalmic trainees), who were familiarised with SCORE by studying standard pictures of eyes not in the study. All scans were graded from 0 to 6, and test-positive cases were defined as those with a SCORE of 0–2. To narrow the confidence interval of the sensitivity estimate, the data were pooled with an additional 88 scans of 88 eyes.

RESULTS 12 eyes in eight of the 100 patients had clinical macular oedema. Three scans in three patients could not be analysed because of poor scan quality. In the additional group of scans, 76 of 88 eyes had clinical macular oedema. The scoring system had a specificity of 99% (95% CI 96–100) and a sensitivity of 67% (95% CI 57–76). The predictive value of a negative test was 87% (95% CI 82–99), and that of a positive test was 95% (95% CI 86–99). The mean difference in SCORE between the two observers was –0.2 (95% CI –0.5 to +0.07).

CONCLUSIONS These data suggest that SCORE is potentially useful for detecting diabetic macular oedema in hospital diabetic patients.

  • Heidelberg retinal tomograph
  • screening
  • macular oedema
  • diabetes


Diabetic maculopathy is the major cause of visual loss in diabetics. In the early stages most patients have good vision, and a recent large audit indicated that 86.5% of cases requiring treatment had no symptoms. This underpins the importance of screening for maculopathy in diabetics.

Recently, we described a new method for detecting and quantifying macular oedema by volumetric analysis using the Heidelberg retinal tomograph (HRT). That study was performed in patients referred to the ophthalmology department with suspected maculopathy.

This study evaluates the role of the HRT in screening the maculae of unselected patients attending a hospital based clinic for control of their diabetes. In an attempt to improve on our original method of analysis, we have developed the System for Classification and Ordering of Retinal Edema (SCORE), which uses subjective evaluation of the HRT colour map and reflectivity images to detect macular oedema.



The entry criteria for the study were:

  • Attendance at a hospital diabetic clinic for management of diabetes.
  • Best corrected visual acuity of 6/9 or better in each eye.
  • No impairment of the ability to undergo HRT scanning, that is, no serious musculoskeletal deformity or inability to understand simple instructions.
  • No previous ophthalmic treatment such as laser treatment or treatment for glaucoma. Patients with previous squint surgery who fulfilled the other criteria were included.

All suitable patients were recruited when they were reviewed at a hospital diabetologist's clinic.

All patients gave informed consent and the study was approved by the ethics committee of Queen's Medical Centre, Nottingham.

All patients received cyclopentolate 1% eye drops in each eye before examination. Slit lamp biomicroscopy with a 78D lens was performed by an experienced ophthalmologist (LMT), and a Goldmann fundus contact lens was used for borderline cases with possible macular oedema. The presence of clinically significant macular oedema (CSMO) according to ETDRS criteria was recorded. If CSMO was present, the eye was deemed “positive for oedema”.


Details of HRT image acquisition have been described in a previous paper. HRT software 2.01 was used for image acquisition, image alignment, and data analysis. HRT scanning was performed by an operator (LMT) with at least 1 month's experience of the machine.

One good quality scan of each eye was used in all analyses. A 2 mm diameter circle was drawn with the circle draw facility of the HRT and centred on the fovea, which was located by visualising the pattern of the retinal vasculature on the reflectivity image. The reference plane was positioned as described previously.

We devised a novel system for detecting macular oedema with the HRT, called SCORE. A SCORE is the sum of two grades, each from 0 to 3: the first is based on the extent of retinal elevation shown by the HRT colour map facility, and the second on the HRT reflectivity image. A combined score of 0–2 was designated “abnormal” (requiring “ophthalmic referral”), and a score of 3–6 “normal” (“not requiring referral”).
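The combination rule can be expressed as a short sketch. It assumes each component grade has already been assigned visually as an integer from 0 to 3; the function names `score` and `needs_referral` are hypothetical and are not part of the HRT software.

```python
def score(colour_map_grade: int, reflectivity_grade: int) -> int:
    """Combine the colour map and reflectivity grades (each 0-3) into a SCORE (0-6)."""
    for name, grade in (("colour map", colour_map_grade),
                        ("reflectivity", reflectivity_grade)):
        if grade not in range(4):
            raise ValueError(f"{name} grade must be 0-3, got {grade}")
    return colour_map_grade + reflectivity_grade


def needs_referral(score_value: int, cutoff: int = 2) -> bool:
    """A SCORE at or below the cut-off (0-2 in this study) is a positive screen."""
    if score_value not in range(7):
        raise ValueError(f"SCORE must be 0-6, got {score_value}")
    return score_value <= cutoff


print(score(1, 1), needs_referral(score(1, 1)))  # 2 True
print(score(3, 0), needs_referral(score(3, 0)))  # 3 False
```

Keeping the cut-off as a parameter makes it easy to explore alternative thresholds, such as the SCORE of 3 or less considered in the Discussion.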

Two observers, trainees in ophthalmology (AB and VS), were trained in SCORE by familiarising themselves with a set of standard images (see Figs 2 and 3) and instructions on their use. The observers had no knowledge of the clinical findings in these patients. Their ability to grade patients' images using the standard images was assessed with a set of “test images” from patients with and without macular oedema, to ensure that they could use SCORE reliably and consistently. To minimise observer fatigue, the study images were graded over several sessions, each lasting no longer than 1 hour.

Figure 2

Standard colour maps. From top to bottom: grades 0, 2, and 3. Maps with an extent of green area intermediate between grades 0 and 2 would be graded as 1. Green denotes an area of retina above the curved surface (defined in Fig 1); blue denotes an area of retina lying below the curved surface but above the plane defined by the lowest point on the contour line; red denotes an area of retina lying below the plane of the lowest point of the contour line.

Figure 3

Examples of standard reflectivity images. From top to bottom: grade 0 (showing discrete yellow spots and blot retinal haemorrhages), grade 3 (example 1), grade 3 (example 2). Grade 1 would be assigned if a grader is more than 50% but less than 90% certain that the image should be graded 0. Grade 2 would be assigned if a grader is more than 50% but less than 90% certain that the image should be graded 3.

Colour map scoring

Using the stereo measurement feature of the HRT software, a colour map could be superimposed over the 2 mm diameter circle of the topographic image. Because the topographic map encodes the position, rather than the magnitude, of maximum reflectivity along the z-axis (the axis perpendicular to the two dimensional HRT images), we speculated that it would detect macular oedema more reliably than the reflectivity image. Visual evaluation of the topographic image without further software analysis would be difficult, as there is little contrast between points on this image. The colour map uses colour codes to show the topography of the macula within the 2 mm diameter circle. The colour map was graded 0–3, where 0 corresponds to the greatest area of retinal elevation and 3 to the least (Fig 1).

Figure 1

Diagram showing the definition of the coloured regions within the 2 mm diameter circle. The top circle represents the mean height of the contour line, which is shown as the curved line. The curved surface is shown as a series of lines joining a central point of the circle, at the level of the mean height of the contour line, to an infinite number of points along the contour line at the circle perimeter.
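As a rough illustration of how the colour coding in Figs 1 and 2 partitions the circle, the sketch below classifies a single retinal point. It rests on several assumptions: heights are expressed so that larger values mean more elevated retina, the curved surface is modelled as straight lines interpolated linearly from the centre to the perimeter, and the contour is invented. The function names are hypothetical, and this is not how the HRT software itself computes the map.

```python
import math
from typing import Callable


def curved_surface_height(r: float, theta: float, radius: float,
                          contour_height: Callable[[float], float],
                          mean_contour_height: float) -> float:
    """Height of the curved surface at polar position (r, theta).

    Modelled as a straight line from the circle centre (at the mean contour
    height) to the contour line at the perimeter, so height varies linearly with r.
    """
    frac = r / radius
    return mean_contour_height + frac * (contour_height(theta) - mean_contour_height)


def colour_of_point(height: float, r: float, theta: float, radius: float,
                    contour_height: Callable[[float], float],
                    mean_contour_height: float,
                    lowest_contour_height: float) -> str:
    """Assign green, blue or red following the Fig 2 legend."""
    if not 0.0 <= r <= radius:
        raise ValueError("point lies outside the 2 mm circle")
    surface = curved_surface_height(r, theta, radius, contour_height,
                                    mean_contour_height)
    if height > surface:
        return "green"  # above the curved surface
    if height > lowest_contour_height:
        return "blue"   # below the curved surface, above the lowest-contour plane
    return "red"        # below the plane through the lowest contour point


def toy_contour(theta: float) -> float:
    """Invented contour: mean height 0, lowest point -0.1 (illustration only)."""
    return 0.1 * math.cos(theta)


args = dict(radius=1.0, contour_height=toy_contour,
            mean_contour_height=0.0, lowest_contour_height=-0.1)
print(colour_of_point(0.01, 0.0, 0.0, **args))      # green
print(colour_of_point(0.05, 1.0, 0.0, **args))      # blue
print(colour_of_point(-0.2, 0.5, math.pi, **args))  # red
```

In the study itself the grade was assigned by eye, by comparing the extent of green against the standard maps, rather than by counting classified points.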

Reflectivity image scoring

The two observers also independently graded the reflectivity images from 0 to 3, where 0 corresponds to the most abnormal image and 3 to a definitely normal image. A grade of 0 was given where discrete white spots or grains were observed within the temporal arcades, or where blot haemorrhages could be seen in all four quadrants of the macula. A grade of 3 was given for a “normal reflectivity image”, which could be quite variable in appearance (Fig 3). A normal appearance may include red “dots” over the fovea without involvement of all quadrants of the macula, small “dot haemorrhages” without “blot haemorrhages in the macula”, or a “bow tie” appearance over the centre of the macula (Fig 3).
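The certainty thresholds in the Fig 3 legend can be read as a simple decision rule. The sketch below is interpretive. It assumes that a grader's certainty can be expressed as a single number, the probability that the image is grade 0, and that certainty of grade 3 is its complement. It also assumes that certainty of 90% or more means a definite grade 0 or grade 3. `reflectivity_grade` is a hypothetical name.

```python
def reflectivity_grade(certainty_grade0: float) -> int:
    """Map a grader's certainty (0-1) that an image is grade 0 to a 0-3 grade.

    Assumes certainty of grade 3 equals 1 - certainty_grade0, and that
    90% or more certainty means a definite grade 0 or grade 3.
    """
    if not 0.0 <= certainty_grade0 <= 1.0:
        raise ValueError("certainty must lie between 0 and 1")
    certainty_grade3 = 1.0 - certainty_grade0
    if certainty_grade0 >= 0.9:
        return 0  # definitely abnormal
    if certainty_grade0 > 0.5:
        return 1  # more than 50% but less than 90% sure it is grade 0
    if certainty_grade3 >= 0.9:
        return 3  # definitely normal
    return 2      # leaning towards grade 3; an exact 50/50 also lands here (assumption)


for c in (0.95, 0.7, 0.3, 0.02):
    print(c, reflectivity_grade(c))
```

The legend leaves the exact 50% and 90% boundaries unassigned, so the handling of those edge cases here is a choice, not something the paper specifies.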


The number of ungradable images was recorded. The reasons for any failure to assign SCORE were noted.

For the first observer, the specificity and predictive values of SCORE for detecting eyes “positive for oedema”, together with their 95% confidence intervals, were calculated per eye as well as per patient.
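The sketch below computes sensitivity, specificity, and predictive values from a 2 × 2 table. The paper does not state which confidence interval method was used. This sketch assumes the Wilson score interval, so its limits may differ slightly from those reported. The counts in the usage line are hypothetical, and the function names are illustrative.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n == 0:
        raise ValueError("denominator is zero")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_indices(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimate and 95% CI for each index of a 2 x 2 screening table."""
    def with_ci(k: int, n: int) -> tuple[float, float, float]:
        lo, hi = wilson_ci(k, n)
        return k / n, lo, hi

    return {
        "sensitivity": with_ci(tp, tp + fn),
        "specificity": with_ci(tn, tn + fp),
        "ppv": with_ci(tp, tp + fp),
        "npv": with_ci(tn, tn + fn),
    }


# Hypothetical counts, not the study's data.
for index, (est, lo, hi) in diagnostic_indices(tp=40, fp=2, fn=20, tn=138).items():
    print(f"{index}: {est:.0%} ({lo:.0%} to {hi:.0%})")
```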

The post-test probability of oedema (the predictive value of a positive test) was calculated using Bayes' theorem:

Post-test probability = (pretest probability × sensitivity) / [(pretest probability × sensitivity) + (1 − pretest probability) × (1 − specificity)]

The pretest probability is taken to be the disease prevalence in the screening population.
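The formula translates directly into a few lines of code. This is a minimal sketch with a hypothetical function name. The example inputs are illustrative, roughly the per-patient prevalence in the clinic group (8 of 100 patients) combined with the pooled sensitivity and specificity. They are not a reanalysis of the study data.

```python
def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float) -> float:
    """Bayes' theorem: probability of oedema given a positive SCORE screen."""
    for name, value in (("pretest", pretest), ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_positive = pretest * sensitivity
    false_positive = (1 - pretest) * (1 - specificity)
    if true_positive + false_positive == 0:
        raise ValueError("the test is never positive under these inputs")
    return true_positive / (true_positive + false_positive)


# Illustrative inputs only: 8% prevalence, 67% sensitivity, 99% specificity.
print(round(post_test_probability(0.08, 0.67, 0.99), 2))  # 0.85
# With 100% specificity there are no false positives, so the result is 1.0.
print(post_test_probability(0.08, 0.67, 1.0))  # 1.0
```

The second call shows why a test with perfect specificity in a sample gives a post-test probability of 100% regardless of prevalence.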

Because of the low prevalence of eyes “positive for oedema” expected in such a population, we doubted whether the sensitivity and positive predictive value could be estimated with narrow 95% confidence limits. The sensitivity and positive predictive value of SCORE were therefore assessed on the scans above, pooled with an additional 88 scans of 88 eyes of diabetic patients with a higher prevalence of clinical macular oedema. These patients had been referred to the ophthalmology department for suspected retinopathy. Corrected visual acuity in this group of eyes was similar to that of the first cohort, that is, 6/9 or better. Clinical examination, HRT imaging and analysis, and SCORE grading were carried out as described above, although HRT imaging was performed by another operator experienced in the technique (HJZ). Importantly, the graders completed many grading sessions on different days, without any knowledge of the proportion of cases with clinical oedema in each session. Because the scans were randomly distributed among sessions, this proportion could vary greatly between sessions.
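The reason for pooling can be illustrated with the normal-approximation (Wald) half-width of a 95% confidence interval, which shrinks in proportion to 1 over the square root of n. The sketch below assumes a sensitivity near 67% and compares 12 oedematous eyes (the clinic group) with 88 (the pooled set). The Wald interval is crude at n = 12 and is used here only to show the scale of the effect. `wald_half_width` is a hypothetical helper.

```python
import math


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI for a proportion p from n cases."""
    if n <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n > 0 and 0 <= p <= 1")
    return z * math.sqrt(p * (1 - p) / n)


# Assumed sensitivity of about 67%, estimated from 12 versus 88 oedematous eyes.
for n_positive in (12, 88):
    print(n_positive, round(wald_half_width(0.67, n_positive), 2))
# 12 0.27
# 88 0.1
```

Roughly, the interval narrows from about ±27 to about ±10 percentage points, which is why the additional high-prevalence scans were added.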

To assess interobserver agreement, the raw SCORE values of the second observer were compared with those of the first. For this purpose, only the SCOREs from the right eyes of the 100 patients screened in the diabetic clinic were used. The difference between each pair of readings was plotted against the mean of the pair, to check that the magnitude of the differences did not vary systematically as the mean increased. The mean difference and the 95% confidence interval of the differences were calculated. The repeatability coefficient, defined as 2 × (standard deviation of the differences), and the kappa statistic were also calculated.


We studied 200 eyes (100 patients). There were 51 males and 49 females. The patients' mean age was 58.51 years (SD 13.06 years, range 27–81 years).

Twelve eyes in eight patients were clinically “positive for oedema”. Three scans in three patients could not be analysed because of poor scan quality; none of these patients had clinical macular oedema.

The sensitivity, specificity, and predictive values of the scoring system are shown in Table 1. As expected, the sensitivity and positive predictive value in the screening group were difficult to interpret because of their wide confidence intervals. Table 1 therefore also shows the results after pooling with the additional 88 scans. Figure 4 shows the 2 × 2 tables for observer 1 from which the indices were calculated. Within the subgroup of 88 scans, the prevalence of eyes clinically “positive for oedema” was 86% (76/88). The pooled scans were analysed per eye only, not per patient, because not all patients in the additional group had both eyes included. (Some of these patients, unlike those from the screening population, had one eye that failed the visual acuity criterion.)

Table 1

Specificity, sensitivity, and positive and negative predictive values for observer 1, in percentages (95% CI)

Figure 4

2 × 2 tables showing the results of screening by observer 1: 0 denotes a negative result and 1 a positive finding.

Using Bayes' theorem, the post-test probability that a patient with a positive test has macular oedema in at least one eye would be 100%.

In the interobserver agreement analysis, the difference within each pair of SCOREs from the two observers was not related to the magnitude of the mean SCORE. The correlation coefficient (r = −0.1) was close to 0, indicating that the differences were not correlated with the mean of each pair. The mean difference was −0.2 (95% CI −0.5 to +0.07). The repeatability coefficient was 3.0, and the kappa statistic was 0.45 (generally interpreted as moderate agreement).


Our data suggest that, in a hospital setting, the HRT used with SCORE may be useful in screening for asymptomatic diabetic macular oedema.

The main strengths of SCORE for screening are its high specificity and relative ease of use. In our previously published report, the volume above reference plane (VARP) was used to identify CSMO. At the suggested VARP cut off of 1.8, the per-eye sensitivity for detecting CSMO (effectively, eyes positive for oedema) was 58% (95% CI 38%–78%). The specificity, however, was 75% (66–84%), and the predictive values of a positive and a negative test were 39% (23–59%) and 87% (79–94%), indicating a high rate of false positive results. In the same study, when reflectivity images and pseudo three dimensional maps were evaluated subjectively for CSMO in 41 patients, the sensitivity was 100% and the predictive value of a negative test was 91% (71–100%); however, both the specificity and the positive predictive value were only 37% (20–56%). The corresponding figures for SCORE in this study represent a substantial improvement.

Changing the cut off so that a positive screen corresponds to a SCORE of 3 or less increases the sensitivity to 87.5% but reduces the specificity to 84.3% in the pooled eyes. Sensitivity therefore cannot be increased without a substantial loss of specificity, and our original point on the receiver operating characteristic (ROC) curve (a SCORE of 2 or less equals a positive screen) appears the most appropriate.
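The trade-off between cut-offs can be tabulated by recomputing sensitivity and specificity at each possible SCORE threshold. This gives the points of an ROC curve. The sketch below uses a small invented dataset purely to show the mechanics. `sens_spec_at_cutoff` and `roc_points` are hypothetical names.

```python
def sens_spec_at_cutoff(scores, oedema, cutoff):
    """Sensitivity and specificity when SCORE <= cutoff counts as a positive screen."""
    tp = sum(1 for s, d in zip(scores, oedema) if d and s <= cutoff)
    fn = sum(1 for s, d in zip(scores, oedema) if d and s > cutoff)
    tn = sum(1 for s, d in zip(scores, oedema) if not d and s > cutoff)
    fp = sum(1 for s, d in zip(scores, oedema) if not d and s <= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one eye with and one without oedema")
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, oedema):
    """(sensitivity, specificity) for every possible SCORE cut-off, 0 to 6."""
    return {c: sens_spec_at_cutoff(scores, oedema, c) for c in range(7)}


# Invented example: four eyes with clinical oedema, six without.
scores = [0, 1, 2, 3, 4, 2, 3, 5, 6, 4]
oedema = [True, True, True, True, False, False, False, False, False, False]
for cutoff, (sens, spec) in roc_points(scores, oedema).items():
    print(f"SCORE <= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy data, moving from a cut-off of 2 to 3 raises sensitivity from 0.75 to 1.00 but lowers specificity from 0.83 to 0.67. The direction of this trade-off matches the pattern described above.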

The mean difference between the two observers' SCOREs was not significantly different from zero at p = 0.05, as its 95% confidence interval includes zero. The measured coefficient of repeatability may reflect the ability of SCORE to detect finer differences between images.

Because the colour maps used in this study cover only a circle of 2 mm diameter, they could in theory miss cases of significant oedema. In practice, however, SCORE is a composite grade that also takes into account features of the reflectivity image outside this circle.

Good quality scan images are essential to our system of oedema detection. The quality of HRT scans may vary with operator experience, as at present there are no formal training courses for HRT operators. Experience with SCORE may also influence the grading capability of assessors. We have found it quite demanding for an observer to assess SCORE on a large number of images in one sitting. Although we have not formally tested this, one way to reduce observer error would be to introduce “check images” identical to the standard images; failure to grade these check images correctly would strongly suggest assessor fatigue. We do not recommend assessing too many images in one sitting; fewer than 60 scans per session would be reasonable.
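The session limit and the untested “check image” idea could be organised along the lines of the sketch below. The sketch is hypothetical throughout. It assumes that the 60-image limit includes check images and that two check images per session suffice. It also treats any misgraded check image as a sign of possible fatigue. None of these parameters were evaluated in the study. Random allocation of scans to sessions mirrors the way study scans were distributed.

```python
import random


def plan_sessions(scan_ids, check_ids, max_per_session=59,
                  checks_per_session=2, seed=None):
    """Randomly split scans into grading sessions of at most max_per_session
    images, each including check images whose correct grades are known."""
    check_ids = list(check_ids)
    study_per_session = max_per_session - checks_per_session
    if (study_per_session < 1 or checks_per_session < 0
            or checks_per_session > len(check_ids)):
        raise ValueError("session too small or not enough check images")
    rng = random.Random(seed)
    scans = list(scan_ids)
    rng.shuffle(scans)  # random allocation, so oedema prevalence varies by session
    sessions = []
    for start in range(0, len(scans), study_per_session):
        batch = scans[start:start + study_per_session]
        batch += rng.sample(check_ids, checks_per_session)
        rng.shuffle(batch)  # hide the check images among the study scans
        sessions.append(batch)
    return sessions


def fatigue_suspected(check_results):
    """check_results: (assigned_grade, standard_grade) pairs from one session."""
    return any(assigned != standard for assigned, standard in check_results)


sessions = plan_sessions(range(150), ["check_a", "check_b", "check_c"], seed=1)
print([len(s) for s in sessions])
print(fatigue_suspected([(0, 0), (3, 3)]), fatigue_suspected([(0, 1)]))
```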

This study involved patients from a hospital diabetic clinic, so our results may not apply to diabetic patients in the general population, in whom a lower prevalence of true positives might be found. For example, diabetic patients under the care of general practitioners alone may have less severe systemic disease and a lower prevalence of retinopathy and maculopathy. In addition, hospital patients may be better at keeping still and maintaining fixation, and therefore more likely to provide a well centred scan.

Many clinicians use the detection of CSMO as the indication for laser treatment. A strength of this study was the use of a clinically accepted gold standard for identifying treatable disease, namely stereo biomicroscopy of the macula. Current screening for maculopathy by non-ophthalmologists relies largely on the presence of macular exudates and haemorrhages rather than on macular oedema, yet treatment might not be necessary in the presence of macular exudates. Moreover, about 20% of maculopathy presents as retinal thickening without exudates. Identifying oedema itself, rather than the signs often associated with it, could therefore be advantageous when screening for treatable diabetic maculopathy.

In addition, treatment of early disease (that is, in patients with good vision) is more effective, as treated patients tend to retain good vision years later. It is therefore important to detect macular oedema early so that it can be assessed and treated.

Currently, stereoscopic methods of examination such as slit lamp biomicroscopy or stereo pair photographs are not widely used in screening systems. Previously reported findings on screening systems include the following:

  • The precise method of ophthalmoscopy was not stated in an optometric study. Some optometrists may have used direct ophthalmoscopy, whereas others may have used stereoscopic slit lamp systems or a combination of these, depending on available facilities or their own competence. The sensitivity for detecting “moderate to severe maculopathy” was 77%.
  • Many reports on ophthalmoscopy did not compare their maculopathy findings with a gold standard examination, that is, contact lens examination with slit lamp biomicroscopy.
  • The results of some studies on ophthalmoscopy would not apply to screening in a larger population because of the extensive clinical experience of the screener (a registrar in ophthalmology, or a doctor with a special interest in diabetes and previous training in eye clinics).
  • Some studies reported no specific figures for the ability to detect maculopathy.
  • In one study, the sensitivity of photography with mydriasis for detecting sight threatening maculopathy was 61%, and the specificity was 99%. A significant proportion of cases were ungradable in some photographic studies; in the Liverpool study, ungradable cases resulted from media opacity in 9% and from other reasons in 2%.

The cost effectiveness of our technique also needs to be evaluated before introducing the HRT into a screening programme is considered. Because the SCORE system is currently relatively labour intensive, it would be helpful if future HRT software packages incorporated SCORE in a semiautomated form, enabling the HRT operator to refer eyes failing the test directly to an ophthalmologist.

The high specificity supports a possible role for the HRT in screening a primary care population, where the prevalence of CSMO would be expected to be lower than in our study. Is the level of sensitivity acceptable for screening? Two features of this study may be relevant here. Firstly, the eight “positive” patients in the group from the diabetic clinic had been screened and found to be negative by the diabetic physicians (by direct ophthalmoscopy). Secondly, our definition of a true positive relied on contact lens examination of the macula, which would be expected to be more sensitive for detecting early retinal thickening than the methods used in other screening studies (this may also partly explain the first observation). We therefore do not consider the sensitivity found in our study too low to justify continuing our investigation of the efficacy of SCORE. Indeed, some of the cases missed by SCORE might not progress significantly between screening intervals and could be detected at the next screen. Only a randomised controlled trial of different screening modalities over time, against an acceptable gold standard, will show whether this is the case.


Proprietary interests: none.

Financial support: none.


