Telemedical Diagnosis of Retinopathy of Prematurity: Accuracy of Expert vs. Non-expert Graders
  1. Steven L Williams (1),
  2. Lu Wang (1),
  3. Steven A Kane (1),
  4. Thomas C Lee (2),
  5. David J Weissgold (3),
  6. Audina M Berrocal (4),
  7. Daniel Rabinowitz (5),
  8. Justin Starren (6),
  9. John T Flynn (1),
  10. Michael F Chiang (1)*

  (1) Columbia University College of Physicians and Surgeons, United States
  (2) Childrens Hospital Los Angeles, United States
  (3) Retina Center of Vermont, United States
  (4) Bascom Palmer Eye Institute, United States
  (5) Columbia University, United States
  (6) Marshfield Clinic, United States
  * Correspondence to: Michael F. Chiang, Columbia University College of Physicians and Surgeons, 635 West 165th Street, Box 92, Columbia University Medical Center, New York, 10032, United States; chiang{at}


Background/Aims: To assess the accuracy of telemedical retinopathy of prematurity (ROP) diagnosis by trained non-expert graders compared with that of expert graders.

Methods: 248 eye examinations from 67 consecutive infants were captured using wide-angle retinal photography (RetCam-II, Clarity Medical Systems, Pleasanton, CA). Non-expert graders attended two hour-long training sessions on image-based ROP diagnosis. Using a web-based telemedicine system, 14 non-expert and 3 expert graders provided a diagnosis for each eye: no ROP, mild ROP, type-2 prethreshold ROP, or treatment-requiring ROP. All diagnoses were compared to a reference standard of dilated indirect ophthalmoscopy by an experienced pediatric ophthalmologist.

Results: For detection of type-2 or worse ROP, the mean (range) sensitivities and specificities were 0.95 (0.94-0.97) and 0.93 (0.91-0.96) for experts, 0.87 (0.71-0.97) and 0.73 (0.39-0.95) for resident non-experts, and 0.73 (0.41-0.88) and 0.91 (0.84-0.96) for student non-experts. For detection of treatment-requiring ROP, the mean (range) sensitivities and specificities were 1.00 (1.00-1.00) and 0.93 (0.88-0.96) for experts, 0.88 (0.50-1.00) and 0.84 (0.71-0.98) for resident non-experts, and 0.82 (0.42-1.00) and 0.92 (0.83-0.97) for student non-experts.
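The sensitivity and specificity figures above depend on collapsing the four diagnostic categories into a binary call at a chosen cutoff ("type-2 or worse" or "treatment-requiring") and comparing each grader's call with the reference standard. A minimal sketch of that calculation follows. The function name, the severity ordering, and the six example eyes are illustrative assumptions and are not the study's data or software.

```python
from typing import Sequence, Tuple

# Diagnostic categories named in the Methods, ordered from least to most severe.
# The numeric ranks are an assumption made for this illustration.
GRADES = [
    "no ROP",
    "mild ROP",
    "type-2 prethreshold ROP",
    "treatment-requiring ROP",
]
SEVERITY = {grade: rank for rank, grade in enumerate(GRADES)}


def sensitivity_specificity(
    grader: Sequence[str], reference: Sequence[str], cutoff: str
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for detecting `cutoff` or worse.

    `grader` holds one grader's image-based diagnoses and `reference` holds
    the reference-standard diagnoses for the same eyes, in the same order.
    """
    if len(grader) != len(reference):
        raise ValueError("grader and reference must have the same length")
    threshold = SEVERITY[cutoff]
    tp = fn = tn = fp = 0
    for graded, truth in zip(grader, reference):
        predicted = SEVERITY[graded] >= threshold  # grader calls disease present
        actual = SEVERITY[truth] >= threshold      # reference says disease present
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: six eyes, invented for illustration only.
reference = ["no ROP", "mild ROP", "type-2 prethreshold ROP",
             "treatment-requiring ROP", "type-2 prethreshold ROP", "no ROP"]
grader = ["no ROP", "type-2 prethreshold ROP", "type-2 prethreshold ROP",
          "treatment-requiring ROP", "mild ROP", "no ROP"]

type2_result = sensitivity_specificity(grader, reference, "type-2 prethreshold ROP")
treatment_result = sensitivity_specificity(grader, reference, "treatment-requiring ROP")
```

In this invented example the grader overcalls one mild eye and undercalls one type-2 eye, so both sensitivity and specificity at the type-2 cutoff are 2/3, while at the treatment-requiring cutoff both are 1.0. The means and ranges reported above would come from repeating such a calculation per grader within each group.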

Conclusions: Mean sensitivity and specificity of trained non-experts are lower than those of experts, although several non-experts had high accuracy. Development of methods for training non-expert graders may help support telemedical ROP evaluation.
