Table 1

Results of the artificial intelligence (AI) models and the second human grader evaluated on the test set for (A) colour fundus (CF) and (B) fluorescein angiography (FA)

                        Accuracy  Precision  Recall  F1-score  AUC-ROC  AUC-PRC
(A) CF
Contrast*
  Manual                 0.777     0.553     0.938    0.696     0.828    0.791
  AI model               0.852     0.653     0.977    0.783     0.956    0.903
Focus*
  Manual                 0.816     0.660     0.982    0.789     0.852    0.849
  AI model               0.905     0.794     0.986    0.880     0.974    0.959
Illumination
  Manual                 0.847     0.954     0.367    0.530     0.684    0.748
  AI model               0.656     0.400     0.921    0.558     0.865    0.737
Shadow and reflection
  Manual                 0.705     0.490     0.940    0.644     0.768    0.732
  AI model               0.731     0.517     0.806    0.630     0.854    0.751
Overall quality
  Manual                 0.904     0.849     0.958    0.900     0.912    0.928
  AI model               0.919     0.937     0.879    0.907     0.963    0.966
Average*
  Manual                 0.771     0.590     0.946    0.717     0.819    0.796
  AI model               0.813     0.660     0.914    0.751     0.922    0.863
(B) FA
Contrast*
  Manual                 0.651     0.429     0.973    0.596     0.745    0.709
  AI model               0.775     0.549     0.840    0.664     0.882    0.717
Focus*
  Manual                 0.672     0.531     0.917    0.673     0.718    0.748
  AI model               0.818     0.747     0.762    0.755     0.880    0.802
Noise
  Manual                 0.773     0.430     0.846    0.570     0.803    0.670
  AI model               0.700     0.354     0.839    0.498     0.873    0.722
Overall quality*
  Manual                 0.780     0.664     1.000    0.798     0.795    0.839
  AI model               0.830     0.755     0.903    0.822     0.918    0.889
Average*
  Manual                 0.719     0.513     0.934    0.659     0.765    0.742
  AI model               0.781     0.601     0.836    0.685     0.888    0.782
  • Accuracy, precision, recall, F1-score, AUC-ROC and AUC-PRC were calculated for each category; in addition, the average over all categories is provided. Statistically significant differences between the AI model and the human grader results are indicated with an asterisk.

  • AUC-PRC, area under the precision–recall curve; AUC-ROC, area under the receiver operating characteristic curve.
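For reference, the threshold-based metrics in the table (accuracy, precision, recall, F1-score) follow from the confusion counts, and AUC-ROC equals the probability that a randomly chosen positive image is scored above a randomly chosen negative one. A minimal pure-Python sketch; the `y_true`/`scores` values below are illustrative, not the study's data:

```python
def confusion_counts(y_true, y_pred):
    """True/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0

def auc_roc(y_true, scores):
    """Rank-based AUC-ROC: fraction of positive/negative pairs ranked
    correctly by the score (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative example (hypothetical labels and model scores):
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]       # model confidence for class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5
print(round(accuracy(y_true, y_pred), 3))  # 0.833
print(round(f1_score(y_true, y_pred), 3))  # 0.8
print(round(auc_roc(y_true, scores), 3))   # 0.889
```

In practice these quantities are usually obtained from scikit-learn (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`, and `average_precision_score` for the AUC-PRC column); the hand-rolled versions above simply make the definitions explicit.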