Anterior segment biometric measurements explain misclassifications by a deep learning classifier for detecting gonioscopic angle closure
Alice Shen¹, Michael Chiang¹, Anmol A Pardeshi¹, Roberta McKean-Cowdin², Rohit Varma³, Benjamin Y Xu¹

¹Department of Ophthalmology, USC Keck School of Medicine, Los Angeles, California, USA
²Department of Preventive Medicine, USC Keck School of Medicine, Los Angeles, California, USA
³Southern California Eye Institute, CHA Hollywood Presbyterian Medical Center, Los Angeles, California, USA

Correspondence to Dr Benjamin Y Xu, Department of Ophthalmology, USC Keck School of Medicine, Los Angeles, CA 90033, USA; Benjamin.Xu{at}med.usc.edu

Abstract

Background/aims To identify biometric parameters that explain misclassifications by a deep learning classifier for detecting gonioscopic angle closure in anterior segment optical coherence tomography (AS-OCT) images.

Methods Chinese American Eye Study (CHES) participants underwent gonioscopy and AS-OCT of each angle quadrant. A subset of CHES AS-OCT images was analysed using a deep learning classifier to detect gonioscopic angle closure, as graded on manual gonioscopy by a reference human examiner. Parameter measurements were compared between four prediction classes: true positives (TPs), true negatives (TNs), false positives (FPs) and false negatives (FNs). Logistic regression models were developed to differentiate between true and false predictions. Performance was assessed using area under the receiver operating characteristic curve (AUC) and classifier accuracy metrics.

Results 584 images from 115 participants were analysed, yielding 271 TPs, 224 TNs, 77 FPs and 12 FNs. Parameter measurements differed (p<0.001) between prediction classes among anterior segment parameters, including iris curvature (IC) and lens vault (LV), and angle parameters, including angle opening distance (AOD). FPs resembled TPs more than FNs and TNs in terms of anterior segment parameters (steeper IC and higher LV), but resembled TNs more than TPs and FNs in terms of angle parameters (wider AOD). Models for detecting FPs (AUC=0.752) and FNs (AUC=0.838) improved classifier accuracy from 84.8% to 89.0%.

Conclusions Misclassifications by an OCT-based deep learning classifier for detecting gonioscopic angle closure are explained by disagreement between anterior segment and angle parameters. This finding could be used to improve classifier performance and highlights differences between gonioscopic and AS-OCT definitions of angle closure.

  • angle
  • diagnostic tests/investigation
  • glaucoma
  • imaging

Data availability statement

Data are available upon reasonable request. The data that support the findings of this study are available from the corresponding author (BYX), upon reasonable request.


Introduction

Primary angle closure glaucoma (PACG) is the most severe form of primary angle closure disease (PACD), a spectrum of disease that is characterised by apposition between the iris and trabecular meshwork (TM).1 Although PACG comprises only a quarter of cases of primary glaucoma worldwide, it is responsible for approximately half of all cases of glaucoma-related blindness.2 The risk of progression from early PACD to PACG and vision loss can be decreased by treatments, such as laser peripheral iridotomy (LPI) and lens extraction; however, PACD must first be detected.3–5 The current clinical standard for diagnosing PACD is identifying angle closure, defined as inability to visualise the pigmented TM, on gonioscopy. However, manual gonioscopy is subjective and dependent on examiner expertise, leading to only fair intraobserver and interobserver reproducibility even among experienced examiners.6 7

In recent years, anterior segment optical coherence tomography (AS-OCT) has played an increasing role in clinical evaluations of the anterior chamber angle. AS-OCT is a non-contact, in vivo imaging method that acquires cross-sectional images of anterior segment anatomical structures by measuring their optical reflections.8 A previously described deep learning classifier automates analysis of AS-OCT images to detect eyes with gonioscopic angle closure and PACD.9 This method of ‘automated gonioscopy’ could serve as a clinical screening tool to detect patients at risk for PACG. However, factors that reduce the accuracy of the classifier, and potentially that of other deep learning algorithms for detecting angle closure, have not been studied.10

In order to optimise the accuracy and clinical utility of this deep learning classifier, there is a need to understand why the angles in some AS-OCT images are misclassified as open or closed relative to reference diagnoses by a human examiner performing manual gonioscopy. Quantitative analysis of AS-OCT images can be performed using highly reproducible measurements of biometric parameters that represent the dimensions and configurations of anterior segment anatomical structures.11 12 In this study, we perform quantitative analysis of AS-OCT measurements to identify biometric differences between correctly and incorrectly classified images. We also develop statistical models using biometric measurements to assess if misclassifications can be reclassified to improve the accuracy of the original deep learning classifier for detecting gonioscopic angle closure.9

Methods

Participants were recruited from the Chinese American Eye Study (CHES), a population-based, cross-sectional study that included 4572 Chinese participants aged 50 years and older residing in the city of Monterey Park, California.13 Inclusion criteria for the study included CHES participants whose gonioscopy grades and AS-OCT images were previously used in the test dataset to validate a deep learning classifier to detect gonioscopic angle closure based on AS-OCT images.14 Exclusion criteria included images that were missing AS-OCT measurements for one or more parameters.

Clinical assessment

Each CHES participant received a complete eye examination by a trained ophthalmologist including gonioscopy and AS-OCT imaging.13 Gonioscopy was performed in the seated position under dark ambient lighting (0.1 cd/m2) with a 1 mm light beam and a Posner-type 4-mirror lens (Model ODPSG; Ocular Instruments, Inc, Bellevue, Washington, USA) by one of two trained ophthalmologists (DW, CLG) masked to other examination findings. One ophthalmologist (DW) performed the majority (over 90%) of gonioscopic examinations. Care was taken to avoid light falling on the pupil and inadvertent indentation of the globe. The gonioscopy lens could be tilted up to 10 degrees. The angle was graded in each quadrant according to the modified Shaffer classification system: grade 0, no structures visible; grade 1, non-pigmented TM visible; grade 2, pigmented TM visible; grade 3, scleral spur visible; grade 4, ciliary body visible. Average gonioscopy score was calculated by averaging the numerical values of the Shaffer grades. Gonioscopy grades were dichotomised into two categories: gonioscopic angle closure (grade 0 or 1), in which pigmented TM could not be visualised, and gonioscopic open angle (grades 2–4).
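As a minimal sketch, the dichotomisation and averaging rules above can be written in Python. The function names and the hypothetical quadrant grades in the example are illustrative assumptions, not the study's analysis code; only the rule itself (grades 0 and 1 as angle closure, grades 2 to 4 as open angle) and the averaging step come from the text.

```python
# Modified Shaffer grades as described above:
# 0 = no structures, 1 = non-pigmented TM, 2 = pigmented TM,
# 3 = scleral spur, 4 = ciliary body visible.
VALID_GRADES = {0, 1, 2, 3, 4}


def is_gonioscopic_closure(grade: int) -> bool:
    """True when the pigmented TM cannot be visualised (grade 0 or 1)."""
    if grade not in VALID_GRADES:
        raise ValueError(f"invalid Shaffer grade: {grade}")
    return grade <= 1


def average_gonioscopy_score(grades: list[int]) -> float:
    """Average the numerical Shaffer grades (e.g. across the quadrants of one eye)."""
    if not grades:
        raise ValueError("at least one grade is required")
    for g in grades:
        if g not in VALID_GRADES:
            raise ValueError(f"invalid Shaffer grade: {g}")
    return sum(grades) / len(grades)


# Hypothetical grades for the four quadrants of one eye (not study data).
quadrant_grades = [1, 2, 0, 3]
closed = [is_gonioscopic_closure(g) for g in quadrant_grades]
print(closed)                                     # [True, False, True, False]
print(average_gonioscopy_score(quadrant_grades))  # 1.5
```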

AS-OCT imaging and data analysis

AS-OCT imaging in CHES was performed in the seated position under dark ambient lighting (0.1 cd/m2) after gonioscopy and prior to pupillary dilation by a single trained ophthalmologist (DW) with the Tomey CASIA SS-1000 swept-source Fourier-domain device (Tomey Corporation, Nagoya, Japan). A total of 128 two-dimensional cross-sectional AS-OCT images were acquired per eye. During imaging, the eyelids were gently retracted taking care to avoid inadvertent pressure on the globe.

AS-OCT data were imported into the Tomey SS OCT Viewer software (V.3.0, Tomey Corporation, Nagoya, Japan) which automatically segmented anatomical structures and produced biometric measurements once the scleral spurs were marked. One observer (AAP) masked to the identities and examination results of the participants confirmed the segmentation and marked the scleral spurs in each image.15 Two images oriented along the horizontal (temporal-nasal) and vertical meridians were analysed per eye to measure biometric parameters describing the anterior chamber angle and anterior segment in each quadrant.

The following biometric parameters describing the anterior chamber angle were analysed in each image: angle opening distance (AOD500/750), trabecular iris space area (TISA500/750) and scleral spur angle (SSA500/750) measured at 500 μm and 750 μm from the scleral spur.16 The following biometric parameters describing the anterior segment and its anatomical structures were also analysed in each image: iris area (IA), iris curvature (IC), anterior chamber depth (ACD), lens vault (LV), anterior chamber width (ACW), pupillary diameter (PD) and anterior chamber area (ACA). IA was defined as the cross-sectional area of the full length of the iris. ACD was defined as the distance from the apex of the anterior lens surface to the apex of the corneal endothelium. IC was defined as the distance from the apex of the iris convexity to a line extending from the peripheral to central iris pigment epithelium. ACW was defined as the distance between scleral spurs. ACA was defined as the space bounded by the corneal endothelium, anterior iris surface and anterior lens capsule. Iris thickness measured 750 μm and 2000 μm (IT750/2000) from the scleral spur were not included as biometric parameters due to large numbers of missing values.
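The Tomey software computed these measurements automatically. The Python sketch below only illustrates the geometry behind three of the stated definitions: ACW as the distance between scleral spurs, ACD as the distance between two apices, and IC as the perpendicular distance from the apex of the iris convexity to a line through two iris pigment epithelium points. The landmark coordinates, the millimetre units and the function names are hypothetical, and the software's own landmark detection is not reproduced.

```python
import math

Point = tuple[float, float]  # hypothetical (x, y) image coordinates in mm


def distance(a: Point, b: Point) -> float:
    """Straight-line distance between two landmarks."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def point_to_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    length = distance(a, b)
    if length == 0:
        raise ValueError("a and b must be distinct points")
    # |cross product of (b - a) and (p - a)| divided by |b - a|.
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / length


# Hypothetical landmarks; not study data.
acw = distance((0.0, 0.0), (11.8, 0.0))  # scleral spur to scleral spur
acd = distance((5.9, 3.0), (5.9, 0.5))   # corneal endothelium apex to anterior lens apex
ic = point_to_line_distance((2.5, 0.8), (1.0, 0.5), (4.0, 0.5))  # iris apex to pigment epithelium line
print(acw, acd, round(ic, 3))  # 11.8 2.5 0.3
```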

Intraobserver repeatability of averaged parameter measurements was calculated in the form of intraclass correlation coefficients (ICCs) based on images from 20 open angle and 20 angle closure eyes graded 3 months apart. ICC values reflected excellent measurement repeatability for all parameters, ranging from 0.89 (TISA750) to 0.98 (ACA). This analysis was performed using MATLAB (Mathworks, Natick, Massachusetts, USA).
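The text does not state which ICC model was used. Assuming a two-way random-effects, absolute-agreement, single-measurement ICC (ICC(2,1) in the Shrout and Fleiss notation), the calculation for a table of subjects by grading sessions can be sketched in Python as follows. The study itself used MATLAB, and the toy data are hypothetical.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings[i][j] is the measurement of subject i in grading session j.
    The choice of ICC(2,1) is an assumption; the source does not specify it.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 subjects and >= 2 sessions")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


same = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]    # identical sessions
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]  # second session reads 1 unit higher
print(icc_2_1(same), round(icc_2_1(offset), 3))  # 1.0 0.667
```

Because ICC(2,1) measures absolute agreement, a systematic offset between sessions lowers it even when subjects keep the same ordering.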

Prediction classes

Each AS-OCT image was categorised into one of four prediction classes based on the relationship between angle status assigned by an ophthalmologist performing manual gonioscopy and angle status predicted by the previously described deep learning classifier. True positive (TP) was defined as an image with predicted angle closure in an angle quadrant with angle closure (grades 0 and 1) based on manual gonioscopy. False positive (FP), or an overcall, was defined as an image with predicted angle closure in an angle quadrant with open angle (grades 2–4) based on manual gonioscopy. True negative (TN) was defined as an image with predicted open angle in an angle quadrant with open angle based on manual gonioscopy. False negative (FN), or an undercall, was defined as an image with predicted open angle in an angle quadrant with angle closure based on manual gonioscopy.
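The four prediction classes reduce to a lookup on two booleans, with gonioscopic angle closure treated as the positive class. A minimal Python sketch, with a hypothetical function name and toy inputs:

```python
from collections import Counter


def prediction_class(gonioscopy_grade: int, predicted_closed: bool) -> str:
    """Return 'TP', 'FP', 'TN' or 'FN', with gonioscopic angle closure as positive."""
    if gonioscopy_grade not in range(5):
        raise ValueError(f"invalid Shaffer grade: {gonioscopy_grade}")
    gonio_closed = gonioscopy_grade <= 1  # pigmented TM not visible (grade 0 or 1)
    if predicted_closed:
        return "TP" if gonio_closed else "FP"  # FP = overcall
    return "FN" if gonio_closed else "TN"      # FN = undercall


# Hypothetical (Shaffer grade, classifier predicts closure) pairs.
images = [(0, True), (3, True), (2, False), (1, False), (1, True)]
counts = Counter(prediction_class(g, p) for g, p in images)
print(dict(counts))  # {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 1}
```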

Statistical analysis

Normality testing was performed on biometric measurements using the Kolmogorov-Smirnov test. Median biometric measurements and their IQRs were calculated for all parameters due to non-normal distributions. Median biometric measurements were compared between prediction classes using the Kruskal-Wallis test while adjusting for age and sex. Pairwise comparisons of median biometric measurements between prediction classes were performed using the post-hoc Dunn’s test, adjusted for multiple comparisons.
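For illustration only, the unadjusted Kruskal-Wallis H statistic with the usual tie correction can be computed directly from ranks. This Python sketch assumes exactly four groups (the four prediction classes), so the p-value uses the closed-form chi-square survival function with 3 degrees of freedom. It does not reproduce the age and sex adjustment or Dunn's post-hoc test described above; in practice, `scipy.stats.kruskal` returns the same unadjusted statistic for any number of groups.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_4_groups(groups: list[list[float]]) -> tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square (3 df) p-value."""
    if len(groups) != 4 or any(len(g) == 0 for g in groups):
        raise ValueError("expects four non-empty groups (e.g. TP, FN, FP, TN)")
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 * weighted / (n_total * (n_total + 1)) - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n_total ** 3 - n_total)
    h = max(h, 0.0)  # guard against tiny negative values from round-off

    # Survival function of the chi-square distribution with 3 degrees of freedom.
    p = math.erfc(math.sqrt(h / 2)) + math.sqrt(2 * h / math.pi) * math.exp(-h / 2)
    return h, p


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # hypothetical, fully separated
h, p = kruskal_wallis_4_groups(groups)
print(round(h, 3))  # 10.385
```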

A multivariable logistic regression model classifying between TP and FP was developed using the best subsets variable selection method. The final model was selected using a combination of adjusted R², Mallows’ Cp and Bayesian information criterion (BIC) statistics. A univariable logistic regression model classifying between TN and FN was developed by selecting the biometric parameter with the highest adjusted R² statistic. The number of independent variables in this model was limited by the small number of FNs in the dataset (n=12). Area under the receiver operating characteristic curve (AUC) was used to assess model performance. The models were used to reclassify all positive and negative predictions as TP or FP and TN or FN, respectively, using a default probability threshold of 0.5. All analyses were performed using the R programming interface (V.4.0.2). Analyses were conducted using a significance level of 0.05.
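Model discrimination was summarised by AUC. As a sketch, AUC can be computed without tracing a curve by using its equivalence with the Mann-Whitney statistic: the proportion of (misclassified, correctly classified) image pairs in which the misclassified image receives the higher model probability, counting ties as one half. The scores and labels below are hypothetical, and fitting the logistic regression models themselves (done in R in this study) is not shown.

```python
def auc_mann_whitney(scores: list[float], labels: list[int]) -> float:
    """AUC as P(score of a label-1 case > score of a label-0 case), ties counted as 0.5."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each label")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model probabilities; label 1 marks a misclassified image (e.g. FP),
# label 0 a correctly classified one (e.g. TP).
print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic, which is acceptable for a few hundred images; a rank-based formula gives the same value more efficiently for larger datasets.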

Results

The independent test dataset used to evaluate the original deep learning classifier comprised 640 AS-OCT images from 127 CHES participants. Fifty-six (8.8%) images from 12 participants were excluded due to missing AS-OCT measurements for one or more biometric parameters, with 38 (5.9%) images missing measurements of angle width parameters due to poorly defined scleral spurs. A total of 584 images from 115 participants with corresponding gonioscopy grades and AS-OCT measurements were included in this study. A total of 283 images corresponded with gonioscopic angle closure and 301 corresponded with gonioscopic open angle. The mean gonioscopy grade was 1.5±1.1 on the modified Shaffer grading scale. The mean age of all participants was 62.1±9.0 years (range: 50–86 years). Seventy-five (65.2%) participants were female, and 40 (34.8%) participants were male. The mean intraocular pressure (IOP) was 15.75±3.4 mm Hg, mean refractive error was −0.3±3.4 D spherical equivalent and mean axial length was 23.5±1.4 mm.

Among 584 total images, 271 (46.4%) images with gonioscopic angle closure and 224 (38.4%) images with gonioscopic open angle were correctly predicted by the deep learning classifier. Seventy-seven (13.2%) images with gonioscopic open angle were overcalled as closed (FP) by the deep learning classifier (figure 1A,B). Twelve (2.1%) images with gonioscopic angle closure were undercalled as open (FN) by the deep learning classifier (figure 1C,D). The deep learning classifier had an overall accuracy of 84.8%, with a sensitivity of 95.8% and specificity of 74.4%.
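The reported accuracy, sensitivity and specificity follow directly from the four prediction class counts, treating gonioscopic angle closure as the positive class. A short Python check using the counts above (the function name is illustrative):

```python
def classifier_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # closed angles correctly called closed
        "specificity": tn / (tn + fp),  # open angles correctly called open
    }


m = classifier_metrics(tp=271, fp=77, fn=12, tn=224)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
# accuracy: 84.8%
# sensitivity: 95.8%
# specificity: 74.4%
```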

Figure 1

Representative misclassified anterior segment optical coherence tomography images. (A, B) Representative false positive (overcall) images incorrectly classified as gonioscopic angle closure. (C, D) Representative false negative (undercall) images incorrectly classified as gonioscopic open angle.

There were significant differences (p<0.001) in median parameter measurements between prediction classes among all anterior chamber angle parameters (AOD, TISA and SSA). For these parameters, the order of median parameter measurements from smallest to largest was consistent: TP, FN, FP and TN (table 1). Significance of pairwise comparisons varied, but there was a consistent significant difference (p<0.008) between TP and FN (table 1).

Table 1

Analysis of variance test on ranks comparing median biometric measurements of anterior chamber angle and anterior segment parameters

There were significant differences (p<0.036) in median parameter measurements between prediction classes among all anterior segment parameters except PD (p=0.88). For some anterior segment parameters (IA, ACD, ACW, ACA), the order of median parameter measurements from smallest to largest was consistent, but differed from the angle parameters: TP, FP, FN and TN. This order was the same as for other anterior segment parameters (IC, LV) when median parameter measurements were ordered from largest to smallest. Significance of pairwise comparisons varied, but there was a consistent significant difference (p<0.008) between TP and FN (table 1).

A univariable logistic regression model with AOD500 produced the best performance in discriminating between TN and FN (AUC=0.837, figure 2A). There was a significant association (p<0.001) between these prediction classes and AOD500.

Figure 2

Receiver operating characteristic curves of (A) a univariable logistic regression model based on AOD500 for differentiating between true and false negatives (AUC=0.837) and (B) a multivariable logistic regression model based on AOD500, anterior chamber depth, lens vault and iris area for differentiating between true and false positives (AUC=0.752). AOD, angle opening distance; AUC, area under the receiver operating characteristic curve; FN, false negative; FP, false positive.

A multivariable logistic regression model with AOD500, ACD, LV and IA produced the best performance in discriminating between TP and FP (AUC=0.752, figure 2B). The association with prediction classes was significant (p<0.03) for AOD500, ACD, and IA and borderline significant (p=0.07) for LV.

Positive and negative predictions were reclassified as TPs or FPs and TNs or FNs, respectively, using the logistic regression models and a probability threshold of 0.5. Among the 348 images predicted as closed by the deep learning classifier, 264 were correctly identified as TPs, 25 were correctly identified as FPs, 52 were incorrectly identified as TPs (remaining as FP) and 7 were incorrectly identified as FPs (becoming FN) by the logistic regression model. Among the 236 images predicted as open by the deep learning classifier, all were identified as TNs by the logistic regression model, which preserved the 224 TNs and 12 FNs by the deep learning classifier. Factoring in the revised predictions, the classifier and logistic regression models produced 296 TPs, 45 FPs, 19 FNs and 224 TNs, which corresponded with an overall accuracy of 89.0%, sensitivity of 94.0% and specificity of 83.3% (figure 3).
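The reclassification step can be sketched as a rule that flips a deep learning prediction when the relevant secondary model considers it likely to be false. In this Python sketch, `p_false` is a hypothetical probability output (the fitted model coefficients are not reported here), and treating a probability exactly equal to 0.5 as a flip is an assumption about how the 0.5 threshold was applied.

```python
def reclassify(predicted_closed: bool, p_false: float, threshold: float = 0.5) -> bool:
    """Return the final angle closure prediction after secondary review.

    p_false is a hypothetical probability, from the TP-vs-FP model for positive
    predictions or the TN-vs-FN model for negative predictions, that the deep
    learning prediction is false. Flipping at p_false >= threshold is an assumption.
    """
    if not 0.0 <= p_false <= 1.0:
        raise ValueError("p_false must lie in [0, 1]")
    return (not predicted_closed) if p_false >= threshold else predicted_closed


# Toy cases: (gonioscopic closure, classifier predicts closure, p_false).
cases = [
    (True, True, 0.2),    # TP, kept as TP
    (False, True, 0.7),   # FP, flipped to open, becomes TN
    (False, True, 0.4),   # FP, kept, remains FP
    (True, True, 0.6),    # TP, flipped to open, becomes FN
    (False, False, 0.1),  # TN, kept as TN
]
final = [reclassify(pred, p) for _, pred, p in cases]
print(final)  # [True, False, True, False, False]
```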

Figure 3

Confusion matrices of (A) original deep learning classifier predictions and (B) deep learning classifier predictions adjusted using logistic regression models.

Discussion

Deep learning algorithms provide powerful tools for analysing ocular images and detecting a wide range of ophthalmic diseases, including diabetic retinopathy, age-related macular degeneration and glaucomatous optic neuropathy.17–19 However, these algorithms also often function as black boxes, providing little information about why images are correctly or incorrectly classified. This study takes advantage of established methods for quantitative analysis of AS-OCT images to obtain insights into how specific biometric parameters contribute to diagnostic disagreement between manual gonioscopy and a deep learning classifier for detecting gonioscopic angle closure in AS-OCT images. We identified consistent differences and patterns when comparing biometric measurements across positive and negative prediction classes. We then used these differences and patterns to develop statistical models that improved the accuracy of the deep learning classifier. We believe this work advances our understanding of disagreements between gonioscopic and AS-OCT assessments of the anterior chamber angle and enhances the utility of deep learning classifiers for detecting eyes at risk for PACG.

Angle closure is detected on manual gonioscopy when an examiner cannot visualise the pigmented TM. This is more likely to occur in eyes with steeper IC and higher LV, which have been identified as primary causes of diagnostic disagreement in which angle closure is detected on gonioscopy but not on AS-OCT.20 Our quantitative analyses demonstrated that AS-OCT measurements from FP (overcall) images were more similar to TP images than FN and TN images in terms of anterior segment parameters, including steeper IC, higher LV, shallower ACD and narrower ACW. However, FP images were more similar to TN than TP and FN images among angle parameters, including wider AOD750. These findings support the relative importance of iris convexity and lens position over angle width for predicting gonioscopic angle closure by the classifier. In addition, these overcalls are consistent with anatomical configurations typically seen in eyes that require examiners to perform dynamic gonioscopy techniques, such as tilting of the goniolens, to assess the angle over the convexity of the lens and iris. These configurations represent a small minority of open angles in the original training dataset used to develop the classifier, which could explain why the classifier does not appear to simulate lens tilting.

Our quantitative analyses demonstrated that FN (undercall) images tended to more closely resemble TN images than TP and FP images in terms of anterior segment parameters, including flatter IC, deeper ACD, wider ACW and lower LV. However, FNs were more similar to TP than TN and FP images in terms of angle parameters, including narrower AOD750. These incorrect predictions of gonioscopic open angle are consistent with anatomical variants of angle closure, such as plateau iris, that have deeper anterior chambers and flatter iris configurations more commonly found in eyes with open angles. These misclassifications also suggest that the classifier does not function by detecting the pigmented TM, which is not discernible in most AS-OCT images. Finally, decreased angle width and iridotrabecular contact (ITC) detected on AS-OCT are more commonly missed on gonioscopy in eyes with deeper ACD, which again supports the relative importance of anterior segment over angle parameters in detecting angle closure on gonioscopy.20

Our findings highlight discrepancies between gonioscopy and AS-OCT in the definitions of angle closure. While angle closure is broadly defined as apposition between the iris and TM, angle closure on gonioscopy is specifically defined as inability to visualise the pigmented TM whereas angle closure on AS-OCT is specifically defined as ITC anterior to the scleral spur. Our classifier, developed to simulate a human examiner performing gonioscopy using AS-OCT images, appears to make predictions by prioritising anterior chamber dimensions and iris configurations that impair visualisation of the pigmented TM over direct measurements of angle width. This observation is consistent with previous findings that gonioscopy grades are only weakly associated with AS-OCT measurements of angle width in eyes with gonioscopic angle closure.21 22 While outside the scope of the current study, it would be beneficial to investigate if misclassifications are more common when angle and iris configurations are consistent with specific subtypes of ITC.23

We developed logistic regression models to identify anatomical factors associated with misclassifications and assess if these factors could be used to reclassify predictions and improve overall classifier accuracy. Models differentiating between TP and FP or TN and FN were both based on measurements of angle width (AOD500), although the model differentiating between TP and FP also included ACD, LV and IA. These models improved classifier accuracy from 84.8% to 89.0% and specificity from 74.4% to 83.3% with a small decrease in sensitivity from 95.8% to 94.0%. We speculate that performing this secondary quantitative analysis may simulate dynamic tilting of the goniolens in eyes with apparent angle closure and prominent iris convexity. This approach could enhance the viability of the deep learning classifier as a diagnostic tool by substantially decreasing the number of FPs while only modestly increasing the number of FNs. However, significant time and effort must be expended to perform quantitative analysis of AS-OCT images. Therefore, there is a need for fully automated methods to obtain biometric measurements from AS-OCT images before secondary quantitative analysis to optimise deep learning classifier performance is practical.14

Although gonioscopy is the current clinical standard for evaluating the anterior chamber angle and is used as the reference standard for training and evaluating the performance of the deep learning classifier, it has its own limitations. First, gonioscopy alone is poorly predictive of which primary angle closure suspects will progress to primary angle closure (PAC) or PACG.5 Therefore, devising a strategy to detect all patients with gonioscopic angle closure, no matter how convenient or accurate, will still yield a majority of patients who will not develop PACG. However, automated angle assessment methods may still be valuable because gonioscopy tends to be underused in clinical practice and undetected PACG can lead to significant ocular morbidity.24 Second, AS-OCT measurements of angle width are more strongly associated with IOP in subsets of eyes with PACD than gonioscopy grades.14 16 AS-OCT also better predicts development of gonioscopic angle closure after 4 years and response to interventions like LPI.25 26 Therefore, there is a need for longitudinal studies to determine whether gonioscopy or AS-OCT is more predictive of clinical outcomes in patients with angle closure.

Our study has several limitations. First, there was a small sample size of FN due to the excellent sensitivity of the deep learning classifier in detecting gonioscopic angle closure. While we detected consistent differences between FN and other prediction classes, we may have missed others due to the small sample size of FN. Second, the classifier is based on gonioscopy performed primarily by one human examiner during CHES. Although gonioscopy is the current clinical standard for evaluating the anterior chamber angle, it is intrinsically limited by human subjectivity and interexaminer differences in technique. Therefore, the generalisability of ground truth labels and classifier performance should be tested using independent AS-OCT and gonioscopy data. Third, the classifier predicts angle status based on analysis of only one AS-OCT image per quadrant, which could miss localised areas of angle closure within a quadrant. Therefore, future iterations of the classifier may benefit from analysing multiple images per quadrant to improve its sensitivity. Finally, our logistic regression models were developed and tested using the same dataset, due to its relatively small size. This may limit the generalisability of these models, and improvements in classifier performance after prediction reclassification should be validated in a larger independent dataset.

In conclusion, biometric measurements help explain misclassifications by a deep learning classifier that detects gonioscopic angle closure in AS-OCT images. Misclassifications occur due to disagreement between biometric measurements of anterior chamber angle and anterior segment parameters, and secondary quantitative analysis of these measurements may be beneficial to reclassify predictions and improve overall performance. Future studies that confirm the generalisability of these findings and compare the clinical significance of different definitions of angle closure are needed to define the role of automated OCT-based methods for detecting patients with angle closure.


Ethics statements

Patient consent for publication

Ethics approval

Ethics committee approval was previously obtained from the University of Southern California Medical Center Institutional Review Board. All study procedures adhered to the recommendations of the Declaration of Helsinki. All study participants provided informed consent.

References

Footnotes

  • Twitter @BenXuLab

  • Contributors AS, RM-C, RV and BYX conceived and designed this study. AS, MC, AAP, RM-C, RV and BYX contributed to the drafting of manuscript. AS, MC, AAP and BYX contributed to the data analysis and interpretation. All authors contributed to the critical appraisal and final approval of the manuscript. BYX provided the overall supervision of this work.

  • Funding This work was supported by grants U10 EY017337, P30 EY029220 and K23 EY029763 from the National Eye Institute, National Institutes of Health, Bethesda, Maryland; a Young Clinician Scientist Research Award from the American Glaucoma Society (no grant number), San Francisco, California; a Grant-in-Aid Research Award from Fight for Sight (no grant number), New York, New York; a SC-CTSI Clinical and Community Research Award from the Southern California Clinical and Translational Science Institute (no grant number), Los Angeles, California; and an unrestricted grant to the Department of Ophthalmology from Research to Prevent Blindness (no grant number), New York, New York.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.