Development and validation of a deep learning system to screen vision-threatening conditions in high myopia using optical coherence tomography images
  1. Yonghao Li1,
  2. Weibo Feng1,
  3. Xiujuan Zhao1,
  4. Bingqian Liu1,
  5. Yan Zhang1,
  6. Wei Chi1,
  7. Mingzhi Lu1,
  8. Jierong Lin1,
  9. Yantao Wei1,
  10. Jun Li1,
  11. Qi Zhang1,
  12. Yi Zhu2,
  13. Chuan Chen3,
  14. Lin Lu1,
  15. Lanqin Zhao1,
  16. Haotian Lin1,4
  1. State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangzhou, China
  2. Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, Miami, Florida, USA
  3. Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, Florida, USA
  4. Centre of Precision Medicine, Sun Yat-sen University, Guangzhou, China

Correspondence to Professor Haotian Lin, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China; linht5{at}mail.sysu.edu.cn; Mrs Lanqin Zhao; zhaolq7{at}mail.sysu.edu.cn

Abstract

Background/aims To apply deep learning technology to develop an artificial intelligence (AI) system that can identify vision-threatening conditions in high myopia patients based on optical coherence tomography (OCT) macular images.

Methods In this cross-sectional, prospective study, a total of 5505 qualified OCT macular images obtained from 1048 high myopia patients admitted to Zhongshan Ophthalmic Centre (ZOC) from 2012 to 2017 were selected for the development of the AI system. The independent test dataset included 412 images obtained from 91 high myopia patients recruited at ZOC from January 2019 to May 2019. We adopted the InceptionResnetV2 architecture to train four independent convolutional neural network (CNN) models to identify the following four vision-threatening conditions in high myopia: retinoschisis, macular hole, retinal detachment and pathological myopic choroidal neovascularisation. Focal Loss was used to address class imbalance, and optimal operating thresholds were determined according to the Youden Index.

Results In the independent test dataset, the areas under the receiver operating characteristic curves were high for all conditions (0.961 to 0.999). Our AI system achieved sensitivities equal to or even better than those of retina specialists as well as high specificities (greater than 90%). Moreover, our AI system provided a transparent and interpretable diagnosis with heatmaps.

Conclusions We used OCT macular images for the development of CNN models to identify vision-threatening conditions in high myopia patients. Our models achieved reliable sensitivities and high specificities, comparable to those of retina specialists and may be applied for large-scale high myopia screening and patient follow-up.

  • retina
  • diagnostic tests/investigation
  • imaging

Data availability statement

Data are available on reasonable request. All data relevant to the study are included in the article or uploaded as online supplemental information. The main data supporting the results in this study are available within the paper and online supplemental information. The raw datasets are too large to be publicly shared, yet they are available for research purposes from the corresponding authors on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Introduction

Myopia, a common form of visual impairment worldwide, is highly prevalent among Asian adolescents and young adults.1 The total number of individuals with myopia in China has exceeded six hundred million, representing a genuine ‘myopia boom’.1–4 Furthermore, 21.9% of these patients have high myopia (refractive error ≤ −6 diopters).2 The gradual progression of high myopia to posterior staphyloma and maculopathy results in a diagnosis of pathological myopia, which is one of the main causes of blindness in young adults.5–8 The macula is the most important retinal structure for visual acuity. Among the wide variety of abnormalities in pathological myopia, maculopathy is the most hazardous to visual acuity. The contributors to pathological myopia retinopathy include traction, atrophy and neovascularisation. Retinoschisis, macular hole and retinal detachment are caused by traction, and macular atrophy usually develops from pathological myopic choroidal neovascularisation (PMCNV). These are the most common kinds of maculopathy and vision-threatening conditions in pathological myopia and need to be promptly identified and treated.9–12 Moreover, affected patients usually must be followed up once every few months for life. Because medical resources are limited and unevenly distributed, ophthalmologists cannot easily screen all patients and identify those at high risk who should receive early treatment.

Artificial intelligence (AI), especially deep learning, which has been widely applied in image classification,13 has sparked global interest in recent years. Because it can analyse a tremendous amount of data, AI may provide a promising solution for the current myopia burden. Optical coherence tomography (OCT), one of the most commonly used imaging techniques in ophthalmology, provides high-resolution retinal images that are excellent training material for AI. As evidenced by De Fauw et al,14 Lu et al15 and Lee et al,16 AI models based on OCT images have achieved excellent performance in diagnosing retinal diseases, including diabetic retinopathy, macular oedema, retinal detachment and CNV in age-related macular degeneration. On OCT images, pathological myopia abnormalities such as traction lesions and PMCNV can be clearly revealed. However, the manifestations of pathological myopia observed on OCT are more complicated, and the value of AI implementation for screening common vision-threatening conditions in high myopia patients has not been fully explored.

In this study, we aimed to build an OCT screening tool based on an AI deep learning system to identify the four most common and dangerous vision-threatening conditions (retinoschisis, macular hole, retinal detachment and PMCNV) in high myopia patients. Subsequently, an independent test dataset from clinical work was used to evaluate our AI system and the AI system’s performance was compared with that of human ophthalmologists in Zhongshan Ophthalmic Centre (ZOC, Guangzhou, China).

Method

This study adhered to the tenets of the Declaration of Helsinki.17 Informed consent forms were signed by all participants, and any private information that may identify individuals was excluded.

Data source and labelling

In this study, OCT images from high myopia patients were retrospectively collected at ZOC from 2012 to 2017. Given the critical impact of macular structure on visual acuity, the images used were horizontal and vertical slices through the fovea, which is the most widely used OCT scan mode in real clinical settings and is capable of detecting most of the dangerous vision-threatening conditions. Each high myopia eye underwent horizontal and vertical OCT macular imaging in the same examination, and multiple examinations performed on the same eye were included as separate data if the interval between examinations exceeded 1 month. In the vast majority of cases, both the horizontal and the vertical line for each eye were included for AI development; in a few cases only one line was chosen because of poor image quality or failure of the scan to cross the fovea. The details of the inclusion and exclusion criteria are shown in online supplemental figure S1. The device that we used was a Heidelberg Spectral OCT (Heidelberg, Germany), and the scan length was 6 mm. Finally, 5505 qualified OCT images obtained from 1048 patients were selected for AI development. Subsequently, based on the patient’s code number, these images were divided into a training dataset (80% of the patients) for model development and a validation dataset (20% of the patients) for validating the models. To test the AI system in a real clinical setting, 412 images obtained in 206 examinations performed for 91 high myopia patients recruited at ZOC from January 2019 to June 2019 were selected as an independent dataset according to the same criteria listed in online supplemental figure S1. None of these images had been used in the training or validation datasets.
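
The patient-level split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the record layout, the `split_by_patient` helper, the fixed seed and the use of random shuffling are all assumptions. The text states only that images were divided by patient code number into 80% and 20% of the patients.

```python
import random
from collections import defaultdict


def split_by_patient(records, train_fraction=0.8, seed=0):
    """Split image records so that no patient appears in both subsets.

    `records` is a list of (patient_code, image_id) pairs; both field
    names are hypothetical placeholders used only for illustration.
    """
    images_by_patient = defaultdict(list)
    for patient_code, image_id in records:
        images_by_patient[patient_code].append(image_id)

    # Shuffle patients (not images) so the split is made at the patient level.
    patients = sorted(images_by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])

    train = [(p, i) for p, i in records if p in train_patients]
    validation = [(p, i) for p, i in records if p not in train_patients]
    return train, validation


# Toy example: 10 hypothetical patients with 2 images each.
toy_records = [(f"P{n:03d}", f"P{n:03d}_img{k}") for n in range(10) for k in range(2)]
train_set, validation_set = split_by_patient(toy_records)
```

Splitting on the patient code rather than on individual images keeps all images of one patient, including repeat examinations, on the same side of the split, so the validation images come from patients the model has not seen.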

Seven human doctors with different clinical experience levels, including four retina specialists and three attending ophthalmologists, were recruited as reviewers. According to the recognised diagnostic criteria9–12 (online supplemental figure S2), a standard diagnosis (ground truth) was defined for each image if the four retina specialists unanimously assigned it the same diagnosis. When dissent occurred, the final result was confirmed through a group discussion among the retina specialists and another senior expert. The diagnoses assigned by the three attending ophthalmologists were recorded only for comparison with the AI performance.

AI system development

To achieve the best performance for all four conditions, we used a convolutional neural network (CNN) architecture, InceptionResNetV2,18 on a TensorFlow platform to construct four independent binary classifiers to identify retinoschisis, macular hole, retinal detachment and PMCNV. The output of each classifier ranged from 0 to 1, representing the probability of the presence of each vision-threatening condition. InceptionResNetV2 has been verified to achieve better performance than most mainstream CNN architectures, such as Inception and ResNet, in the ImageNet classification challenge.19 Transfer learning was also adopted since this method can improve the performance of an AI system based on limited biomedical images.20 In our setting, transfer learning meant that only the weights of the fully connected layers were updated on our training dataset; the other weights were initialised with weights pretrained on ImageNet and frozen during the training process. In addition, the Focal Loss21 was used as the loss function to address the imbalance between positive and negative classes during training for macular hole, retinal detachment and PMCNV detection. The Focal Loss has mainly been used for object detection to address the problem of extreme imbalance between foreground and background classes during training, but can also be used to improve the performance of models for image classification with imbalanced data.22
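
To make the role of the Focal Loss concrete, the following is a minimal sketch of the binary focal loss for a single prediction, written in plain Python. It is not the training code used in this study: the function name and the default values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, and the hyperparameters actually used are not stated here.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one binary prediction.

    y_true: 1 for the positive class, 0 for the negative class.
    p_pred: predicted probability of the positive class.
    gamma, alpha: illustrative defaults, not values reported in this study.
    """
    p_pred = min(max(p_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    p_t = p_pred if y_true == 1 else 1.0 - p_pred
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_negative = binary_focal_loss(0, 0.05)  # well-classified negative image
hard_positive = binary_focal_loss(1, 0.05)  # badly misclassified positive image
```

In this sketch the well-classified negative contributes far less to the loss than the misclassified positive, which is how the many easy negatives in an imbalanced dataset are kept from dominating training. With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy.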

In addition, to improve the performance of the AI models, raw OCT images were preprocessed. First, the region of interest (without the fundus photograph) in an image was cropped out. Then, the image was normalised to the same size, resolution and background colour (details are shown in online supplemental figure S3). Furthermore, the ImageDataGenerator class from Keras was used to randomly augment images in the training dataset to improve the robustness of the AI models. At each epoch, it generated as many new variations as there were original images in the training dataset, and these augmented images were then used to train the model. The number of epochs therefore equalled the number of times the images were augmented. The set of transformations included rotation, zooming, horizontal flipping, and height and width shifts. On the validation dataset, hyperparameters were optimised, and the optimal operating thresholds were selected. The models with the highest areas under the receiver operating characteristic (ROC) curves (AUCs) for each vision-threatening condition were selected as the final models, and ROC curve analysis was subsequently conducted to determine the optimal operating (classification) thresholds. In this study, the Youden Index23 was used as the criterion for the thresholds. More details about the AI training process are shown in the online supplemental text.
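
Threshold selection with the Youden Index can be sketched as below. This is an illustrative plain-Python version under stated assumptions, not the study's implementation: the helper name `youden_threshold`, the convention that a score at or above the threshold counts as positive, and the toy labels and probabilities are all hypothetical.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= threshold.
    Assumes both classes are present in y_true.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy validation data, for illustration only.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
probabilities = [0.05, 0.10, 0.30, 0.60, 0.40, 0.70, 0.80, 0.95]
threshold, j_statistic = youden_threshold(labels, probabilities)
```

When several thresholds give the same J, this sketch keeps the lowest one, which favours sensitivity; that tie-breaking rule is a design choice of the example and is not taken from the study.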

Evaluation of the AI system

Once the AI system was established, each image was independently subjected to four rounds of categorisation. The workflow of our AI system is shown in figure 1. In the test dataset, the probabilities produced by the models for each image were transformed to the probability for each eye as the system output since diagnoses are determined at the eye level in clinical practice. This process is very similar to the decision-making process used by human doctors: doctors look for lesions on horizontal and vertical images of an eye, and the eye is then diagnosed as positive if at least one of the two images shows a vision-threatening condition. The performance of the AI system at the eye level was assessed using the following performance metrics: sensitivity, specificity, the AUCs of ROC curves and heatmaps. Gradient-weighted class activation maps24 were adopted to draw heatmaps for the images in the test dataset that were diagnosed as positive (a probability greater than the threshold) by the AI system. Moreover, a comparison was made between the AI system and our seven reviewers (four retina specialists and three attending ophthalmologists) using the ROC curve.
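
One way to read the eye-level rule above is sketched below. The text does not specify how image probabilities were transformed into an eye-level probability, so taking the maximum of the horizontal and vertical outputs is an assumption, chosen because it reproduces the "at least one of the two images" rule when one threshold per condition is applied. The condition keys, function names, model outputs and thresholds are hypothetical.

```python
CONDITIONS = ("retinoschisis", "macular_hole", "retinal_detachment", "pmcnv")


def eye_level_probabilities(horizontal, vertical):
    """Combine per-image probabilities into one probability per condition.

    Taking the maximum is an assumed aggregation rule: max(p_h, p_v) exceeds
    a threshold exactly when at least one of the two images exceeds it.
    """
    return {c: max(horizontal[c], vertical[c]) for c in CONDITIONS}


def eye_level_diagnosis(horizontal, vertical, thresholds):
    """Return the set of conditions called positive for the eye."""
    eye_probs = eye_level_probabilities(horizontal, vertical)
    return {c for c in CONDITIONS if eye_probs[c] > thresholds[c]}


# Hypothetical model outputs and thresholds, for illustration only.
horizontal_scan = {"retinoschisis": 0.82, "macular_hole": 0.10,
                   "retinal_detachment": 0.05, "pmcnv": 0.20}
vertical_scan = {"retinoschisis": 0.40, "macular_hole": 0.65,
                 "retinal_detachment": 0.02, "pmcnv": 0.15}
operating_thresholds = {c: 0.5 for c in CONDITIONS}

diagnosis = eye_level_diagnosis(horizontal_scan, vertical_scan, operating_thresholds)
```

Because each condition has its own binary classifier, one eye can be positive for several conditions at once, as in this toy example where a different scan direction triggers each finding.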

Figure 1

Workflow of our AI system. Vertical and horizontal macular OCT images from a high myopia eye were independently subjected to the AI system as the input. After four rounds of categorisation, the positive diagnoses and corresponding heat maps were given as the output. Illustrated by Feng. AI, artificial intelligence; OCT, optical coherence tomography; PMCNV, pathological myopic choroidal neovascularisation.

Statistical analysis

ROC curves were analysed and plotted with the Python packages matplotlib 2.2.3 and scikit-learn 0.19.2. The AUCs of the ROC curves, sensitivity and specificity were used to assess the performance of the AI models. The 95% CIs were Wilson score intervals for sensitivity and specificity and DeLong intervals for the AUCs, calculated with the R packages Hmisc 4.2-0 and pROC 1.15.3.
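
For readers unfamiliar with the Wilson score interval, the standard formula can be written out as in the sketch below. The study computed these intervals in R; this Python function is an independent illustration of the textbook formula, and the function name and the counts in the example are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z: 95% level)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1.0 + z * z / total
    centre = (p_hat + z * z / (2.0 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / total + z * z / (4.0 * total * total)
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: 45 of 50 diseased eyes detected (sensitivity 0.90).
lower, upper = wilson_interval(45, 50)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 and 1 and behaves sensibly when sensitivity or specificity is close to 100%, which is relevant for proportions as high as those reported here.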

Results

Summary of the labels in the datasets

The 5505 images obtained from 1048 patients (2064 eyes) for AI system development consisted of 2178 (39.5%) images showing retinoschisis, 711 (12.9%) images showing macular holes, 839 (15.2%) images showing retinal detachment, 470 (8.5%) images showing PMCNV and 2761 (50.1%) images negative for all four labels. This dataset was divided into a training dataset (4338 images, 80%) for training each model and a validation dataset (1681 images, 20%) for parameter adjustment in each model. In the independent test dataset (412 images obtained from 206 examinations in 174 eyes of 91 patients), diagnoses were assigned at the eye level for each examination based on the horizontal and vertical images together, since diagnoses are determined at the eye level in real clinical practice. Multiple examinations were performed on 29 eyes (16.7%) in the test dataset, and two eyes developed retinal detachment during the follow-up period. The details of the labels in the training, validation and test datasets, and other basic information are shown in table 1.

Table 1

Details of the training, validation and test datasets

With respect to the cases of comorbidity, 1224 images for AI development and 60 examinations in the test dataset were labelled with more than one vision-threatening condition. The details of the comorbidities are shown in online supplemental table S1.

The performance of the AI models

In the early phase of AI training, we found that the imbalance between positive and negative classes affected the training result. To address this problem, the Focal Loss was applied to our AI training and this substantially improved its accuracy. After a large number of training experiments, the best models with the highest AUCs in the validation dataset were selected. The Youden index is a commonly used criterion for diagnostic tests, and the optimal threshold corresponds to the point on the ROC curve that is farthest from the equality line (ie, where the sum of sensitivity and specificity is maximal). The thresholds that we finally selected and the performance for each condition in the validation dataset are shown in online supplemental table S2.

Table 2

Performance of the AI system on the test dataset

For the independent test dataset, our AI system achieved AUCs ranging from 0.961 to 0.999 for all vision-threatening conditions at the eye level. Using the optimal operating thresholds determined with the Youden Index, we also measured the sensitivities and specificities for each condition. Generally, our AI system achieved excellent performance with consistently high sensitivities and specificities (greater than 90%). The details are presented in table 2.

Comparison of the AI models with clinical ophthalmologists

To further evaluate the performance of the developed AI system, the test dataset was labelled at the eye level by our seven reviewers, including four retina specialists and three attending ophthalmologists. Subsequently, the performance of the AI system was compared with that of the reviewers using ROC curve plots (figure 2). The plots demonstrated that our AI system achieved equal or even better sensitivities than the retina specialists and high specificities (greater than 90%).

Figure 2

Comparison of the AI system and ophthalmologists using ROC curves. (A) The performance of the AI system and ophthalmologists for retinoschisis. (B) The performance of the AI system and ophthalmologists for macular hole. (C) The performance of the AI system and ophthalmologists for retinal detachment. (D) The performance of the AI system and ophthalmologists for pathological myopic choroidal neovascularisation (PMCNV). AI, artificial intelligence; AUC, area under the curve; ROC, receiver operating characteristic.

Heatmaps

The AI system provided the probability that each eye had a vision-threatening condition and generated heatmaps based on OCT images for eyes diagnosed as positive. Heatmaps revealed and confirmed that the system produced a diagnosis using accurate distinguishing features and the regions or lesions most relevant to the diagnosis in the image. The maps demonstrated that our system made the diagnosis based on the correct lesion for diagnosis (figure 3). For the two eyes that developed retinal detachment during the follow-up period in the test dataset, our AI system successfully detected the pathological changes and precisely identified the distinguishing features using heatmaps (online supplemental figure S4).

Figure 3

Heatmaps for retinoschisis, macular hole, retinal detachment and pathological myopic choroidal neovascularisation (PMCNV). (A) An example of a retinoschisis lesion detected by our AI system. (B) An example of a macular hole lesion detected by our AI system. (C) An example of a retinal detachment lesion detected by our AI system. (D) An example of a PMCNV lesion detected by our AI system. AI, artificial intelligence; PMCNV, pathological myopic choroidal neovascularisation.

Discussion

AI systems, especially deep learning systems based on CNNs, have been widely demonstrated to achieve expert-level performance in medical imaging diagnosis.18 In this study, we developed and verified an effective AI system based on OCT macular images. Given the heavy workload that screening the large cohort of high myopia patients places on ophthalmologists, our AI system can provide effective assistance by reducing the time requirement and human workload without compromising diagnostic accuracy, which may not only assist ophthalmologists in large-scale myopia screening but also facilitate the long-term follow-up of high myopia patients.

High/pathological myopia is a chronic disease, and affected patients must be followed up for life. Usually, pathological myopia patients are followed up once every 3–6 months, while the follow-up interval applied to high-risk patients can be reduced to a few weeks. Nevertheless, patients tend to become tired and less motivated to adhere to follow-up if they are asked to visit hospitals frequently. As shown in online supplemental figure S4, our AI system successfully detected newly emerging pathological changes during the follow-up period, which is critical to follow-up work. In the future, our AI software and OCT devices can be incorporated into a telemedicine assistant system. With our AI tools, basic examinations and regular follow-up work can be performed in local community hospitals, which would be convenient for both ophthalmologists and patients, especially for those in areas lacking retina specialists.

As evidenced in many previous studies, OCT has served as an excellent tool for CNN training in identifying common retinal diseases.14–16 However, implementation of existing AI systems for pathological myopia is probably more challenging due to the complicated manifestations. For example, posterior staphyloma, severe retinal atrophy, coexisting comorbidities and atypical lesions in pathological myopia patients will drastically increase the difficulty of diagnosis. In this study, we are the first to use OCT images to train CNNs to screen for four common vision-threatening conditions in high myopia, and expert-level performance was achieved. Moreover, the heatmaps exported by our AI system allowed the precise visualisation of lesion positions and distinguishing features, facilitating a targeted diagnosis, which may help reduce the burden on ophthalmologists by offering heatmaps to them for reference and guidance. Given its promising sensitivities, our AI system can also reduce the need for inspection of normal eyes in large-scale screening. If a suspect lesion is detected by our AI system, the original images and corresponding heatmaps will then be submitted to human ophthalmologists for further confirmation, thus significantly reducing the overall workload.

The Focal Loss was reported to have been designed to address the one-stage object detection scenario in which an extreme imbalance exists between foreground and background classes.19 The implementation of the Focal Loss in our study also provides an effective solution for cases in which imbalanced data are processed during medical AI training.

As demonstrated by the heatmaps, our AI model can precisely identify distinguishing features of vision-threatening conditions. As revealed in the ROC curves in figure 2, the AI system showed better sensitivities for PMCNV and retinal detachment than human experts. We then reviewed the cases in which the AI system outperformed human experts and found that human doctors were outperformed by the AI system when detecting some cases with comorbidities or similar symptoms (online supplemental figure S5). However, a few images were still misdiagnosed. These images and the corresponding class activation maps were carefully reviewed to identify possible reasons. The potential contributors are: (1) relatively low image quality, (2) microlesions and (3) lesions near the image boundary. When performing image quality control of the test dataset, we excluded cases in which the image quality was too poor for recognition by human experts. Cases were included if the image quality was relatively low but human experts could still make diagnoses. It appears that our AI system has higher image quality requirements than human experts, as its accuracy decreased when the image quality was relatively low whereas human doctors remained highly accurate. Typical examples and corresponding heatmaps are shown in online supplemental figure S6.

Limitations

There are a few limitations of our study, and further improvements are required to achieve better practicability. First, our model was not trained to identify other retinal diseases with similar symptoms, such as diabetic retinopathy, choroiditis and age-related macular degeneration, which may lead to an incorrect diagnosis if these conditions are not excluded. Second, image quality and scan location are key elements to achieve better accuracy when using our AI system. However, high-quality OCT images sometimes cannot be easily obtained in high myopia patients due to the opacity of refractive media, such as cataract or vitreous haemorrhage. If a lesion is too close to the boundary, repeat scans that are as centred as possible are recommended. Third, all the data we collected in this study were obtained from ZOC (Guangzhou, China). Multi-centre validation is required in the future to further improve the reliability of our AI system.

Conclusions

We provided the first data regarding the application of CNN architectures to develop an AI system to identify retinoschisis, macular hole, retinal detachment and PMCNV in high myopia patients based on OCT images. With its expert-level performance, our AI system can help both ophthalmologists and patients by reducing the time and workload required for large-scale high myopia screening and long-term follow-up.

Ethics statements

Patient consent for publication

Ethics approval

Approval for the study protocol was obtained from the Institutional Review Board/Ethics Committee of Sun Yat-sen University (Guangzhou, China).

References

Footnotes

  • YL and WF contributed equally.

  • LZ and HL contributed equally.

  • Contributors Conception and design: WF, YL and HL. Funding obtainment: HL. Provision of data: LL. Collection and assembly of data: WF, LZ, ML, JL and QZ. Data analysis and interpretation: WF, YL, LZ, XZ, BL, YZ, WC, YW, JL, YZ and CC. Manuscript writing: all authors. Manuscript revision: YL, LZ. Final approval of the manuscript: all authors.

  • Funding This study was funded by the Science and Technology Planning Projects of Guangdong Province (2018B010109008), the National Key R&D Program of China (2018YFC0116500), Guangdong Science and Technology Innovation Leading Talents (2017TX04R031) and the Science and Technology Planning Projects of Guangdong Province (2019B030316012).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
