Background/aims To validate a deep learning algorithm to diagnose glaucoma from fundus photography obtained with a smartphone.
Methods A training dataset consisting of 1364 colour fundus photographs with glaucomatous indications and 1768 colour fundus photographs without glaucomatous features was obtained using an ordinary fundus camera. The testing dataset consisted of 73 eyes of 73 patients with glaucoma and 89 eyes of 89 normative subjects. In the testing dataset, fundus photographs were acquired using both an ordinary fundus camera and a smartphone. A deep learning algorithm was developed to diagnose glaucoma using the training dataset. The trained neural network was then evaluated by its classification of eyes as glaucomatous or normal in the testing dataset, using images from both an ordinary fundus camera and a smartphone. Diagnostic accuracy was assessed using the area under the receiver operating characteristic curve (AROC).
Results The AROC with a fundus camera was 98.9% and 84.2% with a smartphone. When validated only in eyes with advanced glaucoma (mean deviation value < −12 dB, N=26), the AROC with a fundus camera was 99.3% and 90.0% with a smartphone. The difference between the AROC values obtained with the two cameras was significant.
Conclusion The usefulness of a deep learning algorithm to automatically screen for glaucoma from smartphone-based fundus photographs was validated. The algorithm had a considerably high diagnostic ability, particularly in eyes with advanced glaucoma.
Data availability statement
Data are available on reasonable request.
With the development of deep learning,1 it has become possible to diagnose ocular diseases, such as diabetic retinopathy and glaucoma, using a fundus photograph. We recently reported that such an algorithm is effective regardless of differences among fundus cameras, facilities or even ethnicity.2 3 This could potentially be very useful in preventing blindness through the early detection of ocular diseases and initiation of treatment. One major drawback is that fundus cameras are usually available only at medical facilities, and it is usually not possible to take a fundus photograph outside of a medical practice. However, the number of patients with glaucoma is much larger than the number of individuals who actually visit hospitals. For instance, it has been estimated that 67% of individuals affected by glaucoma remain undetected in the UK and 71% in Spain.4 5 This is problematic because glaucoma is a progressive and irreversible optic neuropathy which can result in irrevocable visual field (VF) damage and is one of the leading causes of blindness in the world.6 7 This implies that fundus cameras available at medical facilities are not sufficient to accomplish complete early detection of glaucoma. This may be particularly true in developing countries. In addition, as previously reported in retinopathy of prematurity,8 9 smartphone-based fundus photography is also clinically useful when patients’ cooperation cannot be expected. This implies a similar merit of smartphone-based fundus imaging when diagnosing glaucoma in various conditions, such as in children and patients with dementia.
One possible solution to overcome this problem is the use of a portable fundus camera. For instance, recent studies have revealed the usefulness of a deep learning-assisted programme to screen for diabetic retinopathy using a smartphone.10 11 This implies that a similar approach may also be useful for glaucoma detection. The purpose of the current study was to validate the effectiveness of our deep learning-assisted programme to screen for glaucoma against fundus photographs obtained by a smartphone.
Material and methods
The ethics committee of Matsue Red Cross Hospital waived the requirement for patient informed consent regarding the use of their medical record data in accordance with the regulations of the Japanese Guidelines for Epidemiologic Study issued by the Japanese Government. Instead, the protocol was posted at the outpatient clinic to notify participants about the research. This study was performed according to the tenets of the Declaration of Helsinki.
We used the training dataset from our previous study.12 In short, 1364 glaucomatous and 1768 normative photographs were obtained using a fundus camera (Nonmyd WX-3D; Kowa Company, Aichi, Japan) between February 2016 and October 2016 at Matsue Red Cross Hospital. Labelling of glaucoma was performed according to the recommendations of the Japan Glaucoma Society Guidelines for Glaucoma.13 Signs of glaucomatous changes, such as focal rim notching or generalised rim thinning, large cup-to-disc ratio with cup excavation with or without laminar dot sign and retinal nerve fibre layer defects with edges at the optic nerve head margin, were judged comprehensively. All photographs were taken with an angle of view of 45 degrees. Photographs were excluded if they were out of focus, unclear, too dark, too bright or decentred from the posterior pole, or if they showed other optic nerve head or retinal pathologies or other conditions that could interfere with a diagnosis of glaucoma.
The testing dataset consisted of 73 images of 73 eyes with glaucoma and 89 images of 89 normative eyes obtained from outpatients who visited the Tokyo University Hospital between June 2019 and March 2020. Posterior fundus photographs were captured using the Nonmyd WX-3D camera. The diagnosis of glaucoma was defined similarly to that in the training dataset, following the recommendations of the Japan Glaucoma Society Guidelines for Glaucoma.13 In the glaucomatous group, the VF was tested using the Humphrey Field Analyzer Swedish Interactive Thresholding Algorithm central 30–2 programme (Carl Zeiss Meditec, Dublin, California, USA). However, in this study, VF results were not considered in the diagnosis of glaucoma. The normal group was defined as being free of glaucomatous changes and retinal pathologies in fundus photographs with IOP within the normal range. When both eyes were eligible, the right eye was used for this study. The diagnosis was independently judged by three ophthalmologists specialising in glaucoma (MT, HM and RA). Photographs were excluded if the diagnoses of the three examiners did not agree. Thus, this testing dataset was prepared without considering VF defects.
Fundus photograph with smartphone
A fundus photograph was taken in all eyes in the testing dataset. This was accomplished by attaching the D-Eye lens (D-EYE S.r.l., Padova, Italy) to an iPhone 8 (Apple, Cupertino, California, USA). After pupil dilation with a combined eye-drop containing 0.5% tropicamide and 0.5% phenylephrine hydrochloride (Midrin-P; Santen Pharmaceutical, Osaka, Japan), a fundus recording was obtained using the video function of the iPhone in a dim room. Each video was recorded for approximately 1 min. A single still image with the best view of the optic disc was selected manually and used in the following analyses.
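The study selected the sharpest still frame from each 1 min video manually. As a rough illustration only, the sketch below shows how such frame selection could in principle be automated with a variance-of-Laplacian focus measure; this is a hypothetical NumPy sketch, not part of the study's protocol, and the function names (`sharpness`, `best_frame`) are our own.

```python
import numpy as np

def sharpness(frame):
    """Variance-of-Laplacian focus measure on a greyscale frame (H x W array).

    A 4-neighbour Laplacian is applied via array shifts (with wrap-around at
    the borders, acceptable for a rough score); sharper frames yield a
    higher variance of the response.
    """
    lap = (-4 * frame
           + np.roll(frame, 1, axis=0) + np.roll(frame, -1, axis=0)
           + np.roll(frame, 1, axis=1) + np.roll(frame, -1, axis=1))
    return lap.var()

def best_frame(frames):
    """Return the index of the sharpest frame in a recorded sequence."""
    return int(np.argmax([sharpness(f) for f in frames]))
```

In practice a disc-detection step would also be needed to reject frames where the optic disc is outside the field of view, a failure mode the authors note for smartphone recordings.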
Structure and training strategy of deep neural networks
The structure and training strategy of the network are mostly adapted from our previous reports.2 3 12 We exploited a type of convolutional neural network known as the residual neural network (ResNet)6 with 34 layers, which is well known to be useful for image classification and feature extraction. We previously reported the usefulness of the ResNet algorithm (a scratch model) to accurately discriminate between glaucomatous and healthy eyes, trained with approximately 3000 fundus images labelled as glaucomatous or not. To facilitate a deeper and larger network, ResNet skips one or more layers so that features are propagated to succeeding layers. As detailed in our previous paper,2 we exploited 34 layers in ResNet, which was pretrained using the ImageNet classification.14 In addition, further improvements were attempted by applying image augmentation: (1) rotation with random angles from −10 degrees to +10 degrees, (2) vertical and horizontal translations to a maximum of 10% of the length, (3) scaling edges to a random value from 224 to 256 pixels and (4) random modification of the contrast (50%–150%), saturation (90%–110%) and hue (90%–110%) of the images. Prior to training, the actual input of the network was prepared from raw fundus images by trimming around the optic disc using an object detection network,15 resizing to fit the network input (224×224 pixels), and normalising pixel values by the mean and SD of the training dataset.
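To make the augmentation and normalisation steps above concrete, the following is a minimal NumPy sketch of two of them (translation and contrast jitter) plus the per-channel normalisation; it is an illustration under our own simplifying assumptions (wrap-around translation, jitter around the image mean), not the authors' actual pipeline, which presumably used a standard deep learning framework's transforms.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Simplified augmentation of an H x W x 3 image with values in [0, 1].

    Rotation and rescaling are omitted for brevity; random translation up to
    10% of each dimension (here via wrap-around rolling, a simplification)
    and contrast jitter in the 50%-150% range are sketched.
    """
    h, w, _ = img.shape
    dy = rng.integers(-h // 10, h // 10 + 1)
    dx = rng.integers(-w // 10, w // 10 + 1)
    out = np.roll(img, (dy, dx), axis=(0, 1))
    # contrast jitter: scale deviations from the mean by a factor in [0.5, 1.5]
    c = rng.uniform(0.5, 1.5)
    return np.clip((out - out.mean()) * c + out.mean(), 0.0, 1.0)

def normalise(img, mean, std):
    """Normalise pixel values by per-channel training-set mean and SD."""
    return (img - mean) / std
```

A production pipeline would apply rotation and edge rescaling with proper interpolation rather than integer rolls, but the statistical intent (exposing the network to plausible capture variation) is the same.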
Validation was performed using the fundus photographs obtained with both the Nonmyd WX-3D camera and the smartphone in the testing dataset. The ResNet algorithm was built using all data in the training dataset and the area under the receiver operating characteristic curve (AROC) was calculated. AROC values were compared using DeLong’s method.
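The empirical AROC and DeLong's test for comparing two correlated AROCs (used throughout the Results) can be computed as follows; this is a self-contained NumPy sketch of the standard DeLong procedure, not the authors' statistical code, and the function names are our own.

```python
import math
import numpy as np

def _psi(x, y):
    # Mann-Whitney kernel: 1 if x > y, 0.5 on ties, 0 otherwise
    return (x[:, None] > y[None, :]) + 0.5 * (x[:, None] == y[None, :])

def auc(pos, neg):
    """Empirical AROC from classifier scores of diseased (pos) and healthy (neg) eyes."""
    return _psi(np.asarray(pos, float), np.asarray(neg, float)).mean()

def delong_test(pos_scores, neg_scores):
    """Two-sided p value for the difference of two correlated AROCs (DeLong).

    pos_scores, neg_scores: arrays of shape (2, m) and (2, n) holding the two
    models' scores on the same m positive and n negative cases.
    """
    pos = np.asarray(pos_scores, float)
    neg = np.asarray(neg_scores, float)
    m, n = pos.shape[1], neg.shape[1]
    K = np.stack([_psi(pos[k], neg[k]) for k in range(2)])  # (2, m, n)
    aucs = K.mean(axis=(1, 2))
    v10 = K.mean(axis=2)   # structural components over negatives, per positive case
    v01 = K.mean(axis=1)   # structural components over positives, per negative case
    s10, s01 = np.cov(v10), np.cov(v01)
    var = (s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m \
        + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return aucs, p
```

Because both cameras were applied to the same eyes, the correlated (paired) form of the test is the appropriate one, which is why the covariance terms `s10[0, 1]` and `s01[0, 1]` appear in the variance.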
Subsequently, as a subanalysis, the AROC value was calculated using glaucomatous eyes with advanced disease status (mean deviation: MD < −12 dB)16 and all normative eyes. Finally, the AROC value was calculated using glaucomatous and normative eyes without highly myopic eyes (axial length <26 mm).
Results
Demographic data of the subjects in the testing datasets are summarised in table 1. The MD value of the glaucomatous eyes was −10.9±7.8 (mean±SD) (range −29.8 to 1.6) dB in the testing dataset.
Samples of the fundus images obtained with the Nonmyd WX-3D camera and the smartphone are shown in figure 1. The smartphone-based images were a mixture of good (cases A and B) and blurred focus (cases C and D). Images that were not perfectly targeted around the optic disc were also included (case D).
Figure 2A shows the receiver operating characteristic curve obtained with the Nonmyd WX-3D camera and the smartphone. The AROC was 98.9% (95% CI 97.8% to 99.9%) for images obtained with the Nonmyd WX-3D camera and 84.2% (95% CI 78.1% to 90.2%) with the smartphone. The AROC value was significantly lower with the smartphone (p<0.001, DeLong’s method). Using the fundus camera, the sensitivity was 98.6%, 97.3% and 94.5% at specificities of 85%, 90% and 95%, respectively. Using the smartphone, the sensitivity was 64.4%, 50.7% and 37.0% at specificities of 85%, 90% and 95%, respectively.
Among 73 eyes with glaucoma, 26 eyes were in an advanced stage. Table 2 shows the demographics of patients according to the disease status. The AROC using the smartphone was 90.0% (95% CI 83.4% to 96.3%); with the Nonmyd WX-3D camera, the AROC was 99.3% (figure 2B). The AROC value was significantly lower with the smartphone (p=0.0041, DeLong’s method). Using the fundus camera, the sensitivity was 98.6%, 97.3% and 94.5% at specificities of 85%, 90% and 95%, respectively. Using the smartphone, the sensitivity was 100.0%, 100.0% and 96.2% at specificities of 85%, 90% and 95%, respectively.
Among 92 eyes with glaucoma and 96 normative eyes, 67 and 93 eyes, respectively, were non-highly myopic. Table 3 shows the demographics of non-highly myopic subjects. The AROC using the smartphone was 81.7% (95% CI 73.9% to 89.5%), whereas that using the Nonmyd WX-3D camera was 98.7% (95% CI 97.3% to 100.0%) (figure 2C). The AROC value was significantly lower with the smartphone (p<0.001, DeLong’s method). Using the fundus camera, the sensitivity was 97.9%, 95.7% and 91.4% at specificities of 85%, 90% and 95%, respectively. Using the smartphone, the sensitivity was 59.6%, 51.1% and 36.2% at specificities of 85%, 90% and 95%, respectively.
Discussion
We previously reported that a ResNet deep learning model is useful to diagnose glaucoma in fundus photographs12 and that this approach was useful regardless of myopic status12 and differences in fundus cameras, facilities or even ethnicity.2 3 In the current study, we investigated the usefulness of applying this approach to smartphone-based fundus photographs compared with an ordinary fundus camera. Our results demonstrated that considerably high diagnostic accuracy was obtained with the smartphone-based fundus camera, although the area under the curve (AUC) value (84.2%) was significantly lower than that with an ordinary fundus camera (98.9%).
We have reported AUC values between 94.8% and 99.7% using our ResNet deep learning model, which was also used in the current study.2 12 Other previous studies have reported similar levels: 91%,17 94.0%,18 96.1%,19 98.6%,20 98.2% and 99.5%.21 Compared with these results, a considerably lower AUC value (84.2%) was observed with the smartphone-based fundus camera in the current study (figure 2A). This contrasts with the high sensitivity and specificity values reported in previous studies diagnosing diabetic retinopathy using a smartphone-based fundus camera: Natarajan et al 11 reported a sensitivity of 85.2% with a specificity of 92.0%, Rajalakshmi et al 22 reported a sensitivity of 95.8% and specificity of 80.2%, Bilong et al reported a sensitivity of 73.3% and specificity of 90.5%, and Rajalakshmi et al 22 reported a sensitivity of 92.7% and specificity of 98.4%. One reason for this result could be the disease status of glaucoma, as the diagnostic accuracy improved considerably in eyes with advanced disease status (figure 2B). Conversely, the AUC value was much lower (81.3%; data not shown in Results) when these eyes were excluded. Using a fundus camera-based deep learning algorithm, we recently validated the diagnostic accuracy according to glaucoma disease stage.3 That study suggested that diagnostic performance improves considerably with advancing disease stage, because glaucomatous changes at the optic disc, such as rim thinning and retinal nerve fibre layer defects, become more evident as the disease progresses.
Nonetheless, the relatively low diagnostic performance of the smartphone-based deep learning algorithm in the current study cannot be fully explained by disease status, because a much higher AUC value (99.3%) was observed in eyes with advanced disease using the ordinary fundus camera. One explanation may be that, in contrast to diabetic retinopathy, imaging for glaucoma must be carefully targeted at the area around the optic disc, and the quality of the focus is imperative to capture subtle structural changes (see figure 1). In diabetic retinopathy, by contrast, retinal changes are often distributed over a wide area of the retina, and characteristic abnormalities, such as haemorrhage, exudate and neovascularisation, are generally more obvious than glaucomatous optic disc changes. Thus, the current results may reflect a requirement for higher-quality images when diagnosing glaucoma than when diagnosing diabetic retinopathy. In the current study, all smartphone images were recorded by experienced optometrists; image quality would likely be much worse if the images were captured by people without any experience in fundus recording, which would make this problem more pronounced, although the ‘autofocus’ function of a smartphone, unlike a conventional fundus camera, may prove advantageous in future when fundus images are captured by the general public. In addition, we attempted every possible remedy, such as changing the brightness of images, overlaying multiple images and improving the sharpness of images; however, none of these was successful. We concluded that these procedures could not sufficiently compensate for the relatively poor quality of the images captured with a smartphone.
The importance of the current result cannot be overstated. This is because glaucoma is one of the leading causes of blindness in the world.6 7 Early detection of the disease is essential because of the progressive and irreversible VF damage it causes. A previous study suggested that more advanced VF loss at diagnosis is a significant risk factor for subsequent legal blindness.23 Further efforts are needed to improve the diagnostic ability in eyes with early-stage glaucoma by improving the quality of smartphone-based optic disc images, such as through improved camera lenses and illumination. Alternatively, image quality could be improved by applying deep learning methods that enhance image resolution, such as generative adversarial networks and variational autoencoders. Similar approaches have been reported as successful with optical coherence tomography images24 and VF.25–27 Although this is beyond the scope of the current study, further efforts should be made to shed light on this issue.
Morphologically, optic discs in highly myopic eyes are different from those of non-highly myopic eyes. For instance, eyes with high myopia are often associated with tilting of the optic nerve head and thinning of the retinal nerve fibre layer.28 29 These changes often make detection of glaucoma challenging. Previous epidemiological studies have reported that myopia is a risk factor for the development of open angle glaucoma.30–33 The current study suggested that the diagnostic accuracy of the smartphone-based deep learning algorithm was not affected by this issue (figure 2C). This is in agreement with our previous studies in which the same diagnostic algorithm was trained using the same training dataset.2 3 This result is particularly important for patients of Asian origin, including the Japanese, because myopia is more common in these populations.34
One significant limitation of the current study is that the deep learning algorithm was trained only on images from a fundus camera. Improved diagnostic accuracy could be obtained if this algorithm were further trained using images derived from a smartphone. This should be investigated in a future study. Second, we had to select a single still image with the best view of the optic disc, because in the other frames the optic disc was out of view, too blurred or too dark, which would undoubtedly be disadvantageous when diagnosing glaucoma. This manual process is a limitation of the current study for use of the algorithm in the real world (outside hospitals). In addition, we initially attempted the smartphone measurement without pupil dilation; however, the images were very dark or defocused, or the optic discs were even outside the view of the images. These difficulties occurred even with experienced optometrists. Consequently, we had to dilate pupils to take smartphone images. That being said, the present work is basic research and not yet a commercial application. Furthermore, the flashlight had to remain on continuously for approximately 1 min during video recording, which may cause some discomfort to the patient, especially with pupil dilation. This further suggests the need for improvement in the smartphone image-capturing system. Finally, the background demographics of the training dataset could have influenced the diagnostic accuracy, which was not investigated in the current study. However, this may have only a marginal effect, since the training dataset was constructed without any bias in this respect and, as a result, was sampled widely from the general population of patients with glaucoma.
In conclusion, this study validated the usefulness of a deep learning algorithm to automatically screen for glaucoma from smartphone-based fundus photographs. The algorithm had a considerably high diagnostic ability, particularly in eyes with advanced glaucoma. The potential clinical impact of the current result as a screening tool cannot be overstated, but there is room for improvement in the quality of photographs.
Ethics approval
The study was approved by the Research Ethics Committee of the Matsue Red Cross Hospital, the Faculty of Medicine at the University of Tokyo, and Shimane University.
Contributors KN and RA researched literature and conceived the study. KN, RA and MT were involved in protocol development and gaining ethical approval. All authors were involved in patient recruitment and data analysis. RA wrote the first draft of the manuscript. All authors reviewed and edited the manuscript and approved the final version of the manuscript.
Funding This study was supported in part by grants (nos. 19H01114, 18KK0253 and 20K09784 (RA)) from the Ministry of Education, Culture, Sports, Science and Technology of Japan and The Translational Research program; Strategic Promotion for practical application of Innovative medical Technology (TR-SPRINT) from the Japan Agency for Medical Research and Development (AMED) (no grant number), grant AIP acceleration research from the Japan Science and Technology Agency (RA) (no grant number), and grants from the Suzuken Memorial Foundation (no grant number) and the Mitsui Life Social Welfare Foundation (no grant number).
Competing interests NS, MT, KM, HM and RA reported that they are coinventors on a patent for the deep learning system used in this study (Tokugan 2017-196870). Potential conflicts of interests are managed according to institutional policies of the University of Tokyo.
Provenance and peer review Not commissioned; externally peer reviewed.