Abstract
Background Diabetic retinopathy (DR) is a leading cause of blindness in adults worldwide. Artificial intelligence (AI) with autonomous deep learning algorithms has been increasingly used in retinal image analysis, particularly for the screening of referable DR. An established treatment for proliferative DR is panretinal or focal laser photocoagulation. Training autonomous models to discern laser patterns can be important in disease management and follow-up.
Methods A deep learning model was trained for laser treatment detection using the EyePACS dataset. Data were randomly assigned, by participant, into development (n=18 945) and validation (n=2105) sets. Analysis was conducted at the single image, eye, and patient levels. The model was then used to filter input for three independent AI models for retinal indications; changes in model efficacy were measured using area under the receiver operating characteristic curve (AUC) and mean absolute error (MAE).
Results On the task of laser photocoagulation detection, AUCs of 0.950, 0.979 and 0.981 were achieved at the image, eye and patient levels, respectively. When the independent models were analysed, performance improved across the board after filtering. Diabetic macular oedema detection achieved AUC 0.932 on images with artefacts vs AUC 0.955 on those without. Participant sex detection achieved AUC 0.872 on images with artefacts vs AUC 0.922 on those without. Participant age detection achieved MAE 5.33 on images with artefacts vs MAE 3.81 on those without.
Conclusion The proposed model for laser treatment detection achieved high performance on all analysis metrics and has been demonstrated to positively affect the efficacy of different AI models, suggesting that laser detection can generally improve AI-powered applications for fundus images.
- Treatment Lasers
- Neovascularisation
Data availability statement
Data may be obtained from a third party and are not publicly available. Deidentified data used in this study are not publicly available at present. Parties interested in data access should contact Jorge Cuadros (jcuadros@eyepacs.com) for queries related to EyePACS.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known on this topic
Laser photocoagulation is an established treatment for retinal conditions such as diabetic retinopathy. The performance of artificial intelligence models for the detection of various retinal indications may be affected by the presence of laser artefacts.
What this study adds
This study proposes a new state-of-the-art artificial intelligence model for the detection of laser photocoagulation artefacts and demonstrates its positive effect on the performance of other artificial intelligence models.
How this study might affect research, practice or policy
This study may improve or aid the development of future artificial intelligence models for the detection and diagnosis of various retinal conditions.
Introduction
Laser photocoagulation, in which laser pulses are used to coagulate retinal tissue, is a common and established procedure for treating multiple retinal diseases.1–3 Ablative photocoagulation is mostly used to prevent leakage and ischaemic neovascularisation in vascular retinal conditions such as diabetic retinopathy (DR),4 5 diabetic macular oedema (DME),6–8 retinal vein occlusion,9 10 and neovascular age-related macular degeneration (AMD).11
Laser photocoagulation is generally divided into panretinal and focal; the former is delivered in the peripheral retina with deep ablative burns to stem the neovascular process,12 13 while the latter is a lighter photocoagulative treatment delivered in the central macula to treat macular conditions.14 15 There are well-established laser treatment protocols depending on disease severity and individual patient disease state.11 16–18 While laser photocoagulation is an effective treatment, it causes retinal scarring and is destructive to the retinal tissue leaving long-term defects in the anatomy.19–21
Artificial intelligence (AI) using fundus imaging has been increasingly employed in various ophthalmological applications.22 23 These applications include extraction of basic patient data, such as age and sex,24 detection of retinal pathologies,25 26 and prediction of pathology development.27 28 AI methods rely on image pattern recognition, especially in areas in which the pathology is present. As such, laser photocoagulation may disrupt pattern recognition by adding new patterns or artefacts, such as burns and scars, that the model has seen less often during training. This is particularly problematic because laser treatment is often applied to areas of interest, such as leaky blood vessels, which are often the very areas that are most crucial to recognise.
The effect of laser photocoagulation on AI systems suggests that a tool to identify images of eyes that have undergone photocoagulation may be beneficial for the autonomous retinal-based diagnosis and follow-up treatment of patients. While previous methods of laser photocoagulation detection exist,29–33 this work, to the best of our knowledge, presents the first method for detecting laser-treated images based on a large, diverse and widely accepted database, in this case EyePACS (https://www.eyepacs.org), which contains images of varying quality from a variety of manufacturers and patient populations.
Methods
Data
The data consisted of a subsample of the EyePACS dataset, which contains 45° fundus photographs and expert readings of these images. All images and data were deidentified according to the Health Insurance Portability and Accountability Act ‘Safe Harbor’ method before they were transferred to the researchers.
The dataset contained up to six images per patient visit: per eye, one macula-centred image, one disc-centred image and one central-fixation image (fixated on the midpoint of the line connecting the foveola and the optic disc). Each eye was read by experts for features including, but not limited to, panretinal laser treatment presence, focal laser treatment presence and image quality. All images in the subsample deemed readable by the expert annotations were used.
The resulting dataset consisted of 21 050 images from 9212 patients, of which 9484 images (45%) had artefacts of panretinal laser treatment, 1888 (9%) had artefacts of focal laser treatment and 847 (4%) had both. This work combined focal and panretinal laser treatments into a single laser treatment category, resulting in an overall 10 525 images (50%) with laser treatment artefacts (table 1). Of these patients, roughly 77% required dilation: 54% of all patients received one drop (1 gtt.) of tropicamide 1%, 17% received one drop of tropicamide 0.5% and 5% received other dilation agents.
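The combined count follows from inclusion-exclusion: the 847 images with both artefact types are included in both the panretinal and the focal counts, so they are subtracted once. The short check below reproduces the reported totals and percentages from the counts given in the text; it is an arithmetic illustration only.

```python
# Counts reported for the 21 050-image dataset.
total_images = 21_050
panretinal = 9_484   # images with panretinal laser artefacts
focal = 1_888        # images with focal laser artefacts
both = 847           # images with both types (already counted in each group above)

# Inclusion-exclusion: images with either type of laser artefact.
any_laser = panretinal + focal - both


def pct(count: int) -> float:
    """Share of the full dataset, in percent."""
    return 100 * count / total_images


print(any_laser)                                                   # 10525
print(round(pct(any_laser)))                                       # 50
print(round(pct(panretinal)), round(pct(focal)), round(pct(both)))  # 45 9 4
```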
The average age of patients with laser treatment artefacts was 59.5 years (SD 10.0) and 55% were women, compared with an average age of 55.6 years (SD 11.3) and 61% women among patients who had not undergone laser treatment (table 2). The prevalence of laser photocoagulation across ethnic groups is given in online supplemental table A. The distribution of laser treatment images across DR levels is given in online supplemental table B; all laser treatment images came from patients with more than mild DR, and most came from patients with grade 4 DR.
Quality assessment
An image quality assessment tool was developed to remove low-quality images from the dataset, because the quality scores assigned by EyePACS apply to patients rather than to individual images. The tool uses classic computer vision methods to detect the visibility of fundus-specific characteristics in multiple areas of the image and aggregates these into a single quality score per image. The tool was validated by visually comparing image scores with image readability. Figure 1 shows examples of images and their respective scores, illustrating the correlation between score and visual image quality.
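The text does not specify which classic computer vision measures the tool uses. The sketch below is a hypothetical illustration of the general idea only, assuming a grayscale image as a NumPy array, a 3×3 grid of regions, the variance of a discrete Laplacian as the per-region "visibility" measure (sharp vessels and disc margins give high values) and the mean as the aggregation. The function names `region_visibility` and `quality_score` are illustrative, not the authors' code.

```python
import numpy as np


def region_visibility(region: np.ndarray) -> float:
    """Hypothetical visibility measure: variance of a 4-neighbour discrete
    Laplacian, which is high where edges (vessels, disc margin) are sharp."""
    lap = (
        -4 * region[1:-1, 1:-1]
        + region[:-2, 1:-1] + region[2:, 1:-1]
        + region[1:-1, :-2] + region[1:-1, 2:]
    )
    return float(lap.var())


def quality_score(image: np.ndarray, grid: int = 3) -> float:
    """Split a grayscale fundus image into grid x grid regions, score the
    visibility of each region, and aggregate the scores by their mean."""
    h, w = image.shape
    scores = []
    for i in range(grid):
        for j in range(grid):
            region = image[i * h // grid:(i + 1) * h // grid,
                           j * w // grid:(j + 1) * w // grid]
            scores.append(region_visibility(region))
    return float(np.mean(scores))
```

A blurred image loses high-frequency detail and therefore scores lower than the sharp original; a minimum (rather than the mean) over regions would be a stricter aggregation if any single poorly visible region should fail the image.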
Preprocessing
Image preprocessing was performed in three steps for both datasets. First, image backgrounds were cropped along the convex hull containing the circular border between the image and the background. Figure 2 shows an example of this process. Second, images were resized to 512×512 pixels. Third, low-quality images were filtered out before training using the quality assessment tool described above. The model was trained under multiple configurations, each with a different quality threshold, and the threshold was set at the point beyond which filtering additional images no longer improved model performance. This filtered out 1373 images, approximately 6.5% of the data.
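A minimal sketch of these steps, under stated assumptions: grayscale images as NumPy arrays, the fundus separated from the dark background by a fixed brightness threshold, and, as a simplification, cropping to the bounding box of the fundus mask rather than to its exact convex hull. The threshold selection implements the "no further improvement" rule as a plateau test with a hypothetical tolerance `tol`. All function names and parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def crop_to_fundus(image: np.ndarray, background_level: float = 0.05) -> np.ndarray:
    """Crop a grayscale image to the bounding box of pixels brighter than the
    background (a simplification of cropping along the convex hull)."""
    mask = image > background_level
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return image  # nothing above background level; leave unchanged
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def resize_nearest(image: np.ndarray, size: int = 512) -> np.ndarray:
    """Nearest-neighbour resize to size x size using integer index arrays."""
    h, w = image.shape
    r = np.arange(size) * h // size
    c = np.arange(size) * w // size
    return image[r[:, None], c[None, :]]


def select_threshold(thresholds, aucs, tol=1e-3):
    """Return the first quality threshold after which raising it further
    (filtering more images) improves validation AUC by no more than tol."""
    for k in range(len(thresholds) - 1):
        if aucs[k + 1] - aucs[k] <= tol:
            return thresholds[k]
    return thresholds[-1]
```

In practice a library resize with interpolation (for example from an imaging library) would usually replace `resize_nearest`; the NumPy version only keeps the sketch dependency-free.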
Model training
The data were then divided into training, validation and test sets at a ratio of 80%, 10% and 10%, respectively. A binary classification neural network was trained, with its architecture selected automatically to balance model performance against model complexity. Hyperparameters were tuned on the validation set.
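Because each patient contributes several images, splitting at the image level would let images of the same eye appear in both training and test sets. The abstract states that data were assigned by participant; the sketch below shows one way to do this, assuming the 80/10/10 fractions are applied to patients (so image counts per split are only approximately proportional). The function `split_by_patient` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(image_ids, patient_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/validation/test so that no patient's
    images appear in more than one split."""
    by_patient = defaultdict(list)
    for img, pid in zip(image_ids, patient_ids):
        by_patient[pid].append(img)

    patients = sorted(by_patient)          # deterministic order before shuffling
    random.Random(seed).shuffle(patients)

    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": patients[:n_train],
        "validation": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand each patient group back into its list of image identifiers.
    return {name: [img for pid in pids for img in by_patient[pid]]
            for name, pids in groups.items()}
```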
Statistical analysis
The metrics used for model assessment were accuracy, sensitivity, specificity and area under the receiver operating characteristic curve (AUC). For each metric, a 95% CI was produced with the bias-corrected and accelerated (BCa) bootstrap method.34
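The BCa interval adjusts the plain percentile bootstrap for bias (z0, from the share of bootstrap replicates below the point estimate) and skewness (the acceleration a, from jackknife estimates). A minimal, dependency-light sketch for the AUC follows, assuming binary labels and scores as NumPy arrays with at least two examples per class; it is not the authors' code, and `scipy.stats.bootstrap` with `method="BCa"` is an established alternative.

```python
import numpy as np
from statistics import NormalDist


def auc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counting one half (the Mann-Whitney formulation)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def bca_ci(y_true, y_score, stat=auc, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated bootstrap CI for a paired statistic."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    theta_hat = stat(y_true, y_score)

    # Resample (label, score) pairs jointly; skip resamples with one class only.
    boot = []
    while len(boot) < n_boot:
        idx = rng.integers(0, n, n)
        if 0 < y_true[idx].sum() < n:
            boot.append(stat(y_true[idx], y_score[idx]))
    boot = np.array(boot)

    nd = NormalDist()
    # Bias correction from the share of replicates below the point estimate.
    prop = np.clip((boot < theta_hat).mean(), 1 / n_boot, 1 - 1 / n_boot)
    z0 = nd.inv_cdf(prop)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = np.array([stat(np.delete(y_true, i), np.delete(y_score, i)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6 * (d ** 2).sum() ** 1.5
    a = (d ** 3).sum() / denom if denom > 0 else 0.0

    def adjusted(q):
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    lo, hi = np.quantile(boot, [adjusted(alpha / 2), adjusted(1 - alpha / 2)])
    return theta_hat, float(lo), float(hi)
```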
Analysis levels
Laser detection was performed at three levels. The first, detection at the individual image level, was the basic task for which the model was trained. The second, detection at the eye level, considered all images of a given eye and used the image with the highest model probability score. The third, detection at the patient level, compared the results from both eyes and used the eye with the higher probability score to produce a patient-level result. The eye-level and patient-level analyses thus assume that a single field (or eye) with photocoagulation artefacts is sufficient for the eye (or patient) to be classified as positive.
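The eye-level and patient-level scores described above amount to taking a maximum over images and then over eyes. A minimal sketch, assuming one hypothetical `(patient_id, eye, probability)` record per image:

```python
from collections import defaultdict


def aggregate_max(records):
    """records: iterable of (patient_id, eye, probability) tuples, one per image.
    Returns eye-level and patient-level scores as maximum probabilities, so a
    single positive field (or eye) is enough to flag the eye (or patient)."""
    eye_scores = defaultdict(float)
    for patient_id, eye, prob in records:
        key = (patient_id, eye)
        eye_scores[key] = max(eye_scores[key], prob)

    patient_scores = defaultdict(float)
    for (patient_id, _eye), prob in eye_scores.items():
        patient_scores[patient_id] = max(patient_scores[patient_id], prob)

    return dict(eye_scores), dict(patient_scores)


records = [("p1", "L", 0.2), ("p1", "L", 0.7), ("p1", "R", 0.1),
           ("p2", "L", 0.05), ("p2", "R", 0.3)]
eyes, patients = aggregate_max(records)
# patients == {"p1": 0.7, "p2": 0.3}
```

The maximum is the natural aggregator for an "any field positive" rule; a mean would instead dilute a single clearly treated field among untreated-looking ones.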
Effect on imaging tasks
The effect of laser treatment on imaging tasks was measured by applying the laser detection model as a filtering step for three independent models: a model for the detection of DME, developed on the EyePACS dataset,35 a model for age detection, also developed on the EyePACS dataset, and a model for sex detection.
Performance on the classification tasks was measured by AUC on a separate validation set containing images both with and without laser treatment artefacts. For each population, the 95% CI was calculated using the bias-corrected and accelerated bootstrap method, and the populations were compared for significance.
A regression model was additionally trained for age detection, and the mean absolute error (MAE) between the patient’s age and the predicted age was calculated on a separate validation set. The validation set was split into patients with and without laser treatment artefacts such that the mean age of the two populations was the same. The significance of the difference in MAE between the two populations was assessed using Student’s t-test. Detailed patient statistics for these experiments, as well as details on model development, are given in online supplemental table C and the accompanying explanations.
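A minimal sketch of the MAE comparison, assuming the age-matched split has already been made and that the t-test is applied to the per-patient absolute errors (the text does not state this explicitly). To stay within the standard library, the p-value uses a normal approximation to the t distribution, which is close for large samples; `scipy.stats.ttest_ind` returns the exact Student's t p-value. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mae(true_ages, predicted_ages):
    """Mean absolute error between true and predicted ages."""
    return mean(abs(t - p) for t, p in zip(true_ages, predicted_ages))


def student_t_on_abs_errors(errors_a, errors_b):
    """Two-sample Student's t-test (pooled variance) on absolute errors.
    The p-value is a large-sample normal approximation, not the exact t value."""
    n_a, n_b = len(errors_a), len(errors_b)
    pooled_var = ((n_a - 1) * stdev(errors_a) ** 2
                  + (n_b - 1) * stdev(errors_b) ** 2) / (n_a + n_b - 2)
    t = (mean(errors_a) - mean(errors_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_approx
```

Matching the mean age of the two groups first matters because absolute age error tends to vary with age, so an unmatched comparison could attribute an age effect to laser artefacts.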
Results
The results for the different analysis methods of laser artefact detection were as follows (table 3): on the image level, sensitivity of 0.883 (95% CI 0.868 to 0.897), specificity of 0.880 (95% CI 0.864 to 0.894), and AUC of 0.950 (95% CI 0.943 to 0.956) were achieved. On the eye level, sensitivity of 0.925 (95% CI 0.900 to 0.945), specificity of 0.931 (95% CI 0.916 to 0.944), and AUC of 0.979 (95% CI 0.972 to 0.984) were achieved. On the patient level, sensitivity of 0.929 (95% CI 0.881 to 0.947), specificity of 0.926 (95% CI 0.911 to 0.944), and AUC of 0.981 (95% CI 0.971 to 0.987) were achieved.
The results of laser artefact detection for each DR level are displayed in table 4: the model achieved 0.910 AUC (95% CI 0.866 to 0.941) for DR level 2, 0.887 AUC (95% CI 0.758 to 0.954) for DR level 3, 0.929 AUC (95% CI 0.918 to 0.938) for DR level 4, and 0.772 AUC (95% CI 0.904 to 0.968) for ungradable DR level. DR levels 0 and 1 did not have any laser treated examples, thus most metrics are not defined for these groups. The results of laser artefact detection stratified by ethnicity are available in online supplemental table D.
Online supplemental table E compares laser artefact detection in patients with and without DME. The model achieved an AUC of 0.955 (95% CI 0.948 to 0.962) for patients without DME versus 0.908 (95% CI 0.884 to 0.927) for patients with DME, indicating that DME affects detection performance, although the model performs well in both groups.
Online supplemental table F displays the results of laser artefact detection for images which passed (high quality) and did not pass (low quality) the quality filter, showing a significant difference between the populations. The results for low-quality images, which were filtered out, were 0.787 sensitivity (95% CI 0.710 to 0.849), 0.793 specificity (95% CI 0.709 to 0.860), and 0.857 AUC (95% CI 0.803 to 0.898); compared with 0.854 sensitivity (95% CI 0.838 to 0.869), 0.904 specificity (95% CI 0.890 to 0.917), and 0.948 AUC (95% CI 0.941 to 0.955) for high-quality images which passed the filter.
The effect of laser detection and subsequent filtering on the three tasks of DME detection, age prediction and sex detection was as follows. DME detection achieved an AUC of 0.955 (95% CI 0.948 to 0.961) on images with no laser artefacts, compared with 0.932 (95% CI 0.905 to 0.951) on images with laser artefacts. Age prediction, after age adjustment, achieved an MAE of 3.81 on images with no laser artefacts, compared with 5.33 on images with laser artefacts; this difference was significant (t-test, p<0.0001). Sex detection achieved an AUC of 0.922 (95% CI 0.916 to 0.927) on images with no laser artefacts, compared with 0.872 (95% CI 0.830 to 0.903) on images with laser artefacts.
The aggregation of these results is shown in table 5.
Discussion
This work proposed a method for the automatic detection of laser treatment artefacts in fundus images, which may also serve as a component in the future development of AI systems for different diagnoses based on retinal imaging. Such systems may need to treat images of laser-treated eyes differently from images of untreated eyes, depending on their design needs: some may discard these images, while others may analyse them differently. Accordingly, and depending on how strongly laser treatment affects the task in question, the proposed system may be used at different operating points with different sensitivity–specificity balances. Discarding laser-treated images is a viable option for most automated retinal screening applications, as these patients should already be aware of the need for regular screening.
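Choosing an operating point means fixing a probability threshold on validation data. One common rule, sketched below as an assumption rather than the authors' procedure, is to take the highest threshold that still reaches a target sensitivity (so that few laser-treated images slip through) and report the specificity it implies. `threshold_for_sensitivity` is a hypothetical helper.

```python
import numpy as np


def threshold_for_sensitivity(y_true, y_score, target_sensitivity=0.95):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity meets the target; scores >= threshold count as positive."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    for thr in np.unique(y_score)[::-1]:          # candidate thresholds, high to low
        pred = y_score >= thr
        sens = (pred & (y_true == 1)).sum() / (y_true == 1).sum()
        if sens >= target_sensitivity:
            spec = (~pred & (y_true == 0)).sum() / (y_true == 0).sum()
            return float(thr), float(sens), float(spec)
    raise ValueError("target sensitivity not reachable")
```

A task strongly affected by laser artefacts would favour a high target sensitivity at the cost of discarding more untreated images; a less affected task could use a lower target and keep more data.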
Previous studies on the autonomous detection of laser burns from fundus images used datasets roughly two orders of magnitude smaller.29–33 Scale matters because it gives a better representation of real-life conditions; specifically, this study better represents various image qualities, camera manufacturers and populations. Additionally, a wider range of clinical conditions, such as DR and DME, is represented in this study both with and without laser treatment, and the proposed system shows high performance across these conditions.
The effect of laser treatment on imaging tasks, and the model’s ability to detect the affected images, were validated by measuring the model’s effect on different AI tasks involving retinal images. A significant difference was found for all three tasks, showing the relevance of the proposed method for future AI tasks.
A limitation of this work is that focal and panretinal laser treatments were grouped into a single category and not differentiated. Future work may differentiate between the two, given more data. Furthermore, although the basic characteristics of laser photocoagulation are similar across conditions, adding AMD-specific databases to the training set may improve results.
In addition, and in the same vein as the presented work, machine learning methods may be developed to identify patients with DME who will require future laser treatment. This would require training a model similar to the one presented on a dataset generated from a longitudinal study tracking the progression of patients with diabetes.
Ethics statements
Patient consent for publication
Ethics approval
Institutional Review Board exemption was obtained from the Sterling Independent Review Board on the basis of a category 4 exemption (DHHS), pursuant to the terms of the U.S. Department of Health and Human Services’ Policy for Protection of Human Research Subjects at 45 C.F.R. §46.104(d).
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Footnotes
Contributors IB analysed the data, designed the study and conducted research. ZD-A and DM conceived the study and supervised research. YR provided assistance in assessing external models. TI provided medical and strategic guidance and oversight. IB and RA drafted the manuscript with input from all authors. ZD-A is guarantor.
Funding Employees and board members of AEYE Health designed and carried out the study; managed, analysed and interpreted the data; prepared, reviewed and approved the article; and were involved in the decision to submit the article. There were no grants or awards involved in the funding of this article.
Competing interests RA, IB and YR are employees of AEYE Health. TI is a board member of AEYE Health. DM is COO of AEYE Health. ZD-A is CEO of AEYE Health.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.