Abstract
Background/aims The aim of this study was to develop and evaluate digital ray, a system based on style-transfer generative adversarial networks (GANs) trained on preoperative and postoperative image pairs, to enhance cataractous fundus images for improved retinopathy detection.
Methods For eligible cataract patients, preoperative and postoperative colour fundus photographs (CFP) and ultra-wide field (UWF) images were captured. Then, both the original CycleGAN and a modified CycleGAN (C2ycleGAN) framework were adopted for image generation and quantitatively compared using Frechet Inception Distance (FID) and Kernel Inception Distance (KID). Additionally, CFP and UWF images from another cataract cohort were used to test model performance. Different panels of ophthalmologists evaluated the quality, authenticity and diagnostic efficacy of the generated images.
Results A total of 959 CFP and 1009 UWF image pairs were included in model development. FID and KID indicated that images generated by C2ycleGAN presented significantly improved quality. Based on ophthalmologists’ average ratings, the percentages of inadequate-quality images decreased from 32% to 18.8% for CFP, and from 18.7% to 14.7% for UWF. Only 24.8% and 13.8% of generated CFP and UWF images could be recognised as synthetic. The accuracy of retinopathy detection significantly increased from 78% to 91% for CFP and from 91% to 93% for UWF. For retinopathy subtype diagnosis, the accuracies also increased from 87%–94% to 91%–100% for CFP and from 87%–95% to 93%–97% for UWF.
Conclusion Digital ray could generate realistic postoperative CFP and UWF images with enhanced quality and accuracy for overall detection and subtype diagnosis of retinopathies, especially for CFP.
Trial registration number This study was registered with ClinicalTrials.gov (NCT05491798).
- Lens and zonules
- Imaging
- Retina
Data availability statement
No data are available.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
A large proportion of fundus images collected in real-world settings have quality defects, with cataracts being the leading cause; such defects can undermine clinical evaluation and cannot be easily solved by recapture on site. While previous studies report several techniques to enhance cataractous fundus images, these methods have not achieved large-scale clinical application, because their development data only indirectly reflected the real quality degradation caused by cataracts and because the models lacked clinical evaluation.
WHAT THIS STUDY ADDS
Compared with the traditional CycleGAN framework, our modified CycleGAN framework used in digital ray achieved better performance in technical parametric evaluation. In clinical evaluation experiments, fundus images generated by digital ray presented both improved quality and diagnostic accuracy for retinopathies.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
Digital ray could effectively enhance cataractous fundus images and significantly improve both the overall detection and subtype diagnosis of retinopathies.
Introduction
Fundus photography is important in the evaluation of both ophthalmic diseases and systemic conditions such as hypertension and diabetes.1 In real-world data collection, 19.7%–21% of fundus images cannot meet the quality standards of ophthalmologists or intelligent algorithms due to multiple environmental and patient-related factors,2–4 which can largely undermine downstream analysis.5 Among these ungradable images, cataracts have been recognised as the most common cause, accounting for 78% of all cases.2 Considering the high comorbidity rate of cataracts and other ocular pathologies such as age-related macular degeneration (AMD) and epiretinal membrane,6 7 images degraded by cataracts can even more severely impede the detection of retinal lesions. Moreover, while some quality factors, such as poor lighting conditions and patient head movement, can usually be corrected by repeated acquisition, cataracts cannot be solved by this method because fundus images are captured through the lens, and cataracts unavoidably attenuate and scatter the passing light. Therefore, image enhancement should be employed as a preprocessing step for cataractous fundus images to improve readability and usability.
Several studies have investigated the enhancement of fundus images based on various techniques. Hand-crafted algorithms, such as contrast-limited adaptive histogram equalisation, have been introduced to restore degraded fundus images.8–10 While these methods were effective for enhancing image visibility, they process the entire image indiscriminately, so they either struggle to preserve delicate pathological characteristics or suffer from strict constraints in implementation, since the observation of different retinal structures requires distinct image parameters (eg, illumination for macula and optic disc). The recent advancement of deep learning techniques has facilitated the application of generative adversarial networks (GANs) in ophthalmology.11–14 However, these models still failed to achieve large-scale clinical application. One possible reason is that these studies mostly applied Gaussian noise or other image-blurring techniques to available clear fundus images to form the training dataset, instead of collecting real-world cataractous and clear images of identical patients (‘image pairs’); such simulated data cannot directly reflect the real quality degradation caused by cataracts. Besides, the restoration results were evaluated mainly based on technical parameters rather than clinical diagnosis, so their clinical significance remains to be specified.
Inspired by the design of X-ray scanning, we proposed digital ray to ‘penetrate’ the cataractous lens and achieve better clarity in colour fundus photography (CFP) and ultra-wide field (UWF) images. Digital ray was developed using GAN frameworks based on preoperative and postoperative image pairs of cataract patients. The generated images were then evaluated through both technical parameters and clinical experiments regarding image quality, authenticity and diagnostic efficacy by ophthalmologists with distinct levels of experience, with a view to assisting clinical practice.
Materials and methods
Study design and participants
The overall study design is shown in figure 1. For digital ray development, patients who visited the Cataract Department of Zhongshan Ophthalmic Centre from January 2022 to July 2022 were recruited to undergo CFP (Topcon TRC-NW8, Tokyo, Japan) and UWF photography (Optos Daytona, Dunfermline, UK). The inclusion criteria were as follows: (1) age ≥18 years; (2) age-related or complicated cataracts, eligible for phacoemulsification and intraocular lens implantation. The exclusion criteria were (1) conditions that prevented pupil dilation and (2) conditions that prevented the identification of primary retinal vessels, such as an extraordinarily cloudy lens or severe vitreous haemorrhage. Photographs were taken within 1 week before and 1 week to 1 month after cataract surgery. Pupil dilation was achieved with topical 1.0% tropicamide before CFP imaging. To test digital ray after model development, CFP and UWF fundus photographs were taken from a prospective cohort of cataract patients enrolled from February 2023 to April 2023, following the same procedure as the development cohort.
Fundus image generation
To develop digital ray, we deployed CycleGAN, one of the classic GAN algorithms, together with a modified version specially designed for cataractous fundus images in the training process. The overall model structure of CycleGAN and our modifications are demonstrated in online supplemental figure S1. In our modified CycleGAN, to make the generated images more realistic and identifiable in important structural details (optic disc, macula and blood vessels), we employed spatial attention to enhance the traditional CycleGAN generator, which involves pooling features globally and then passing them through convolutional layers to create a weight matrix. This matrix is then multiplied with the original features to emphasise critical retinal regions so the model can generate a more apparent macula, optic disc and blood vessels. Besides, we introduced a multiscale discriminator that discriminates fundus images at three different scales so that the discriminator can perceive fundus image details at different granularities. The enhanced model is denoted as C2ycleGAN (‘C’ycleGAN multiplied by ‘C’ataractous images). In the training process, we adopted Adam as the optimiser with its momentum term set to 0.5. The learning rate was set to 0.0001, the number of iterations to 200, and the number of iterations over which the learning rate linearly decayed to zero to 150. We implemented all the models with Python (V.3.8) and the PyTorch framework. All experiments were performed on Ubuntu V.16.04 LTS with a GeForce RTX 3090.
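For illustration, the learning-rate schedule described above can be sketched as follows. This is a minimal sketch, not the authors' code, and it assumes the standard CycleGAN convention of a constant phase followed by a linear-decay phase:

```python
def cyclegan_lr(iteration, base_lr=1e-4, n_constant=200, n_decay=150):
    """Learning rate at a given iteration (0-indexed).

    Constant at base_lr for the first n_constant iterations, then
    linearly decayed to zero over the following n_decay iterations.
    """
    if iteration < n_constant:
        return base_lr
    progress = min(iteration - n_constant, n_decay) / n_decay
    return base_lr * (1.0 - progress)

print(cyclegan_lr(0))    # constant phase: 0.0001
print(cyclegan_lr(275))  # halfway through decay: 0.00005
print(cyclegan_lr(350))  # end of decay: 0.0
```

The function names and parameters here are illustrative; real implementations typically pass such a schedule to the optimiser via a lambda-based scheduler.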
Performance evaluation of CycleGAN using technical parameters
To objectively assess the performance of GANs at image generation, we introduced Frechet Inception Distance (FID) and Kernel Inception Distance (KID), both of which are widely used in GAN research for measuring the similarity between the generated dataset and the real dataset, with lower values suggesting higher-quality generated images whose distribution is closer to that of the real dataset.15 16 Details about the calculation of FID and KID are summarised in online supplemental appendix 1.
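For intuition, FID models two feature sets as Gaussians and measures the distance between them; in one dimension the closed form reduces to (μ1−μ2)² + (σ1−σ2)². The sketch below is illustrative only — the real metric uses Inception-v3 feature embeddings and full covariance matrices:

```python
import statistics

def fid_1d(real, generated):
    """One-dimensional analogue of the Frechet Inception Distance.

    The FID between N(m1, s1^2) and N(m2, s2^2) reduces to
    (m1 - m2)^2 + (s1 - s2)^2 in one dimension; lower is better.
    """
    m1, s1 = statistics.mean(real), statistics.pstdev(real)
    m2, s2 = statistics.mean(generated), statistics.pstdev(generated)
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

real_feats = [0.2, 0.4, 0.6, 0.8]
close_feats = [0.25, 0.45, 0.55, 0.75]  # similar distribution -> small FID
far_feats = [2.0, 2.5, 3.0, 3.5]        # shifted distribution -> large FID
assert fid_1d(real_feats, real_feats) == 0.0
assert fid_1d(real_feats, close_feats) < fid_1d(real_feats, far_feats)
```

This mirrors how a lower FID for C2ycleGAN indicates generated images closer to the real postoperative distribution.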
Clinical evaluation of generated images
Preoperative and generated images were anonymised and then evaluated in terms of quality, authenticity and diagnostic efficacy, respectively (figure 1C). Nine ophthalmologists with different levels of expertise participated in the evaluation study: resident level (TC, XWei and YS), senior level (LLiu, SB and YL) and expert level (LLi, C-KT and FX). In each experiment, doctors were presented with the images in a randomised order and then needed to independently decide on the answers to corresponding questions. Definitions of quality, authenticity and retinal diseases are summarised in online supplemental table S1. For each patient enrolled in this study, we collected their medical records, slit-lamp images, fundus images, optical coherence tomography (OCT) scans and fundus fluorescein angiography (FFA) images (optional, depending on the condition of lesions) before cataract surgery to make a preliminary diagnosis. These tests were repeated postoperatively (if necessary), and the results were referred to a group of retinal experts to finalise the ground truth.
Evaluation metrics and statistical analysis
In parametric evaluation, the FID and KID achieved by CycleGAN and C2ycleGAN were compared using the t-test for two paired means. In clinical evaluation, the 95% CIs of sensitivity, specificity, accuracy and discrimination rate were calculated using binomial proportion CIs. Comparative analysis of these indices was conducted using McNemar’s test. The receiver operating characteristic (ROC) curve was also created by plotting sensitivity against 1−specificity, with the area under the ROC curve (AUC) calculated. All statistical tests were two tailed, and p<0.05 was considered statistically significant. All tests were conducted using the statistical package R, V.3.2.4.
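The paired comparisons and binomial CIs above can be sketched as follows. This is a minimal illustration with hypothetical counts; it uses the continuity-corrected McNemar statistic and a normal-approximation (Wald) interval, which may differ slightly from the exact procedures the study's R code used:

```python
import math

def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar statistic from the two discordant
    pair counts b and c (compared against chi-square with 1 df)."""
    return (abs(b - c) - 1) ** 2 / (b + c)

def binomial_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a binomial proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Hypothetical example: 91 of 100 diagnoses correct on enhanced images.
lo, hi = binomial_ci(91, 100)
print(f"accuracy 0.91, 95% CI {lo:.3f} to {hi:.3f}")

# Hypothetical discordant pairs: 18 corrected by enhancement, 4 newly wrong.
print(f"McNemar chi2 = {mcnemar_chi2(18, 4):.2f}")  # > 3.84 implies p < 0.05
```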
Results
Characteristics of the datasets
A total of 959 CFP and 1009 UWF image pairs from 510 patients were assigned to the training dataset while 100 CFP and 100 UWF image pairs from 200 patients were allocated to the test dataset (online supplemental table S2). In the training dataset, the mean age of patients was 64.5±12.6 years, and 55.9% were female. In the test set, the mean age of patients was 64.6±11.5 years, and 52.0% were female. More baseline clinical characteristics are shown in online supplemental table S2.
Generated images and parametrical evaluation
After model development, CFP and UWF images were generated using the algorithms described above. Online supplemental figure S2 demonstrates typical examples of preoperative and postoperative image pairs, along with images generated using the original CycleGAN and C2ycleGAN. As these images indicate, CycleGAN can significantly improve observation of retinal structures and pathological features such as drusen and tessellated fundus. This enhancement is more obvious with C2ycleGAN. Table 1 shows the performance comparison between the original CycleGAN and C2ycleGAN using FID and KID in the test dataset. The statistically significant lower FID and KID scores achieved by C2ycleGAN for both CFP and UWF images suggest that fundus images generated by C2ycleGAN present better quality and a closer distribution to the real postoperative data. Thus, C2ycleGAN was used for the establishment of digital ray.
Quality and authenticity evaluation of preoperative and generated retinal images
Following experiment one as illustrated in online supplemental table S1, the quality distributions of preoperative and generated CFP and UWF images labelled by ophthalmologists are summarised in online supplemental table S3 and figure 2A,B. For CFP images, a significant increase in the proportion of excellent-quality images was observed in all ophthalmologist groups after digital ray enhancement. Conversely, the percentage of inadequate-quality images decreased significantly in all groups. As for UWF images, a significant change was only observed in the proportion of excellent-quality images, with an increase from 14.3% to 24% in the residents’ group and from 25% to 32.7% in the seniors’ group. In experiment 2, the rates of accurately recognising generated CFP and UWF images are shown in figure 2C,D. For CFP images, the average discrimination rates achieved by residents, seniors and experts were 20.3% (95% CI 13.8% to 26.9%), 25.7% (95% CI 22.4% to 28.9%) and 28.3% (95% CI 24.0% to 32.7%), respectively. For UWF images, the corresponding rates were 9.3% (95% CI 7.4% to 11.3%), 13.3% (95% CI 11.0% to 15.7%) and 18.7% (95% CI 16.7% to 20.6%), respectively.
Diagnostic efficacy evaluation of preoperative and generated images
Diagnostic efficacy evaluation includes two steps: retinopathy detection (to determine the existence of any retinopathies) and subtype diagnosis (further classification of retinopathies). Retinopathies included in CFP and UWF analysis are presented in online supplemental table S1. Based on all ophthalmologists’ average ratings, their diagnostic performance indices were calculated and shown in table 2 and figure 3. For retinopathy detection, a significant increase was observed in both accuracy (from 78% to 91%) and sensitivity (from 72.1% to 94.1%) among generated CFP images (table 2), and higher AUCs were achieved by generated images in both CFP and UWF groups (figure 3A,B). For retinopathy subtype diagnosis, sensitivity for AMD and accuracy for diabetic retinopathy (DR), pathological myopia (PM) and others were significantly improved in generated CFP images, and a significant improvement was also observed in accuracy for retinal detachment (RD) in generated UWF images (table 2). The overall diagnostic accuracies by different panels of ophthalmologists are summarised in figure 2E,F. As shown, digital ray enhancement caused a significant increase in overall diagnostic accuracy for CFP images by all groups of ophthalmologists, specifically from 57.3% (95% CI 51.3% to 63.3%) to 73.7% (95% CI 71.5% to 75.8%) for residents, from 67% (95% CI 63% to 71%) to 81% (95% CI 78.6% to 83.4%) for seniors, and from 69% (95% CI 65.3% to 72.7%) to 81.7% (95% CI 76.9% to 86.4%) for experts. Similarly, a significant increase was also observed in overall diagnostic accuracy for UWF images by all groups of ophthalmologists, specifically from 62% (95% CI 59.6% to 64.4%) to 73.7% (95% CI 71.5% to 75.8%) for residents, from 66% (95% CI 64.2% to 67.8%) to 77% (95% CI 74.6% to 79.4%) for seniors, and from 71.7% (95% CI 68.4% to 74.9%) to 79.7% (95% CI 74.3% to 85.1%) for experts.
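The AUC comparisons reported above can be illustrated with a minimal trapezoidal computation over ROC operating points. The points below are hypothetical, chosen only to echo the direction of the reported sensitivity gains, and are not the study's data:

```python
def auc_trapezoid(roc_points):
    """Area under an ROC curve given (1-specificity, sensitivity)
    points, computed with the trapezoidal rule."""
    pts = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical operating points before and after enhancement:
before = [(0.0, 0.0), (0.3, 0.72), (1.0, 1.0)]
after = [(0.0, 0.0), (0.12, 0.94), (1.0, 1.0)]
print(auc_trapezoid(before))  # lower AUC before enhancement
print(auc_trapezoid(after))   # higher AUC after enhancement
```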
Discussion
In this study, we developed and evaluated digital ray for enhancing cataractous CFP and UWF images using a modified CycleGAN structure. This structure achieved higher performance than traditional CycleGAN measured by FID and KID. In the test set, our results demonstrated that digital ray generated images of realistic characteristics and better overall quality compared with preoperative counterparts. In the diagnostic efficacy evaluation, the average accuracies of retinopathy detection significantly increased from 78% to 91% for CFP, and from 91% to 93% for UWF. For retinopathy subtype diagnosis, the accuracies also increased from 87%–94% to 91%–100% for CFP, and from 87%–95% to 93%–97% for UWF. The mean diagnostic accuracies by residents, seniors and experts all significantly improved after digital ray enhancement, with an average increase of 14.3% for CFP and 10.2% for UWF images.
Several previous studies have proposed enhancement methods for cataractous fundus images based on different techniques such as hand-crafted algorithms8–10 17 and convolutional neural networks.11–13 Compared with these studies, digital ray has multiple advantages. First, we used preoperative and postoperative image pairs collected from cataract patients in real-world clinical settings, rather than cataract-like image simulation, for model development. Thus, our input data can provide the actual transition from cataractous images to clear ones for models to learn. Also, to the authors’ knowledge, enhancement of UWF images has not been studied extensively before. Second, this study presented a method to improve the CycleGAN model. After adding the multiscale discriminator and spatial attention mechanisms, the performance of CycleGAN was greatly improved, and these modifications could be further applied in other image generation tasks. Third, we included ophthalmologists with distinct levels of expertise to systematically evaluate the generated images in terms of image quality, authenticity and diagnostic efficacy. This evaluation process could form a standardised workflow for assessing generated medical images and demonstrate the clinical significance of digital ray.
In the diagnostic efficacy evaluation, sensitivity for AMD and accuracy for DR, PM and others were significantly improved after digital ray enhancement of CFP images, while significant improvement was observed only in accuracy for RD in UWF images (table 2). This is probably attributable to the generally acceptable quality of preoperative UWF images.18 As UWF imaging is based on scanning laser light, it can better penetrate the cataractous lens and obtain a clearer view of the fundus than the ordinary light used by CFP imaging.19 Accordingly, CFP images can benefit more from digital ray enhancement. Among CFP images, the largest increase in sensitivity was noticed in AMD compared with other conditions. One possible reason is that, in contrast to DR and PM, AMD mostly presents delicate pathological changes such as drusen, the detection of which can be more easily affected by cataracts. For UWF images, the most substantial increase in AUC was achieved in the detection of RD. This may be attributed to the fact that both RD and cataracts can present as a large, grey mass in UWF images, and digital ray can reduce the artefacts caused by cataracts and therefore strengthen doctors’ confidence in their decision. Overall, ophthalmologists can benefit from digital ray enhancement and make more accurate diagnoses, but this effect can also vary according to their clinical experience. As shown in figure 2E,F, the greatest gain in diagnostic accuracy was observed in the residents’ group for both CFP and UWF images. Thus, digital ray could potentially assist ophthalmologists in clinical diagnosis and preoperative assessment, particularly the less experienced.
The study findings should be interpreted with several limitations. First, digital ray cannot solve all quality defects caused by cataracts. While cataract-related poor contrast and clarity can be enhanced by our method, images that do not contain complete targeted structures (eg, optic disc) still require recapture. Second, while digital ray is theoretically useful for other fundus diseases, such as optic disc oedema and retinal vein occlusion, the current investigation is limited to five disease categories for each imaging modality. The assistance of digital ray for other fundus diseases will be explored in our future studies. Third, while GANs can remove the blurring caused by cataracts, they can also introduce spurious information during image generation, which would pose a patient safety issue in clinical settings and remains to be explored.
In conclusion, this work introduced digital ray for enhancing cataractous poor-quality CFP and UWF fundus images based on a modified CycleGAN structure. Digital ray performed well in terms of both technical parameters and clinical evaluation experiments. This system could provide an effective method for enhancing cataractous fundus images and improving accurate detection of retinopathies. Given the ubiquity of image quality issues across various medical specialties,20 21 this study could be an important reference for the development of other image enhancement and intelligent diagnostic systems.
Ethics statements
Patient consent for publication
Ethics approval
This study was approved by the Institutional Review Board of Zhongshan Ophthalmic Centre at Sun Yat-sen University (IRB-ZOC-SYSU, identifier: 2021KYPJ074). All procedures followed the tenets of the Declaration of Helsinki. Participants gave informed consent to participate in the study before taking part.
Footnotes
LL and JH are joint first authors.
Contributors Conception and design: XWu, HL and LLiu. Data collection: LLiu, YW, LLi, TC, C-KT, FX, YS, SB, XWei, YL, ZF, YD, KC, YX, DW, XZ and MD. Analysis and interpretation: JH, SL, KW, ML and LZ. Writing–original draft: LLiu and XWu. Writing–review and editing: JH, SL, YW, KW, LZ, ZL, DY, XC, YS, DL, ZC and HL. Obtainment of funding: XWu and HL. The guarantor: XWu.
Funding This study was funded by the National Natural Science Foundation of China (92368205, 82171035), Guangdong Provincial Natural Science Foundation for Progressive Young Scholars (2023A1515030170), the High-level Science and Technology Journals Projects of Guangdong Province (2021B1212010003), Guangzhou Basic and Applied Basic Research Project (202201011301) and Basic scientific research projects of Sun Yat-sen University (23ykcxqt002).
Disclaimer The sponsors of the study played no role in the study protocol design; data collection, analysis or interpretation; manuscript preparation; or the decision to submit the manuscript for publication.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.