Background/aims This study evaluates the performance of the Airdoc retinal artificial intelligence system (ARAS) for detecting multiple fundus diseases in real-world primary healthcare settings and describes the spectrum of fundus diseases identified by ARAS.
Methods This real-world, multicentre, cross-sectional study was conducted in Shanghai and Xinjiang, China. Six primary healthcare settings were included in this study. Colour fundus photographs were taken and graded by ARAS and retinal specialists. The performance of ARAS was described by its accuracy, sensitivity, specificity and positive and negative predictive values. The spectrum of fundus diseases in primary healthcare settings was also investigated.
Results A total of 4795 participants were included. The median age was 57.0 (IQR 39.0–66.0) years, and 3175 (66.2%) participants were female. The accuracy, specificity and negative predictive value of ARAS for detecting normal fundus and 14 retinal abnormalities were high, whereas the sensitivity and positive predictive value varied in detecting different abnormalities. The proportion of retinal drusen, pathological myopia and glaucomatous optic neuropathy was significantly higher in Shanghai than in Xinjiang. Moreover, the percentages of referable diabetic retinopathy, retinal vein occlusion and macular oedema in middle-aged and elderly people in Xinjiang were significantly higher than in Shanghai.
Conclusion This study demonstrated the dependability of ARAS for detecting multiple retinal diseases in primary healthcare settings. Implementing the AI-assisted fundus disease screening system in primary healthcare settings might be beneficial in reducing regional disparities in medical resources. However, the ARAS algorithm must be improved to achieve better performance.
Trial registration number NCT04592068.
- Public health
- Diagnostic tests/Investigation
Data availability statement
Data are available on reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
AI has shown great potential in screening eye diseases. However, the effectiveness of AI systems in screening multiple retinal abnormalities in primary healthcare settings remains unclear.
WHAT THIS STUDY ADDS
This study demonstrates the feasibility of the Airdoc retinal AI system for detecting multiple retinal diseases in real-world primary healthcare settings. It also highlights the need to pay more attention to fundus diseases in the community.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
The AI-assisted fundus disease screening system in primary healthcare settings could reduce regional medical resource inequalities, improve earlier detection of fundus diseases and reduce the disease burden caused by vision loss.
As China’s population ages and lifestyles change, vision impairment and blindness remain significant public health concerns. According to the Global Burden of Diseases, Injuries and Risk Factors Study 2019, between 1990 and 2019 in China, after excluding the effect of population growth, the number of people with moderate vision impairment increased by 113.51% (87.22% attributable to population ageing and 26.29% to age-specific prevalence), the number with severe vision impairment increased by 126.97% (116.06% attributable to population ageing and 10.91% to age-specific prevalence) and the number with blindness increased by 44.18% (99.22% attributable to population ageing and −55.04% to age-specific prevalence).1 Uncorrected refractive error, cataract and fundus diseases, including age-related macular degeneration (AMD), glaucomatous optic neuropathy (GON) and diabetic retinopathy (DR), are the leading causes of vision impairment in China.1
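The three decompositions above are additive: with population growth set aside, the population-ageing and age-specific-prevalence components should sum to each reported total. The short Python check below uses only the percentages quoted from the GBD 2019 analysis; the dictionary and function names are illustrative, not part of any GBD tooling.

```python
# Percentage changes in China, 1990-2019 (GBD 2019), as quoted in the text.
# Tuple layout: (total change, population-ageing part, age-specific-prevalence part).
DECOMPOSITION = {
    "moderate vision impairment": (113.51, 87.22, 26.29),
    "severe vision impairment": (126.97, 116.06, 10.91),
    "blindness": (44.18, 99.22, -55.04),
}


def components_add_up(total, ageing, prevalence, tol=0.005):
    """True if the two components sum to the reported total (percentage points)."""
    return abs((ageing + prevalence) - total) <= tol


for outcome, (total, ageing, prevalence) in DECOMPOSITION.items():
    status = "ok" if components_add_up(total, ageing, prevalence) else "MISMATCH"
    print(f"{outcome}: {ageing:.2f} + ({prevalence:.2f}) = {ageing + prevalence:.2f} "
          f"vs reported {total:.2f} [{status}]")
```

The blindness row shows why the negative prevalence component matters: ageing alone would have raised blindness by 99.22%, but falling age-specific prevalence offset more than half of that increase.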
Compared with refractive error and cataract, fundus diseases are more complex to diagnose and manage and often have a poor prognosis. Early screening and referral can effectively reduce visual loss caused by DR,2 and combined screening for AMD and DR is highly cost-effective in both rural and urban China.3 However, China has a shortage of ophthalmologists: according to the International Council of Ophthalmology, China had 36 342 ophthalmologists in 2015, equivalent to 26.4 ophthalmologists per million people.4 The shortage of medical resources is more severe at the grassroots level and in remote areas. Consequently, patients with fundus diseases who present to primary healthcare settings may not receive timely treatment because of a lack of ophthalmologists and equipment.
A growing number of studies have demonstrated that artificial intelligence (AI) has great potential for diagnosing eye diseases,5 suggesting that AI might relieve the burden on primary healthcare settings. However, most AI systems trained on data from tertiary hospitals have not been tested on large samples in primary healthcare settings, where the types and severity of disease may differ from those in the training populations. Our group has previously worked on AI-based fundus disease screening in community hospitals and demonstrated high sensitivity and specificity for detecting DR.6 7 Recently, a deep learning system, the Comprehensive AI Retinal Expert (CARE, Beijing Airdoc Technology, Beijing, China), showed satisfactory performance in identifying multiple retinal abnormalities in a national real-world evidence study that included tertiary and community hospitals.8 Moreover, CARE has proven safe and reliable for detecting DR in Chinese community healthcare centres.9
This study employed the Airdoc retinal AI system (ARAS, Beijing Airdoc Technology), an upgraded deep learning system based on CARE. Unlike CARE, ARAS uses a multitask classification network with three subtasks (a general subtask, a macular subtask and an optic disc subtask), which could capture macular and optic disc abnormalities more accurately. However, the effectiveness of ARAS in screening for multiple retinal abnormalities in primary healthcare settings remains unclear. To address this gap, three primary healthcare settings in Shanghai (a high-income area) and three in Xinjiang (a low-income area) were included to evaluate the performance of ARAS in detecting multiple fundus diseases. The performance of ARAS was described by its accuracy, sensitivity, specificity, and positive and negative predictive values. Furthermore, we used the results of ARAS to investigate the spectrum of fundus diseases in primary healthcare settings in areas of different economic levels.
Materials and methods
The overall study design is displayed in online supplemental file 1. All fundus images were anonymised before analysis. All procedures were conducted following the Declaration of Helsinki, and informed consent was obtained from all the participants.
This study was conducted in six primary healthcare settings (or hospitals at a similar level): Linfen Community Health Service Centre, Pengpu New Village Community Health Service Centre, Pengpu Town Community Health Service Centre in Jing'an District of Shanghai, Bachu County People’s Hospital, Bachu County Traditional Chinese Medicine Hospital, and Selibuya Town Centre Health Centre in Kashgar, Xinjiang. Consecutive patients who visited primary healthcare institutions for eye discomfort were included between November 2020 and August 2022.
Algorithm construction of ARAS
CARE was developed using an algorithm based on Inception-ResNet-V2.8 ARAS is an updated version of CARE. As shown in online supplemental file 1, the overall architecture of ARAS consists of two AI submodules: a detection model for the macula and optic disc regions (Yolo-V3) and a multitask learning model for retinal disease classification (EfficientNet-B3).10 First, Yolo-V311 was used to localise the macula and optic disc in each image. The initially detected bounding boxes of the macula and optic disc were extended by one-fourth of their width and height to cover the neighbouring regions, yielding the macula and optic disc regions. Second, the multitask classification network used EfficientNet-B312 as the backbone with three subtasks: the general subtask for all 14 retinal abnormalities, the macular subtask for macular abnormalities (eg, macular holes (MH), macular oedema (ME) and central serous chorioretinopathy (CSC)) and the optic disc subtask for optic disc abnormalities (eg, GON). After detection, each subtask used the corresponding feature map region of the fundus image as input: the general, macular and optic disc subtasks took the whole fundus region, the macula region and the optic disc region, respectively. For each subtask, the feature map region located by the detection model was fed into a multilayer perceptron block with a sigmoid activation in its last layer, and cross-entropy was used as the loss function. Third, during inference, each subtask reported either a healthy status or one or more retinal abnormalities. If none of the three subtasks reported an abnormality, the eye was classified as healthy; otherwise, the eye's diagnosis was the union of the abnormalities reported by the three subtasks. The development and validation of ARAS are detailed in online supplemental tables S1 and S2.
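The region-cropping and eye-level decision steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the ARAS implementation: it works in image pixel coordinates rather than on the backbone feature map, it assumes the one-fourth extension is split evenly between opposite sides and that the enlarged box is clipped to the image (the text specifies neither), and the names `Box`, `expand_box` and `eye_diagnosis` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates, (x1, y1) to (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def expand_box(box, img_w, img_h, ratio=0.25):
    """Enlarge a detected box by `ratio` of its width and height in total.

    Assumption: the extra margin is split evenly (ratio / 2 per side) and the
    result is clipped to the image borders.
    """
    dw = (box.x2 - box.x1) * ratio / 2
    dh = (box.y2 - box.y1) * ratio / 2
    return Box(
        max(0.0, box.x1 - dw),
        max(0.0, box.y1 - dh),
        min(float(img_w), box.x2 + dw),
        min(float(img_h), box.y2 + dh),
    )


def eye_diagnosis(general, macular, optic_disc):
    """Union of the abnormality sets reported by the three subtasks.

    An empty union means no subtask reported an abnormality, so the eye is
    classified as healthy.
    """
    union = set(general) | set(macular) | set(optic_disc)
    return union if union else {"healthy"}
```

For example, a 100 × 80 pixel optic disc box spanning (100, 100) to (200, 180) in a 1024 × 1024 image becomes (87.5, 90.0) to (212.5, 190.0), and an eye flagged with referable DR by the general subtask and ME by the macular subtask is reported with both abnormalities.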
The attention heatmaps of the 14 fundus abnormalities are displayed in online supplemental figures S3–S16.
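The per-subtask head in the algorithm description above (a multilayer perceptron whose last layer uses a sigmoid, trained with cross-entropy) behaves as a multi-label classifier: each abnormality receives an independent probability, and for sigmoid outputs the cross-entropy takes its binary, per-label form. A minimal NumPy sketch of the forward pass and loss follows; the hidden size, ReLU activation, 0.5 decision threshold and random weights are illustrative assumptions rather than ARAS settings, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_head(features, w1, b1, w2, b2):
    """Two-layer perceptron (ReLU hidden layer, sigmoid output layer).

    Returns one independent probability per abnormality, so several
    abnormalities can be positive for the same eye (multi-label output).
    """
    hidden = np.maximum(0.0, features @ w1 + b1)
    return sigmoid(hidden @ w2 + b2)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean cross-entropy over samples and labels for sigmoid outputs."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))


def predict_labels(probs, names, threshold=0.5):
    """Names of abnormalities whose probability reaches the threshold."""
    return {name for name, p in zip(names, probs) if p >= threshold}


# Usage with random weights for a hypothetical macular subtask (MH, ME, CSC).
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8))             # 2 eyes, 8 pooled features each
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 3)), np.zeros(3)
probs = mlp_head(features, w1, b1, w2, b2)      # shape (2, 3)
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
loss = binary_cross_entropy(probs, targets)
labels = predict_labels(probs[0], ["MH", "ME", "CSC"])
```

Using independent sigmoids rather than a softmax is what allows one eye to carry several labels at once, consistent with the grading rule that all abnormalities present in an image are labelled.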
Colour fundus photographs with a 45°–50° field of view, centred on the midpoint between the fovea and the optic disc, were prospectively taken of both eyes with natural (undilated) pupils. One image was taken of each eye of each participant. The manufacturers and camera models used in this study are listed in online supplemental table S3. All fundus photographs were anonymised and uploaded to the AI system. ARAS was used to identify normal fundus and 14 fundus abnormalities in each image/eye: referable DR, referable hypertensive retinopathy, GON, pathological myopia (PM), retinal vein occlusion (RVO), retinal detachment (RD), MH, ME, CSC, epiretinal membranes (ERM), retinitis pigmentosa (RP), retinal drusen ≥65 µm, macular neovascularisation and geographic atrophy.8 Unidentifiable images, such as blurred or defocused images caused by severe cataract or keratitis, were excluded from further analysis. At the participant level, the fundus abnormalities were defined as the union of the abnormalities in both eyes. Reference standards were generated by randomly assigning each image to two retinal specialists (5–10 years of postcertification experience in a tertiary hospital). A label was finalised only when both specialists agreed; otherwise, a third specialist (>10 years of postcertification experience in a tertiary hospital) made the final decision. The definitions and grading criteria for the 14 retinal abnormalities used by the graders were consistent with those of a previous study.8 The specialists graded each image/eye for normal fundus and all 14 abnormalities; when several lesions were present in the same image/eye, all corresponding abnormalities were labelled.
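The reference-standard and participant-level rules above can be expressed as two small functions. This is a sketch under stated assumptions: "agreement" is taken to mean identical label sets (the text does not say whether agreement was assessed per abnormality), a participant is treated as normal only if every gradable eye is normal, and `adjudicate` and `participant_findings` are hypothetical names.

```python
def adjudicate(grader_a, grader_b, grader_c=None):
    """Reference label for one image.

    Accept the two specialists' labels when they agree (assumed here to mean
    identical label sets); otherwise defer to the third, senior specialist.
    """
    if set(grader_a) == set(grader_b):
        return set(grader_a)
    if grader_c is None:
        raise ValueError("disagreement: a third specialist's grading is required")
    return set(grader_c)


def participant_findings(eye_labels):
    """Union of findings over a participant's gradable eyes.

    `eye_labels` maps 'right'/'left' to a set of labels, or to None when the
    image is ungradable. Returns None if no eye is gradable.
    """
    gradable = [labels for labels in eye_labels.values() if labels is not None]
    if not gradable:
        return None  # no gradable eye: participant cannot be analysed
    findings = set().union(*gradable) - {"normal"}
    return findings if findings else {"normal"}
```

For instance, a participant whose right eye is normal and whose left eye shows PM is counted once with PM, and a participant with one normal eye and one ungradable image is counted as normal on the strength of a single eye.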
All statistical analyses were performed in R V.4.1.2. Continuous variables are reported as median (IQR) and categorical variables as frequency (percentage). Medians were compared using the Wilcoxon rank-sum test (or the Kruskal-Wallis rank-sum test, where appropriate) and proportions using Pearson’s χ2 test (or Fisher’s exact test, where appropriate). Analyses stratified by age and gender were visualised using the “ggplot2” package and fitted using generalised linear models. The performance of ARAS was analysed at the eye level (one image per eye) in terms of accuracy, sensitivity, specificity and positive and negative predictive values, each with a 95% CI. A p value <0.05 was considered statistically significant.
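The eye-level metrics listed above all derive from one 2×2 table of ARAS output against the reference standard. The sketch below computes them with 95% CIs. The text does not state which interval method was used (the study's own analyses were run in R), so the Wilson score interval here is an assumption, and the counts in the usage line are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(successes, n, z=Z95):
    """Wilson score interval for a binomial proportion."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def diagnostic_metrics(tp, fp, fn, tn):
    """Eye-level performance metrics, each as (estimate, lower, upper)."""
    def prop(k, n):
        est = k / n if n else float("nan")
        return (est, *wilson_ci(k, n))

    return {
        "accuracy": prop(tp + tn, tp + fp + fn + tn),
        "sensitivity": prop(tp, tp + fn),
        "specificity": prop(tn, tn + fp),
        "ppv": prop(tp, tp + fp),
        "npv": prop(tn, tn + fn),
    }


# Hypothetical counts for one abnormality (not study data).
m = diagnostic_metrics(tp=40, fp=60, fn=10, tn=890)
```

With these illustrative counts, accuracy is 0.93 and specificity about 0.94 while PPV is only 0.40: a pattern of high accuracy alongside low PPV, similar in shape to the one the Results describe for several abnormalities.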
Performance of ARAS in screening
From November 2020 to August 2022, 5069 participants (2725 from Shanghai and 2344 from Xinjiang) were enrolled from six primary healthcare settings. A total of 1149 images/eyes (806 from Shanghai and 343 from Xinjiang) were unidentifiable and were excluded from further analysis. Finally, 4795 participants (2528 from Shanghai and 2267 from Xinjiang) were included. Further details are shown in figure 1.
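The participant flow above can be cross-checked arithmetically. The text does not itemise why the 274 enrolled participants were not included; the sketch below only derives what follows from the reported counts. Because each excluded participant can account for at most two images, at least 601 of the excluded images must belong to participants who remained in the analysis and who therefore presumably contributed a single gradable eye, as acknowledged in the limitations.

```python
# Counts reported in the text.
ENROLLED = {"Shanghai": 2725, "Xinjiang": 2344}
INCLUDED = {"Shanghai": 2528, "Xinjiang": 2267}
EXCLUDED_IMAGES = {"Shanghai": 806, "Xinjiang": 343}


def excluded_participants(enrolled, included):
    """Participants enrolled but not included, per region."""
    return {site: enrolled[site] - included[site] for site in enrolled}


per_site = excluded_participants(ENROLLED, INCLUDED)
total_excluded = sum(per_site.values())

# Each excluded participant has at most two images, so at least this many
# excluded images must belong to participants who were still included.
min_images_from_included = sum(EXCLUDED_IMAGES.values()) - 2 * total_excluded

print(per_site, total_excluded)
print(min_images_from_included)
```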
The accuracy of ARAS was 0.76 (95% CI 0.74 to 0.77) for normal fundus, 0.93 (95% CI 0.92 to 0.94) for referable DR, 1.00 (95% CI 1.00 to 1.00) for referable hypertensive retinopathy, 0.93 (95% CI 0.92 to 0.94) for GON, 0.93 (95% CI 0.92 to 0.93) for PM, 0.99 (95% CI 0.99 to 1.00) for RVO, 1.00 (95% CI 1.00 to 1.00) for RD, 1.00 (95% CI 0.99 to 1.00) for MH, 0.97 (95% CI 0.96 to 0.97) for ME, 1.00 (95% CI 1.00 to 1.00) for CSC, 0.94 (95% CI 0.93 to 0.95) for ERM, 1.00 (95% CI 1.00 to 1.00) for RP, 0.93 (95% CI 0.92 to 0.94) for retinal drusen, 0.98 (95% CI 0.98 to 0.99) for macular neovascularisation and 1.00 (95% CI 0.99 to 1.00) for geographic atrophy. The sensitivity, specificity, and positive and negative predictive values of ARAS for detecting normal fundus and the 14 common retinal abnormalities are shown in table 1.
The spectrum of retinal diseases in Shanghai and Xinjiang primary healthcare settings
Among the 4795 included individuals (2528 from Shanghai and 2267 from Xinjiang), the median age was 57.0 (IQR 39.0–66.0) years, 3175 participants (66.2%) were female and 515 participants (10.7%) had a normal fundus. There was no significant difference in the proportions of referable DR, referable hypertensive retinopathy, MH, CSC or geographic atrophy between Shanghai and Xinjiang. The demographic characteristics and fundus conditions of all 4795 participants are detailed in table 2 and figure 1, and those of the Shanghai and Xinjiang participants in online supplemental tables S4 and S5 and figures S17 and S18, respectively. The percentage of normal fundus was 2.5% in Shanghai and 20.0% in Xinjiang. The proportions of GON, PM, ERM, retinal drusen and macular neovascularisation were significantly higher in Shanghai than in Xinjiang, whereas the proportions of RVO, ME and RP were significantly higher in Xinjiang than in Shanghai.
Because age and gender differed significantly between the Shanghai and Xinjiang participants, we further stratified the analysis by age and gender. Among diseases requiring regular follow-up, the proportions of retinal drusen and PM were significantly higher in Shanghai than in Xinjiang for males and females of all ages (figure 2). Among diseases requiring referral, the percentages of referable DR, RVO and ME in middle-aged and elderly people were significantly higher in Xinjiang than in Shanghai, while the proportion of GON in males in Shanghai was the highest at all ages (figure 3).
In this study, we evaluated the efficacy of ARAS for screening multiple retinal abnormalities in primary healthcare settings. The accuracy, specificity and negative predictive value of ARAS for detecting normal fundus and the 14 abnormalities were all high, whereas its sensitivity was low for normal fundus, RD, macular neovascularisation and geographic atrophy. Moreover, the positive predictive values for the 14 abnormalities were unsatisfactory compared with that for normal fundus. ARAS performed satisfactorily in identifying multiple retinal diseases during its initial development and validation (online supplemental tables S1 and S2); its weaker performance here is not surprising, because the strict control of image quality and photographic conditions is difficult to achieve fully in real-world scenarios. Poor image quality owing to small pupils, ocular media opacities, camera-related technical issues and operator-dependent factors can affect the performance of AI screening systems.13 Additionally, Lin et al reported satisfactory sensitivity and specificity of CARE for identifying multiple retinal abnormalities in a national real-world study.8 Unlike in the present study, participants with fundus diseases in the study by Lin et al were mainly from tertiary hospitals (85.0% of participants in the internal validation set and 75.3% in the external test set).8 In our study, by contrast, the low proportion of fundus diseases among primary healthcare visits might explain the low positive predictive values and low sensitivity: for example, only eight participants had referable hypertensive retinopathy (0.17%), seven had CSC (0.15%) and five had geographic atrophy (0.10%). In summary, ARAS is an effective screening tool for multiple retinal abnormalities; however, its algorithm and workflow still require further improvement for better performance in real-world primary healthcare settings.
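The dependence of positive predictive value on prevalence, invoked above, follows from Bayes' theorem. The sketch below holds a hypothetical operating point fixed (sensitivity 0.90 and specificity 0.99, chosen for illustration and not reported for ARAS) and varies only prevalence, starting from the 8 of 4795 participants with referable hypertensive retinopathy.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.90, 0.99  # hypothetical operating point, not an ARAS result
for prevalence in (8 / 4795, 0.05, 0.20):
    print(f"prevalence {prevalence:.4f}: "
          f"PPV {ppv(SENS, SPEC, prevalence):.3f}, NPV {npv(SENS, SPEC, prevalence):.4f}")
```

At a prevalence of about 0.17%, this classifier yields a PPV of roughly 0.13 even though its specificity is 0.99, while NPV stays above 0.999; at 20% prevalence, the same classifier's PPV exceeds 0.95. Rare conditions in a primary care population can therefore depress PPV without any change in the underlying classifier.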
AI systems have been widely studied for the screening, diagnosis and prognostic assessment of fundus diseases using fundus photographs14 and optical coherence tomography angiography images.15 In addition to major blinding fundus diseases such as AMD,16 GON17 and DR,18 Srisuriyajan et al developed an AI system with high sensitivity and specificity for screening cytomegalovirus retinitis, a serious threat to vision in HIV-positive patients.19 Moreover, AI has shown high performance in predicting postoperative visual prognosis after ERM20 and MH surgery.21 Furthermore, because the retina is an accessible extension of the brain and the only site where small vessels can be observed in vivo,22 retinal images can be used to detect Alzheimer’s disease23 and cardiovascular diseases24 as well as fundus diseases. However, significant gaps remain between developing AI systems and applying them in clinical practice,25 and only a few reports have validated AI systems in real-world scenarios. Font et al demonstrated the high diagnostic accuracy of an AI system for holistic maculopathy screening, including DR, AMD, GON and nevus, in a routine occupational health check-up context.26 Hao et al reported that AI-assisted screening for DR in rural areas of Midwest China was highly consistent with ophthalmologists’ diagnoses.27 In the present multicentre, cross-sectional study, we assessed the efficacy of AI-assisted screening for normal fundus and 14 retinal abnormalities in primary healthcare settings in high-income (Shanghai) and low-income (Xinjiang) areas of China.
With the ageing population and changing lifestyles in China, vision impairment and blindness remain important public health concerns, particularly in rural and remote areas. Based on AI-assisted screening, we performed a preliminary analysis of the spectrum of retinal diseases in primary healthcare settings in Shanghai and Xinjiang. The percentage of normal fundus was 2.5% in Shanghai and 20.0% in Xinjiang. The proportions of retinal drusen and PM were significantly higher in Shanghai than in Xinjiang among males and females of all ages, and the proportion of GON in males in Shanghai was the highest at all ages. In contrast, the percentages of referable DR, RVO and ME in middle-aged and elderly people were significantly higher in Xinjiang than in Shanghai. Several factors may explain these differences in the spectrum of fundus diseases between primary healthcare settings in Shanghai and Xinjiang. First, participants from Xinjiang were younger than those from Shanghai (44.0 years vs 64.0 years). In addition, compared with Shanghai, the Kashgar region of Xinjiang has a higher altitude (1289.5 m vs 2.19 m), greater ultraviolet light exposure and a drier climate, which are significant risk factors for ocular surface diseases.28 Patients in Xinjiang may therefore more often visit primary healthcare settings for eye discomfort unrelated to the fundus, which, together with their younger age, might explain the higher proportion of normal fundus there. Second, the per capita GDP in 2021 was RMB173 630 in Shanghai and RMB61 725 in Xinjiang (National Bureau of Statistics of China, https://data.stats.gov.cn/). Shanghai has abundant medical resources, and patients with serious fundus diseases, such as DR, RVO and ME, are more likely to seek care at a tertiary hospital than in primary healthcare, which might bias the results. To our knowledge, few studies have examined the prevalence of multiple fundus diseases in communities in Shanghai and Xinjiang.
A community-based cross-sectional study reported a DR prevalence of approximately 6.3% in suburban Shanghai,29 lower than the 8.3% in this study. This difference may reflect the fact that our participants were patients attending primary healthcare settings, whereas the study by Ma et al was based on a community population.
To our knowledge, this is the first study to investigate the spectrum of fundus diseases using an AI system in real-world primary healthcare settings. Compared with conventional screening by ophthalmologists, AI-based screening is more cost-effective.30 In addition, AI-assisted early screening can alleviate the strain on the healthcare system and enhance eye care in primary healthcare settings: patients who require additional evaluation or treatment are referred, while others continue to receive follow-up care in primary healthcare settings. The prompt and accurate diagnoses provided by AI systems can also improve patient compliance. Moreover, ARAS has shown promise in screening for cardiovascular diseases and Alzheimer’s disease based on fundus photographs in primary healthcare settings. Furthermore, this study selected two representative regions, Shanghai (a high-income area) and Xinjiang (a low-income area), for fundus disease screening, providing a theoretical basis and practical experience for nationwide screening in both developed and developing countries. In the future, a nationwide epidemiological study of the prevalence of multiple retinal diseases is anticipated to comprehensively characterise the burden of fundus disease and provide data for national healthcare policies.
However, this study has several limitations. First, the demographic data of participants were incomplete. Second, some participants with only one identifiable fundus image were included in the analysis of the disease spectrum, which might have biased the results. Third, because the AI system had high negative but low positive predictive values, false-positive results may have led to an overestimation of the prevalence of fundus diseases. Thus, caution should be exercised when interpreting the current findings.
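The third limitation can be made concrete: when specificity is imperfect and a disease is rare, the expected share of screen-positive participants exceeds the true prevalence. The sketch below uses hypothetical sensitivity and specificity values (not ARAS results) and, for comparison, the Rogan-Gladen estimator, a standard misclassification correction that was not applied in this study.

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Expected proportion of screen-positive participants."""
    return sensitivity * true_prevalence + (1 - specificity) * (1 - true_prevalence)


def rogan_gladen(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence, clipped to [0, 1]."""
    est = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    return min(1.0, max(0.0, est))


SENS, SPEC = 0.80, 0.97  # hypothetical values for illustration only
observed = apparent_prevalence(0.01, SENS, SPEC)  # true prevalence of 1%
print(f"apparent {observed:.4f}, corrected {rogan_gladen(observed, SENS, SPEC):.4f}")
```

Under these assumed values, a condition with a true prevalence of 1% would appear in about 3.8% of screened participants; the correction recovers 1% only if the sensitivity and specificity used are themselves accurate for the screened population.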
In conclusion, this multicentre, cross-sectional study demonstrated the feasibility of ARAS for detecting multiple retinal diseases in real-world primary healthcare settings. It also highlights the need to pay more attention to fundus diseases in the community in both developed and remote areas. The AI-assisted fundus disease screening system in primary healthcare settings could reduce regional medical resource inequalities and improve earlier detection of fundus diseases, thereby reducing the disease burden caused by vision loss. However, the ARAS algorithm requires further development to improve its performance, and well-designed population-based studies across the country are warranted to further assess its performance.
Patient consent for publication
This study was approved by the Ethics Committee of Shibei Hospital of Jing’an District, Shanghai, China (approval number: YL-201805258-05). Participants gave informed consent to participate in the study before taking part.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
CG, YuW and YJ are joint first authors.
Contributors CG, YuW and YJ: writing the manuscript, analysing the results and interpreting the data. FX, SW, RL, WY, NA, YiW, YL, XL, TW and LD: data collection. YC, BW and YZ: data processing. WBW: conception and design of the work. QQ, ZZ, DL and JC: idea, conception and design of the work, interpretation of the data, critical revision of the manuscript, and study supervision.
Funding This work was supported by the project of Shanghai Municipal Commission of Health and Family Planning (grant number: 202140224), Shanghai Municipal Health and Family Planning Commission (grant number: 20164Y0180), Shanghai Jing'an District Health Research (grant number: 2016QN06 and 2022MS11) and Shanghai Medical Key Special Construction Project (grant number: none).
Disclaimer The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.