Glaucoma is a result of irreversible damage to the retinal ganglion cells. While an early intervention could minimise the risk of vision loss in glaucoma, its asymptomatic nature makes it difficult to diagnose until a late stage. The diagnosis of glaucoma is a complicated and expensive effort that is heavily dependent on the experience and expertise of a clinician. The application of artificial intelligence (AI) algorithms in ophthalmology has improved our understanding of many retinal, macular, choroidal and corneal pathologies. With the advent of deep learning, a number of tools for the classification, segmentation and enhancement of ocular images have been developed. Over the years, several AI techniques have been proposed to help detect glaucoma by analysis of functional and/or structural evaluations of the eye. Moreover, the use of AI has also been explored to improve the reliability of ascribing disease prognosis. This review summarises the role of AI in the diagnosis and prognosis of glaucoma, discusses the advantages and challenges of using AI systems in clinics and predicts likely areas of future progress.
Glaucoma is a presently incurable condition that requires lifelong treatment and monitoring, and is the second largest cause of irreversible blindness worldwide.1 Primarily affecting the elderly, it is estimated that nearly 79.6 million people will have glaucoma by 2020.2 The central event in glaucoma is the irreversible damage of retinal ganglion cell (RGC) axons that carry visual information from the retina to the brain.3 When damaged, the RGCs undergo programmed cell death (apoptosis) resulting in vision loss.3 While an early diagnosis could minimise the risk of permanent vision loss,4 nearly half the patients affected by glaucoma remain undiagnosed until a relatively late stage5 due to the slow and asymptomatic nature of the disease in its earlier stages.6
Once diagnosed, predicting the progression of glaucoma is a complex endeavour that is time consuming, subjective and heavily dependent on the clinician’s experience and expertise, and requires multiple clinical tests.7 8 Often, these tests are repeated over several visits to rule out their inherent subjectivity and to account for patient variability.7 In fact, in 2016 the World Glaucoma Association acknowledged the absence of a single specific test that could be regarded as a perfect reference to predict the progression of glaucoma.9 This means that cases of overtreatment and undertreatment are presently inevitable.9 Given that glaucomatous changes to the optic nerve head (ONH) tissues are irreversible, timely and reliable structural and functional evaluation of the eye could help in the early diagnosis of glaucoma and in better predicting its progression.10 11
In recent years, artificial intelligence (AI)-based systems have started to revolutionise the healthcare industry.12–14 In the field of ophthalmology, a number of AI approaches have been explored for the diagnosis of retinal,15 16 macular,17 18 choroidal16 and corneal pathologies.19 20 Moreover, with the advent of deep learning (DL), a number of AI tools for the automated segmentation and enhancement of ocular images from optical coherence tomography (OCT)21–25 and fundus imaging modalities26 have also been proposed.
Modern AI algorithms are especially tailored to extract meaningful features from complex and high-dimensional data. Consequently, a number of AI studies have been proposed for the diagnosis and management of glaucoma based on the interpretation of functional and/or structural information of the eye. This review summarises the role of AI in glaucoma, the clinical advantages and challenges, and projects potential future applications of AI in glaucoma.
Understanding AI algorithms
The AI algorithms discussed in this review can be broadly classified into two categories based on the complexity of the data they handle.
The first category consists of machine learning classifiers (MLC) and artificial neural networks (ANN). MLCs such as random forest (RF), logistic regression (LR), support vector machine (SVM), Gaussian mixture model (GMM) and independent component analysis (ICA) are classification, clustering and decomposition algorithms that extend classical statistical modelling. ANNs, on the other hand, are biologically inspired algorithms that pass the input data through a series of interconnected nodes (artificial neurons), and iteratively adjust the weights of the connections between nodes to obtain the desired classification.
These algorithms learn to take input data (eg, clinical parameters) and automatically make a prediction (eg, presence of pathology, glaucoma severity) through a supervised or unsupervised learning process. In supervised learning, the algorithm (eg, SVM, RF, ANNs, LR) is trained with a fully labelled data set (eg, disease diagnosis as label). In unsupervised learning, the algorithm (eg, GMM, ICA) is trained with an unlabelled data set (eg, only clinical parameters as inputs) in an attempt to identify new patterns/trends. Such algorithms are typically well suited for handling low-dimensionality numeric data (eg, simple numbers such as the vertical cup-to-disc ratio, intraocular pressure (IOP), age and sex).
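The distinction between the two learning modes can be sketched with a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a method from any of the studies reviewed here: the data are synthetic and carry no clinical meaning, the supervised model is a plain logistic regression fitted by gradient descent, and a simple two-cluster k-means stands in for the unsupervised case (it is a simpler relative of the GMM, not the GMM itself). All function and variable names are hypothetical.

```python
import math
import random

random.seed(0)

# Synthetic, illustrative data only: (vertical cup-to-disc ratio, IOP in mmHg).
def make_eye(glaucoma):
    vcdr = random.gauss(0.75 if glaucoma else 0.40, 0.08)
    iop = random.gauss(24.0 if glaucoma else 15.0, 3.0)
    return (vcdr, iop)

labels = [i % 2 for i in range(200)]          # 1 = glaucoma, 0 = healthy
eyes = [make_eye(y) for y in labels]

def standardise(rows):
    """Rescale each feature to zero mean and unit SD."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]

X = standardise(eyes)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Supervised: the labels y drive the weight updates."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def two_means(X, iters=20):
    """Unsupervised: groups the eyes into two clusters without seeing labels."""
    centres = [min(X), max(X)]                # deterministic starting points
    assign = [0] * len(X)
    for _ in range(iters):
        for k, xi in enumerate(X):
            d = [sum((a - c) ** 2 for a, c in zip(xi, ctr)) for ctr in centres]
            assign[k] = d.index(min(d))
        for g in (0, 1):
            members = [xi for xi, a in zip(X, assign) if a == g]
            if members:                       # keep the old centre if a cluster empties
                centres[g] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

w, b = fit_logistic(X, labels)
predicted = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
             for xi in X]
supervised_accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(X)

clusters = two_means(X)
agreement = sum(c == y for c, y in zip(clusters, labels)) / len(X)
cluster_agreement = max(agreement, 1.0 - agreement)   # cluster ids are arbitrary
```

The labels enter only `fit_logistic`; `two_means` recovers a grouping from the parameters alone, and the labels are used afterwards merely to check how well that grouping matches the known classes.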
The second category of AI algorithms comprises sophisticated variants of ANNs known as convolutional neural networks (CNN). They are well suited for exploiting high-dimensionality data (eg, fundus and OCT images) through multiple interconnected levels of data abstraction. In each level, convolution layers (layers of filters) attempt to organically extract the features (feature maps; eg, information on texture, edges, intensity, thickness) that best represent the task of the algorithm (eg, identifying the presence of a pathology, identifying a specific tissue). Through an iterative learning process (training) that aims to minimise the error between the output of the network (eg, predicted diagnosis) and the ground truth (eg, clinical diagnosis), the weights (parameters) that determine the influence of each extracted feature on the final decision are continuously refined until the optimal weights (least error between the network output and the ground truth) are identified.
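To make the notion of a feature map concrete, the following minimal sketch slides a single 3×3 filter over a toy image. It assumes a hand-written vertical-edge filter purely for illustration; in a trained CNN the filter values are learnt rather than fixed. As in most DL frameworks, the operation shown is strictly a cross-correlation (the filter is not flipped). The function name `conv2d_valid` and the toy image are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Toy 5x6 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
```

The resulting feature map is non-zero only where the filter window straddles the boundary between the dark and bright halves, which is the sense in which a convolution layer reports where a given feature occurs in the image.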
In recent years, DL, an advanced and powerful incarnation of CNNs, has become the go-to AI approach in the field of medical imaging for segmentation, enhancement and diagnostic applications.27 When exposed to diverse and large volumes of data (eg, a combination of clinical parameters and images), it is often critical to identify the important features present in those data that drive the accuracy of an AI model. This is a time-consuming process known as feature engineering. In such scenarios, DL networks often outshine traditional methods owing to their ‘automated feature engineering’ approach (ie, automatically identifying the best set of features in the data that influence the performance of the algorithm). For instance, jointly understanding the textural information (eg, hyper-reflectivity) and spatial arrangement (eg, between the retinal layers and choroid) might be crucial for identifying the retinal pigment epithelium layer while learning to segment the ONH tissues from OCT images. Nevertheless, hybrids of traditional and advanced AI algorithms are also being explored to increase the robustness of these predictive models.
AI for evaluation of functional damage
In the earliest study, reported in 1994, Goldbaum et al 28 29 proposed the use of ANNs to interpret visual field (VF) data from standard automated perimetry evaluations. When trained on the absolute threshold sensitivity, age and VF values (read in sequence from nasal to temporal), the ANN was able to detect glaucomatous eyes almost as proficiently as a trained reader (ANN and expert agreement: 74%). Subsequently, other studies30–32 also concluded that ANNs and MLCs are able to match, or even outperform, human experts and more conventional algorithms.32 A study by Lietman et al 33 concluded that ANNs designed to detect VF defects could also outperform accepted global indices such as the mean deviation and pattern SD at specificities greater than 90%. More recently, CNNs have been leveraged to detect abnormal VFs.34 35 Li et al 34 developed a DL network that was trained on the probability map of the pattern deviation image to distinguish glaucoma from healthy VFs. The network achieved a diagnostic accuracy of 87.6%, higher than that of glaucoma experts (62.6%) and that obtained using traditional criteria such as Advanced Glaucoma Intervention Study (AGIS) (45.9%) and Glaucoma Staging System 2 (GSS2) (52.3%). In another study, Kucur et al 35 reported that a DL system trained with images that were Voronoi representations of VFs could identify early glaucomatous VFs with an average precision of 87.4%.
AI for prognosis of functional damage
The progression of glaucoma is non-linear, dependent on the instrument used for assessment, and there presently exists no widely accepted clinical index to predict glaucomatous vision loss.36 Several research groups have explored AI approaches to increase the reliability of the clinical prognosis. In one of the initial studies, Brigatti et al 37 developed an ANN using the VF data obtained from the Yale Glaucoma Section database. The network, when trained on the VF threshold points, mean defect, corrected loss, variance, false positive and false negative ratios and patient age, produced a good agreement (sensitivity: 73% and specificity: 88%) with the experienced observer. Sample et al 38 reported that standard MLCs such as SVMs and mixture of Gaussians classifiers predicted the development of abnormal fields, on average, nearly 4 years earlier than more traditional methods (StatPac-like) in patients with ocular hypertension (OHT). Since OHT is a significant risk factor for glaucoma, the proposed VF prognosis approach could also help in the early diagnosis. Studies exploiting unsupervised techniques such as ICA also concurred that the VF loss predicted by ANNs was comparable with, or even better than, existing clinical criteria.39 Similarly, Yousefi et al 40 concluded that AI techniques could identify patterns of progression earlier than more conventional methods. Recently, Wen et al 41 reported that DL networks could predict future 24-2 Humphrey visual fields (HVF), up to 5.5 years ahead (figure 1), from a single HVF as input.
It is important to note that, while the above studies reported the success of AI in the assessment and prognosis of functional damage, their performance is variable due to the sheer subjectivity involved in obtaining VF data (eg, patient factors, measurement noise, fixation losses).42 A summary of the AI studies for the diagnosis and prognosis of glaucoma using functional evaluations can be found in table 1.
AI for evaluation of structural damage
Early AI-based studies assessed glaucomatous structural damage using data obtained from confocal scanning laser ophthalmoscopy (CSLO), and scanning laser polarimetry (SLP) modalities. Although the clinical relevance of these modalities has decreased in recent years because of advances in ophthalmic imaging,42 43 their contribution in establishing the idea that glaucomatous structural damage precedes VF loss is noteworthy.44
Several studies reported that MLCs increased the discriminatory power of the optic disc parameters obtained from CSLO.45–50 Specifically, Bowd et al 45 reported that MLCs, when trained on global and regional optic disc topographic parameters, offered a significantly higher area under the curve (AUC) when compared with more standard methods. The results also suggested that the peak height contour (temporal inferior), global cup shape and the disc area (nasal) were the most informative parameters in discriminating glaucomatous and healthy eyes. Moreover, studies also concurred that MLCs trained on the retinal nerve fibre layer (RNFL) measurements from an SLP device offered a diagnostic accuracy higher than the inbuilt software (GDx).51 52
With the advent of fundus colour photography, studies have explored the use of ANNs for the segmentation and classification of optic disc photographs.53–58 In the earliest study, Sinthanayothin et al 58 reported that an ANN recognised the important regions (optic disc, blood vessels and fovea) of a fundus photograph with high sensitivity (80.4%–99.1%) and specificity (91.0%–99.1%). While these studies were not designed to directly help the diagnosis of glaucoma, they53–57 led the way for other groups to design methods to automatically extract relevant structural parameters (ie, cup-to-disc ratio, volume) for disease diagnosis and clinical management.
Subsequently, many DL approaches directly exploited the visual information and extracted glaucoma-related features from fundus colour photographs.59–61 In a landmark multiethnicity study, Ting et al 61 developed a DL network trained on 500 000 fundus photographs (125 189 glaucoma images) that was capable of discriminating glaucomatous fundus photos with high confidence (AUC: 0.942; specificity: 87.2%; sensitivity: 96.4%). Li et al 60 also developed a DL network (AUC: 0.982; specificity: 92.0%; sensitivity: 95.6%) able to detect referable glaucomatous optic neuropathy by training a CNN on 48 000 fundus photographs. In another study, Medeiros et al 59 proposed a DL approach to predict the spectral-domain OCT average RNFL thickness measurements from fundus photos (mean error less than 10 µm). If successfully translated to clinics, the proposed DL-based system could cost-effectively diagnose and stage glaucoma merely from optic disc photographs.
Having demonstrated excellent reproducibility in measuring the RNFL thickness,62 OCT has emerged as the de facto standard for objective quantification of glaucomatous structural damage of the ONH tissues.63 In the earliest study, Huang and Chen64 reported that ANNs successfully differentiated (AUC: 0.87) glaucomatous and healthy eyes using OCT measurements (RNFL thickness and ONH parameters). Furthermore, Burgansky-Eliash et al 65 showed that MLCs offered excellent discriminatory power (AUC: 0.98) in detecting glaucoma eyes using the OCT parameters obtained from macula, peripapillary and ONH regions. The study also concluded that MLCs were able to differentiate (AUC: 0.85) early from advanced glaucoma eyes. Other research groups that attempted to study the efficacy of other MLCs using OCT measurements also achieved similar results.66–69
Although the above-mentioned studies reported the general success of AI systems in identifying glaucoma eyes using OCT parametric data, their performance strongly depended on the accuracy of the automated measurements. For instance, the presence of blood vessel shadows can adversely affect the performance of these tools, yielding incorrect RNFL thickness measurements.70 Indeed, these issues are more pronounced in glaucoma subjects since they already exhibit a thinner RNFL,71 limiting the classification ability of AI systems. Besides structural parameters, there exists other visual information in OCT images (speckle pattern, tissue reflectance) that is associated with the progression of glaucoma.72 Thus, by better exploiting the information contained within OCT image data, it is feasible to increase the diagnostic power of the instrument in ophthalmology clinics.
While the idea of AI-assisted OCT for glaucoma diagnosis is still quite nascent, a few studies have begun to explore its feasibility using deep CNNs. Muhammad et al 73 proposed a hybrid approach that used a CNN to extract rich features from wide-field (9×12 mm) OCT scans that were later classified by an RF classifier to predict the existence of glaucomatous damage (AUC: 0.94). The extracted features included the RNFL and ganglion cell plus inner plexiform layer (RGC+) thickness measurements, thickness probability maps and the en face projection images. In a population-based study (2701 subjects; 135 glaucoma), Girard et al 74 developed a novel ‘Co-Training’ DL network to simultaneously segment (figure 2A) and diagnose glaucoma (figure 2B) from OCT images of the ONH. The DL network first isolated the individual neural and connective tissues, and subsequently combined the segmentation information with the OCT images to distinguish glaucoma from non-glaucoma subjects (AUC: 0.90). By leveraging 3D structural information, the DL network can discriminate glaucomatous eyes significantly better than methods exploiting the RNFL thickness values alone (figure 2B). Maetschke et al 75 proposed a feature-agnostic approach that used a 3D DL network to classify glaucomatous and healthy eyes directly from raw OCT volumes (AUC: 0.94). Further, they also concluded that the DL network focused on the neuroretinal rim, optic disc area, and the lamina cribrosa and its surrounding regions, while identifying a glaucoma scan (figure 3). These findings concurred with established clinical markers associated with glaucoma,76 thus adding ‘clinical explainability’ to these so-called ‘black-box’ tools. More recently, in a multidevice, multiethnicity study, Zhang et al 77 reported that a DL network could offer a fast (less than 1 s) and simplified glaucoma diagnosis (AUC: 0.90) from just a single clinical test (OCT scan of the ONH).
Finally, a number of studies have also used AI to detect glaucoma with varying success from anterior segment OCT images and measurements (AUC ranging from 0.85 to 0.96).78–81 A summary of the AI studies for the diagnosis of glaucoma using the structural evaluations can be found in table 2.
Hybrid AI approaches
Given the inherent subjectivity of functional assessments and the variability in structural measurements, research groups have attempted to increase the discriminatory power of AI systems by combining both the structural and functional statuses of the eye. The earliest such study by Brigatti et al 82 assessed the performance of an ANN trained on the ONH parameters (cup-to-disc ratio, rim area, cup volume and RNFL thickness) and the VF indices (mean defect, corrected loss variance and short-term fluctuation). The network offered a higher diagnostic accuracy (88%) when trained with both the structural and functional information, as opposed to only one of them (ONH parameters/VFs: 80%/84%). Subsequent studies that combined the structural measurements from OCT,67 83 CSLO84–86 and SLP parameters87 along with the perimetry evaluations also offered similar results.
In a slightly different hybrid approach, Oh et al 88 reported that an ANN trained with a mix of ophthalmic (IOP, spherical equivalent refractive errors, vertical cup-to-disc ratio, presence of RNFL defects) and systemic factors (sex, age, menopause, duration of hypertension) could successfully differentiate (AUC: 0.89) between primary open-angle glaucoma (POAG) subjects and glaucoma suspects.
Other AI approaches
While the AI algorithms mentioned above discriminated between the functional and/or structural status of the eye to identify glaucoma, a few studies have also explored the potential of applying AI to genetic data. In the largest genome-wide association study conducted on nearly 140 000 participants, Khawaja et al 89 identified 112 genomic loci (including 68 novel loci) associated with IOP and the development of POAG. Further, they developed a regression model that predicted POAG with an AUC of 0.76 based on these loci. Burdon et al 90 used both regression and ANN models and concluded that a combination of disc parameters, IOP and POAG-associated loci could improve the accuracy of POAG risk prediction models, thus allowing scope for early treatment and prevention of blindness. A summary of the hybrid and genetic studies that use AI for the diagnosis of glaucoma can be found in table 3.
In this review, we discuss the role of AI in the diagnosis and prognosis of glaucoma using functional and/or structural evaluations of the eye. While early studies relied on simple MLCs or ANNs to detect glaucomatous eyes using parametric data, modern DL systems have successfully exploited high-dimensionality image data, thus increasing the diagnostic power of ocular imaging modalities such as fundus photography and OCT in clinics.
The use of AI algorithms in clinics may be viewed as a tool to assist clinicians, not to replace them. AI methods can help speed up the triage process by collating data from multiple tests, detecting abnormalities and offering relevant referrals. Thus, AI systems can help create a clinically conducive environment that better uses specialised resources, reduces workload for clinicians, minimises diagnostic errors leading to incorrect treatment and improves the overall quality of ophthalmic care for patients with glaucoma.
With their ability to extract meaningful features from complex modalities, modern AI methods can help in the discovery of new biomarkers to improve our current understanding of glaucoma. This could be useful for the early detection of glaucoma and to promote research and development into new drugs and treatment. Besides, AI could also help us to identify new structural/functional signatures or traits that are extremely difficult to parameterise, perhaps leading to an enhancement of the accepted definition of glaucoma. Thus, a synergy between AI systems and clinicians could lead to mutual advancements in both glaucoma research and clinical practices.
The screening of large populations is logistically and economically unfeasible.91 However, new telemedicine-based screening has offered the benefits of early detection, reduced travel times and increased specialist referral rates, thus saving costs for both the individual and the healthcare system.92 Incorporating AI systems in tandem with the ocular imaging modalities used in telemedicine might be a long-term and cost-effective solution to increase screening efficacy, and to monitor patients in primary care and community settings where resources and access to specialists are limited.
The lack of a widely accepted clinical reference to predict the progression of glaucoma increases the likelihood of incorrect treatment and management strategies.9 Since structural changes to the eye almost always precede functional loss in glaucoma,44 AI tools for the segmentation21 93 and enhancement (figure 2C)25 of ONH tissues could help clinicians to better visualise and model the 3D structural information in OCT images (figure 2A) specifically for each patient, thus improving the reliability of the prognosis offered. Eventually this could open doors for a range of precision medicine tools for clinical forecasting, personalised pharmacological/surgical recommendations and the monitoring of glaucoma therapeutic efficacy.
Lastly, by exploiting the data regarding the surgical outcomes of several patients, AI could also help clinicians to evaluate surgical options, preoperative and postoperative management strategies, thus improving surgical outcomes.
There exist several challenges in the clinical translation of AI tools that warrant further discussion. First, the performance of these tools primarily depends on the quality of training data (quality of images and diagnosis, presence of artefacts). Furthermore, a large (>100 000 images) and diverse training set with a good mix of ophthalmic (severity of glaucoma, presence of other conditions) and non-ophthalmic factors (age, ethnicity, imaging device) is generally recommended to ensure clinical robustness.12 Curating such a training set in practice is expensive and daunting.
Second, while all the above studies discussing the use of AI to assess glaucoma seem encouraging, it must be noted that the AUCs are extremely subjective and cannot be used to directly compare different studies, for the following reasons. First, the prevalence of glaucoma, its type and severity94 vary among different regions,95 ethnicities,96 ages and genders.97 Second, even among people with glaucoma, due to its asymptomatic nature, only half are aware of their disease in developed countries, and even fewer in low and middle-income countries.95 This may change what constitutes normal and glaucoma populations across countries. Third, while the cohorts in retrospective clinic-based studies comprise patients who are diagnosed with glaucoma using well-defined criteria, the cohorts in prospective population-based studies can have patients with varying degrees of disease severity and prevalence depending on the region95 and demographics (ethnicity, gender, age).97 Thus, although the AUC is a single measure to summarise the overall diagnostic performance of a study, it must be interpreted in the context of the nature of the study (clinic based/population based), population size and spectrum (type and severity) and demographics. Ideally, to truly assess diagnostic power, studies must be highly standardised and must use balanced data sets across multiple factors, including, but not limited to, age, ethnicity, gender, type of glaucoma and severity. However, given the practical limitations in curating such data sets, the clinical acceptance of these algorithms thus lies in repeated and reliable independent validations across multiple data sets.
Extra caution must also be exerted when dealing with very high AUCs (>0.98) due to the discrepancies and sheer subjectivity in the diagnosis even among the experts.98 In addition, the diagnostic power calculated with AI on a given population should ideally be compared with that obtained with a gold standard parameter such as RNFL thickness on the exact same population. If performance is identical and high (AUC >0.90 for both cases), one should ultimately ask ‘Why is the AI algorithm’s diagnostic power not superior to that of a simple parameter?’ and ‘Why is the diagnostic power for RNFL thickness high for this given population?’ It is worth noting that it is relatively easy to get a very high AUC on a small population (~100 patients), but less so with data from tens of thousands or millions of patients. Finally, sensitivity (at 95% specificity) should also be reported to better understand performance.
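The two summary measures discussed above can be computed directly from a model’s output scores. The sketch below is illustrative only: the scores are invented for the example, the function names are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen glaucoma eye receives a higher score than a randomly chosen healthy eye, with ties counted as one half).

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (glaucoma, healthy) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def sensitivity_at_specificity(scores_pos, scores_neg, target_spec=0.95):
    """Best sensitivity among thresholds whose specificity meets the target.

    An eye is called glaucomatous when its score exceeds the threshold.
    """
    best = 0.0
    for t in sorted(set(scores_pos) | set(scores_neg)):
        spec = sum(n <= t for n in scores_neg) / len(scores_neg)
        if spec >= target_spec:
            sens = sum(p > t for p in scores_pos) / len(scores_pos)
            best = max(best, sens)
    return best

# Invented scores for 20 healthy and 10 glaucoma eyes (no clinical meaning).
healthy = [i / 100 for i in range(5, 65, 3)]
glaucoma = [0.30, 0.45, 0.58, 0.61, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]

example_auc = auc(glaucoma, healthy)
example_sens = sensitivity_at_specificity(glaucoma, healthy, 0.95)
```

For these invented scores the AUC is 0.90 while the sensitivity at 95% specificity is only 0.70, which illustrates why reporting both gives a fuller picture than the AUC alone.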
Third, while longitudinal predictions from structural analysis of OCT images might help in the prognosis and even early diagnosis of glaucoma, the development of such an AI system is a clinical challenge. This is because the recruitment and follow-up of a large cohort is very expensive and cumbersome. Besides, the subjective definitions for suspect/moderate phenotypes could compromise the integrity of training data, eventually affecting the performance of such tools.
Fourth, the ‘black-box’ nature of AI algorithms has long hindered their clinical adoption. AI algorithms discriminate based on the correlations/patterns they infer from the training data. These correlations may or may not be in concurrence with the theoretical cause. When exposed to high-dimensionality data (eg, a combination of ophthalmic and non-ophthalmic parameters), the algorithm might pick up intrinsic patterns in the data that might correlate to glaucoma, but might not be clinically correct. For instance, given the strong correlation in the prevalence of glaucoma with demographics (ethnicity, age, gender),97 the AI algorithm might learn to identify pathology merely from the demographics while neglecting the ophthalmic parameters. The chances of identifying such clinically irrelevant correlations increase when dealing with high-dimensionality data (eg, OCT images), if the algorithm is not designed and validated carefully, thus leading to excess false positives and false negatives. Although a few studies16 75 have attempted to offer ‘clinical explainability’ of the results by visualising heat maps (or class activation maps) to understand the influence of different regions in the image towards the final diagnosis, these methods are still in their infancy: they remain relatively brittle and under active development. The development of such tools is crucial for the widespread clinical adoption of AI algorithms.
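The general idea behind such heat maps can be sketched with occlusion sensitivity, a simpler, model-agnostic relative of class activation maps and not the method used in the studies cited above. Each image patch is masked in turn and the resulting drop in the model’s score is recorded. Everything here is an assumption made for illustration: `toy_score` is a hypothetical stand-in for a trained network, and the image is a toy array.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Heat map of how much the score drops when each patch is masked."""
    base = score_fn(image)
    h, w = len(image), len(image[0])
    heat = [[0.0] * w for _ in range(h)]
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            occluded = [row[:] for row in image]          # copy, then mask one patch
            cells = [(i, j)
                     for i in range(r, min(r + patch, h))
                     for j in range(c, min(c + patch, w))]
            for i, j in cells:
                occluded[i][j] = fill
            drop = base - score_fn(occluded)
            for i, j in cells:
                heat[i][j] = drop
    return heat

def toy_score(img):
    """Hypothetical 'model' that only looks at the top-left 2x2 region."""
    return sum(img[i][j] for i in range(2) for j in range(2)) / 4.0

image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_score)
```

Here the heat map is non-zero only over the top-left region, exposing which part of the image the stand-in model relies on. Applied to a real network, the same procedure would show whether the highlighted regions coincide with clinically meaningful structures.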
Fifth, the ‘black-box’ nature also poses a host of conflicting medicolegal liability issues between clinical practice and industry.99 Given the performance subjectivity of these tools due to the variability in image quality and device, a clinical verification of the results is required before the final decision is made. However, clinical decisions based directly on the interpretation of AI systems lead to a higher liability on the manufacturer, thus affecting retail price.99 Clear medicolegal guidelines and a diagnostic pipeline involving both the AI systems and clinicians to minimise errors are essential for an economically feasible adoption and increased acceptability of these tools.
Finally, the approval of regulatory bodies such as the Food and Drug Administration and the European Medicines Agency required for the clinical use of these diagnostic tools is likely to be difficult.99 This is a result of the fact that the required performance for clinical acceptance is not yet clearly defined. Thus, multiple clinical studies comparing AI and traditional diagnostic practices are needed to better understand these systems, and increase the patient/clinician confidence in them.
In conclusion, clinicians in future may expect a plethora of AI tools to assist them in the day-to-day diagnosis and management of glaucoma. While the persistence of new clinical and technical challenges is undeniable, one cannot dismiss the ways in which AI could positively impact ophthalmic research and clinical practice in glaucoma.
Contributors SKD was the first author and drafted the manuscript. ZL coauthored and provided review on the usage of AI for functional analysis. THP coauthored and provided review on the usage of AI for structural analysis. CB, NGS, AHT and MJAG critically reviewed the manuscript.
Funding This work was supported by the Singapore Ministry of Education Academic Research Funds Tier 1 (R-397-000-294-114 (MJAG)); and the Singapore Ministry of Education Tier 2 (R-397-000-280-112, R-397-000-308-112 (MJAG)).
Competing interests MJAG and AHT are co-founders of Abyss Processing.
Patient consent for publication Not required.
Provenance and peer review Commissioned; externally peer reviewed.