Purpose To describe an artificial intelligence platform that detects thyroid eye disease (TED).
Design Development of a deep learning model.
Methods A deep learning model was trained on 1944 photographs from a clinical database. An additional 344 images (the ‘test set’) were used to calculate performance metrics. Receiver operating characteristic curves, precision–recall curves and heatmaps were generated. Fifty images were randomly selected from the test set (the ‘survey set’) and used to compare model performance with ophthalmologist performance. A further 222 images obtained from a separate clinical database were used to assess model recall and to quantify model performance with respect to disease stage and grade.
Results On the test set, the model achieved accuracy of 89.2%, specificity of 86.9%, recall of 93.4%, precision of 79.7% and an F1 score of 86.0%. Heatmaps demonstrated that the model identified pixels corresponding to clinical features of TED. On the survey set, the ensemble model achieved accuracy, specificity, recall, precision and F1 score of 86%, 84%, 89%, 77% and 82%, respectively. By comparison, 27 ophthalmologists achieved mean accuracy, specificity, recall, precision and F1 score of 75%, 82%, 63%, 72% and 66%, respectively. On the second test set (the 222 images from the separate clinical database), the model achieved recall of 91.9%, with higher recall for moderate-to-severe disease (98.2%, n=55) and active disease (98.3%, n=60) than for mild disease (86.8%, n=68) or stable disease (85.7%, n=63).
Conclusions The deep learning classifier is a novel approach to identify TED and is a first step in the development of tools to improve diagnostic accuracy and lower barriers to specialist evaluation.
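As an illustrative aside, the metrics reported in the Results can be related to one another with a short sketch. The function names and the confusion-matrix counts below are hypothetical, chosen only for illustration; they are not the study's data or code. The only values taken from the abstract are the reported test set precision (79.7%) and recall (93.4%), which are used to check that the reported F1 score is consistent with them.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Binary classification metrics from confusion-matrix counts.

    tp/fn: disease images classified as positive/negative.
    tn/fp: non-disease images classified as negative/positive.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "recall": recall,
        "precision": precision,
        "f1": f1_from_precision_recall(precision, recall),
    }


# Consistency check using the precision and recall reported for the test set.
reported_f1 = f1_from_precision_recall(0.797, 0.934)
print(f"F1 from reported precision and recall: {reported_f1:.1%}")

# Hypothetical counts (NOT from the study), only to show the function's use.
example = metrics_from_counts(tp=9, fp=2, tn=8, fn=1)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of precision and recall, which is why a recall of 93.4% paired with a precision of 79.7% yields an F1 score of about 86%.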
- Diagnostic tests/Investigation
Data availability statement
No data are available.
JK and LG contributed equally and are joint first authors.
Correction notice The authorship of this article has been updated since it was first published. JK and LG are joint first authors.
Contributors JK conceived the study, formulated data collection strategies, developed data collection tools, collected and labelled photographic data, monitored data collection, cleaned the data, wrote the statistical analysis plan, analysed the data and drafted and revised the manuscript. He is the guarantor. LG cleaned the data, designed the deep learning ensemble model, supervised training and validation of the deep learning ensemble model, performed inference using the deep learning ensemble model, wrote the statistical analysis plan, analysed the data and drafted and revised the manuscript. NL and KT participated in the design, training and validation of the deep learning ensemble model. KD, JF and BP collected and labelled photographic data and cleaned the data. JZ, WW and EE served as scientific advisors. DR conceived the study, analysed the data, served as a scientific advisor and revised the draft manuscript.
Funding This research was supported by an unrestricted grant to the Stein Eye Institute from Research to Prevent Blindness, Inc. There is no award or grant number associated with this financial support.
Competing interests JK and DR are on the Tepezza Speakers’ Bureau for Horizon Pharma. JK is a consultant for Triage Inc. The authors otherwise have no conflicts of interest to report.
Provenance and peer review Not commissioned; externally peer reviewed.