ICGA-GPT: report generation and question answering for indocyanine green angiography images
  1. Xiaolan Chen1,
  2. Weiyi Zhang1,
  3. Ziwei Zhao1,
  4. Pusheng Xu2,
  5. Yingfeng Zheng2,
  6. Danli Shi1,3,
  7. Mingguang He1,3,4
  1. School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
  2. State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, Guangdong, China
  3. Research Centre for SHARP Vision (RCSV), The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
  4. Centre for Eye and Vision Research (CEVR), 17W Hong Kong Science Park, Hong Kong, China
  Correspondence to Dr Danli Shi, The Hong Kong Polytechnic University, Kowloon, Hong Kong 999077, China; danli.shi@polyu.edu.hk

Abstract

Background Indocyanine green angiography (ICGA) is vital for diagnosing chorioretinal diseases, but interpreting the images and communicating findings to patients require extensive expertise and time. We aim to develop a bilingual ICGA report generation and question-answering (QA) system.

Methods Our dataset comprised 213 129 ICGA images from 2919 participants. The system consisted of two stages: image–text alignment for report generation by a multimodal transformer architecture, and large language model (LLM)-based QA with ICGA text reports and human-input questions. Performance was assessed using both objective metrics (including Bilingual Evaluation Understudy (BLEU), Consensus-based Image Description Evaluation (CIDEr), Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence (ROUGE-L), Semantic Propositional Image Caption Evaluation (SPICE), accuracy, sensitivity, specificity, precision and F1 score) and subjective evaluation by three experienced ophthalmologists using 5-point scales (5 indicating highest quality).
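As a point of reference for the disease-based metrics listed above, the sketch below shows how specificity, accuracy, precision, sensitivity and F1 score are conventionally derived from per-condition confusion counts. This is an illustrative implementation of the standard definitions, not the authors' code, and the counts in the usage example are hypothetical.

```python
def disease_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics for one disease-related condition.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives for that condition.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}

# Hypothetical counts for a single condition, for illustration only.
m = disease_metrics(tp=34, fp=6, tn=150, fn=10)
print({k: round(v, 2) for k, v in m.items()})
```

Per-condition values such as these would then be macro-averaged across all 39 conditions to obtain the summary figures reported in the Results.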

Results We produced 8757 ICGA reports covering 39 disease-related conditions after bilingual translation (66.7% English, 33.3% Chinese). The ICGA-GPT model’s report generation performance was evaluated with BLEU scores (1–4) of 0.48, 0.44, 0.40 and 0.37; CIDEr of 0.82; ROUGE-L of 0.41 and SPICE of 0.18. For disease-based metrics, the average specificity, accuracy, precision, sensitivity and F1 score were 0.98, 0.94, 0.70, 0.68 and 0.64, respectively. Assessing the quality of 50 images (100 reports), three ophthalmologists achieved substantial agreement (kappa=0.723 for completeness, kappa=0.738 for accuracy), yielding scores from 3.20 to 3.55. In an interactive QA scenario involving 100 generated answers, the ophthalmologists provided scores of 4.24, 4.22 and 4.10, displaying good consistency (kappa=0.779).

Conclusion This pioneering study introduces the ICGA-GPT model for report generation and interactive QA, underscoring the potential of LLMs in assisting with automated ICGA image interpretation.

  • Imaging

Data availability statement

Data are available upon reasonable request. The authors do not have the authorisation to distribute the dataset.




Footnotes

  • XC and WZ contributed equally.

  • Correction notice This paper has been corrected since it was first published. The contributors statement and the data sharing statement have been changed.

  • Contributors DS and XC conceived the study. DS and WZ built the deep learning model. XC, WZ, ZZ and PX performed the data analysis. XC wrote the manuscript. All authors have commented on the manuscript. DS is the guarantor.

  • Funding This study was supported by the Start-up Fund for RAPs under the Strategic Hiring Scheme (P0048623) and the Global STEM Professorship Scheme (P0046113) from HKSAR.

  • Disclaimer The sponsor or funding organisation had no role in the design or conduct of this research.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.