Clinical science
Exploring AI-chatbots’ capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases
  1. Matteo Mario Carlà1,2,
  2. Gloria Gambini1,2,
  3. Antonio Baldascino1,2,
  4. Federico Giannuzzi1,2,
  5. Francesco Boselli1,2,
  6. Emanuele Crincoli1,2,
  7. Nicola Claudio D’Onofrio1,2,
  8. Stanislao Rizzo1,2
  1. Ophthalmology Department, Catholic University “Sacro Cuore”, Rome, Italy
  2. Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
  Correspondence to Dr Matteo Mario Carlà; mm.carla94@gmail.com

Abstract

Background We aimed to define the capability of three publicly available large language models, Chat Generative Pretrained Transformer (ChatGPT-3.5), ChatGPT-4 and Google Gemini, in analysing retinal detachment cases and suggesting the best possible surgical planning.

Methods Analysis of 54 retinal detachment records entered into the ChatGPT and Gemini interfaces. After asking ‘Specify what kind of surgical planning you would suggest and the eventual intraocular tamponade.’ and collecting the answers, we assessed their level of agreement with the consensus opinion of three expert vitreoretinal surgeons. Moreover, ChatGPT and Gemini answers were graded from 1 (poor quality) to 5 (excellent quality) according to the Global Quality Score (GQS).
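As an illustration of the querying step, the following is a minimal sketch of how a single record could be submitted together with the study prompt. The study itself used the public chat interfaces, so the OpenAI Python SDK call, the model name and the `case_record` variable below are assumptions added purely for illustration.

```python
# Minimal sketch (assumption): the study pasted each anonymised record into
# the public ChatGPT/Gemini web interfaces; here the same prompt is sent
# through the OpenAI Python SDK (openai>=1.0) purely as an illustration.
from openai import OpenAI

PROMPT = ("Specify what kind of surgical planning you would suggest "
          "and the eventual intraocular tamponade.")

def ask_surgical_plan(case_record: str, model: str = "gpt-4") -> str:
    """Submit one retinal detachment record together with the study prompt."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{case_record}\n\n{PROMPT}"}],
    )
    return response.choices[0].message.content

# Example usage (hypothetical): collect answers for later GQS grading (1-5)
# answers = [ask_surgical_plan(record) for record in case_records]
```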

Results After excluding 4 controversial cases, 50 cases were included. Overall, the surgical choices of ChatGPT-3.5, ChatGPT-4 and Google Gemini agreed with those of the vitreoretinal surgeons in 40/50 (80%), 42/50 (84%) and 35/50 (70%) of cases, respectively. Google Gemini was not able to respond in five cases. Contingency analysis showed significant differences between ChatGPT-4 and Gemini (p=0.03). ChatGPT’s GQS was 3.9±0.8 for version 3.5 and 4.2±0.7 for version 4, while Gemini scored 3.5±1.1. There was no statistically significant difference between the two ChatGPT versions (p=0.22), while both outperformed Gemini (p=0.03 and p=0.002, respectively). The main source of error was the choice of endotamponade (14% for ChatGPT-3.5 and ChatGPT-4, and 12% for Google Gemini). Only ChatGPT-4 was able to suggest a combined phacovitrectomy approach.
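The abstract does not state how the contingency analysis was constructed. A minimal sketch, assuming a 2×2 agreed/disagreed table per model pair and Fisher’s exact test, is shown below using the reported agreement counts; it will not necessarily reproduce the published p values, for instance if Gemini’s five non-responses were handled as a separate category.

```python
# Hypothetical re-creation of the pairwise contingency analysis from the
# agreement counts reported in the abstract. The exact test and table layout
# used by the authors are not stated; Fisher's exact test on a 2x2
# agreed/disagreed table is assumed here, so p values may differ from those
# published (e.g. Gemini's five non-responses might form a third column).
from scipy.stats import fisher_exact

N_CASES = 50
agreement = {"ChatGPT-3.5": 40, "ChatGPT-4": 42, "Gemini": 35}

def compare_models(model_a: str, model_b: str) -> float:
    """Fisher's exact test on agreed vs disagreed counts for two models."""
    table = [[agreement[model_a], N_CASES - agreement[model_a]],
             [agreement[model_b], N_CASES - agreement[model_b]]]
    _, p_value = fisher_exact(table)
    return p_value

print(compare_models("ChatGPT-4", "Gemini"))
print(compare_models("ChatGPT-3.5", "ChatGPT-4"))
```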

Conclusion Google Gemini and ChatGPT evaluated vitreoretinal patients’ records in a coherent manner, showing a good level of agreement with expert surgeons. According to the GQS, ChatGPT’s recommendations were more accurate and precise than Gemini’s.

  • Retina
  • Vitreous
  • Medical Education
  • Ophthalmologic Surgical Procedures
  • Surveys and Questionnaires

Data availability statement

Data are available on reasonable request. The data that support the findings of this study are available from the corresponding author, MMC, on reasonable request.


Footnotes

  • Contributors Conceptualisation, MMC and GG; methodology, MMC; validation, AB; formal analysis, EC; investigation, MMC; writing—original draft preparation, MMC; writing—review and editing, FG, FB and NCD'O; guarantor, MMC; project administration, SR. All authors have read and agreed to the published version of the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
