Article Text

Clinical science
Large language models: a new frontier in paediatric cataract patient education
  1. Qais Dihan (1,2)
  2. Muhammad Z Chauhan (2)
  3. Taher K Eleiwa (3)
  4. Andrew D Brown (4)
  5. Amr K Hassan (5)
  6. Mohamed M Khodeiry (6)
  7. Reem H Elsheikh (2)
  8. Isdin Oke (7)
  9. Bharti R Nihalani (7)
  10. Deborah K VanderVeen (7)
  11. Ahmed B Sallam (2)
  12. Abdelrahman M Elhusseiny (2,7)
  1. Rosalind Franklin University of Medicine and Science Chicago Medical School, North Chicago, Illinois, USA
  2. Department of Ophthalmology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
  3. Department of Ophthalmology, Benha University, Benha, Egypt
  4. University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
  5. Department of Ophthalmology, South Valley University, Qena, Egypt
  6. Department of Ophthalmology, University of Kentucky, Lexington, Kentucky, USA
  7. Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  Correspondence to Abdelrahman M Elhusseiny; AMELhusseiny@uams.edu

Abstract

Background/aims In this cross-sectional comparative study, we evaluated the ability of three large language models (LLMs) (ChatGPT-3.5, ChatGPT-4 and Google Bard) to generate novel patient education materials (PEMs) and to improve the readability of existing PEMs on paediatric cataract.

Methods We compared the LLMs’ responses to three prompts. Prompt A requested a handout on paediatric cataract that was ‘easily understandable by an average American’. Prompt B modified prompt A, requesting the handout be written at a ‘sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula’. Prompt C requested that existing PEMs on paediatric cataract be rewritten ‘to a sixth-grade reading level using the SMOG readability formula’. Responses were compared on quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool; ≥70%: understandable, ≥70%: actionable), accuracy (Likert misinformation scale; 1 (no misinformation) to 5 (high misinformation)) and readability (SMOG and Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).
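
The abstract does not state how the models were queried (the public web interfaces are the likely route); purely as an illustration of the three-prompt design, the sketch below shows how the same prompts could be issued programmatically, assuming the official OpenAI Python SDK for the ChatGPT models. The chat.completions.create call is a real SDK method; the prompt strings are paraphrased from the Methods, and the model name and helper functions are illustrative, not the authors’ actual procedure.

```python
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The three prompt designs described in the Methods (wording paraphrased).
PROMPT_A = ("Write a patient education handout on paediatric cataract "
            "that is easily understandable by an average American.")
PROMPT_B = (PROMPT_A + " Write it at a sixth-grade reading level, using the "
            "Simple Measure of Gobbledygook (SMOG) readability formula.")

def prompt_c(existing_pem: str) -> str:
    # Prompt C wraps an existing patient education material for rewriting.
    return ("Rewrite the following patient education material to a sixth-grade "
            "reading level using the SMOG readability formula:\n\n" + existing_pem)

def ask(model: str, prompt: str) -> str:
    # Send a single user message and return the model's text response.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

handout = ask("gpt-4", PROMPT_B)  # model name is illustrative
```

Google Bard would need its own interface; the comparisons reported here depend only on the response text, so any querying route yields material that can be scored with the instruments above.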

Results All LLM-generated responses were of high quality (median DISCERN ≥4), understandable (≥70%) and accurate (Likert=1); however, no response was actionable (<70%). For ChatGPT-3.5 and ChatGPT-4, prompt B responses were more readable than prompt A responses (p<0.001). ChatGPT-4 generated more readable responses (lower SMOG and FKGL scores; 5.59±0.5 and 4.31±0.7, respectively) than the other two LLMs (p<0.001) and consistently rewrote existing PEMs to or below the specified sixth-grade reading level (SMOG: 5.14±0.3).
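
The SMOG and FKGL grades reported above come from published formulas: SMOG = 1.043 × √(polysyllables × 30/sentences) + 3.1291, and FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. A minimal scoring sketch follows; the vowel-group syllable counter is a rough assumption (dedicated tools such as the textstat library use more careful rules), so its outputs will differ slightly from the study’s values.

```python
import math
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels (illustrative only).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)  # words of 3+ syllables

    # SMOG grade (McLaughlin, 1969).
    smog = 1.043 * math.sqrt(polysyllables * (30 / len(sentences))) + 3.1291

    # Flesch-Kincaid Grade Level.
    fkgl = (0.39 * (len(words) / len(sentences))
            + 11.8 * (sum(syllables) / len(words))
            - 15.59)
    return {"SMOG": round(smog, 2), "FKGL": round(fkgl, 2)}

print(readability("A cataract is a cloudy area in the lens of the eye. "
                  "It can make it hard for a child to see clearly."))
```

Under the threshold used in this study, a handout counts as highly readable when both indices fall below grade level 7.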

Conclusion LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable, accurate PEMs and in improving the readability of existing materials on paediatric cataract.

  • Medical Education
  • Public health
  • Epidemiology
  • Child health (paediatrics)

Data availability statement

Data are available on reasonable request.

Footnotes

  • X @MuhammadZainCh8

  • Contributors Guarantor: AME. QD contributed with design, statistical analysis, writing and editing. MZC contributed with data visualisation, statistical analysis and editing. TKE contributed with data collection, writing and editing. ADB contributed with source evaluation. AKH contributed with statistical analysis and source evaluation. MMK contributed with data collection and analysis. REH contributed with data collection and analysis. IO contributed with editing. BN contributed with editing. DKV contributed with editing. ABS contributed with writing and editing. AME contributed with design, supervision, statistical analysis, review and editing, and is the guarantor for this publication.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
