
AI’s usefulness in emergency room diagnoses is limited to presentation of typical symptoms, researchers find


West Virginia University researchers have determined that AI technology can use input from physicians’ exam notes to assist in diagnosing diseases for patients with classic symptoms. Credit: WVU / Greg Ellis

Artificial intelligence tools can assist emergency room physicians in accurately predicting disease, but only for patients with typical symptoms, West Virginia University scientists have found.

Gangqing “Michael” Hu, assistant professor in the WVU School of Medicine Department of Microbiology, Immunology and Cell Biology and director of the WVU Bioinformatics Core facility, led a study that compared the precision and accuracy of four ChatGPT models in making medical diagnoses and explaining their reasoning.

His findings, published in the journal Scientific Reports, demonstrate the need to incorporate larger amounts and more varied types of data when training AI technology to assist in disease diagnosis.

More data can make the difference in whether AI gives patients the correct diagnoses for what are called “challenging cases,” which don’t exhibit classic symptoms. As an example, Hu pointed to a trio of scenarios from his study involving patients who had pneumonia without the typical fever.

“In these three cases, all of the GPT models failed to give an accurate diagnosis,” Hu said. “That made us dive in to look at the physicians’ notes and we noticed the pattern of these being challenging cases. ChatGPT tends to get a lot of information from different resources on the internet, but these may not cover atypical disease presentation.”

The study analyzed data from 30 public emergency department cases, which—for reasons of privacy—did not include demographics.

Hu explained that in using ChatGPT to assist with diagnosis, physicians’ notes are uploaded and the tool is asked to provide its top three diagnoses. Results varied across the versions Hu tested: GPT-3.5, GPT-4, GPT-4o and the o1 series.
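For illustration only, a workflow like the one Hu describes might be sketched with the OpenAI Python client as follows. The model identifier, prompt wording and sample note are assumptions for the sake of the example, not details taken from the study.

```python
# Hypothetical sketch of the workflow described above: send a de-identified
# physician note to a chat model and ask for the top three diagnoses.
# The model name, system prompt, and note text are illustrative assumptions,
# not the study's actual protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

deidentified_note = (
    "58-year-old presenting with productive cough and pleuritic chest "
    "pain for 3 days. Afebrile. Crackles over the right lower lobe."
)

response = client.chat.completions.create(
    model="gpt-4o",  # one of the model families compared in the study
    messages=[
        {"role": "system",
         "content": "You are assisting an emergency physician. List your "
                    "top three differential diagnoses, most likely first, "
                    "with one sentence of reasoning for each."},
        {"role": "user", "content": deidentified_note},
    ],
)

print(response.choices[0].message.content)
```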

“When we looked at whether the AI models gave the correct diagnosis in any of their top three results, we didn’t see a significant improvement between the new version and the older version,” he said. “But when we look at each model’s number one diagnosis, the new version is about 15% to 20% higher in accuracy than the older version.”
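The distinction Hu draws is between top-3 accuracy (the correct diagnosis appears anywhere in the model's three suggestions) and top-1 accuracy (the correct diagnosis is ranked first). A minimal sketch on made-up data shows how the two metrics can diverge for the same predictions:

```python
# Minimal illustration of top-1 vs. top-3 accuracy on made-up data.
# Each case pairs the true diagnosis with a model's ranked top-3 list;
# none of these figures come from the study itself.
cases = [
    ("pneumonia",    ["pneumonia", "bronchitis", "pulmonary embolism"]),
    ("appendicitis", ["gastroenteritis", "appendicitis", "ovarian cyst"]),
    ("pneumonia",    ["heart failure", "COPD exacerbation", "pneumonia"]),
    ("stroke",       ["migraine", "seizure", "hypoglycemia"]),
]

top1 = sum(truth == ranked[0] for truth, ranked in cases) / len(cases)
top3 = sum(truth in ranked for truth, ranked in cases) / len(cases)

print(f"top-1 accuracy: {top1:.0%}")  # 25%: only one case is right at rank 1
print(f"top-3 accuracy: {top3:.0%}")  # 75%: three cases contain the truth
```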

Given AI models’ current low performance on complex and atypical cases, Hu said human oversight is a necessity for high-quality, patient-centered care when using AI as an assistive tool.

“We didn’t do this study out of curiosity to see if the new model would give better results. We wanted to establish a basis for future studies that involve additional input,” Hu said. “Currently, we input physician notes only. In the future, we want to improve the accuracy by including images and findings from laboratory tests.”

Hu also plans to expand on findings from one of his recent studies in which he applied the ChatGPT-4 model to the task of role-playing a physiotherapist, psychologist, nutritionist, artificial intelligence expert and athlete in a simulated panel discussion about sports rehabilitation.

He said he believes a model like that can improve AI’s diagnostic accuracy by taking a conversational approach in which multiple AI agents interact.
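In its simplest form, a conversational multi-agent setup of the kind Hu describes might look like the sketch below. The roles, prompts, turn order and helper function are all illustrative assumptions; the published study evaluated single models, not this panel arrangement.

```python
# Hypothetical sketch of a multi-agent "panel discussion" over a case.
# Roles, prompts, and turn structure are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def ask(role_prompt: str, transcript: str) -> str:
    """One panelist reads the transcript so far and adds a short comment."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[
            {"role": "system", "content": role_prompt},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

note = "De-identified physician note goes here."
transcript = f"Case under discussion:\n{note}\n"

panel = {
    "Emergency physician": "Offer a differential diagnosis with reasoning.",
    "Radiologist": "Comment on what imaging would help and why.",
    "Clinical pharmacist": "Flag medication-related considerations.",
}

# Each agent responds in turn, seeing everything said before it, so later
# panelists can challenge or refine earlier reasoning steps.
for name, role_prompt in panel.items():
    reply = ask(f"You are a {name.lower()} on a diagnostic panel. {role_prompt}",
                transcript)
    transcript += f"\n{name}: {reply}\n"

print(transcript)
```

Because each turn is appended to a shared transcript, the reasoning steps Hu considers important for trust are preserved and visible rather than hidden inside a single model response.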

“From a position of trust, I think it’s very important to see the reasoning steps,” Hu said. “In this case, high-quality data including both typical and atypical cases helps build trust.”

Hu emphasized that while ChatGPT is promising, it is not a certified medical device. He said that if a future study were to include images or other patient data, the AI model would need to be an open-source system installed on a hospital computing cluster to comply with privacy laws.

Other contributors to the study were Jinge Wang, a postdoctoral fellow, and Kenneth Shue, a lab volunteer from Montgomery County, Maryland, both in the School of Medicine Department of Microbiology, Immunology and Cell Biology, as well as Li Liu of Arizona State University.

Hu noted that future research on using ChatGPT in emergency departments could examine whether enhancing AIs’ abilities to explain their reasoning could contribute to triage or decisions about patient treatment.

More information:
Jinge Wang et al, Preliminary evaluation of ChatGPT model iterations in emergency department diagnostics, Scientific Reports (2025). DOI: 10.1038/s41598-025-95233-1

Citation:
AI’s usefulness in emergency room diagnoses is limited to presentation of typical symptoms, researchers find (2025, May 20)
retrieved 20 May 2025
from https://medicalxpress.com/news/2025-05-ai-usefulness-emergency-room-limited.html





