ChatGPT can pass medical exams but fails heart risk assessments

May 6, 2024

In a heart risk assessment study, ChatGPT gave inconsistent answers for the same cases nearly half of the time.
Experts think that it is due to the randomness built into the system to mimic natural language.
Experts prefer the established TIMI and HEART scores because they give a fixed assessment for each case.

ChatGPT can reportedly pass medical exams, but new research suggests it would be unwise to rely on it for some serious health assessments, such as deciding whether a patient with chest pain needs to be hospitalized.

ChatGPT is clever but fails at heart assessment

In research published in the journal PLOS ONE, ChatGPT reached different conclusions for the same patient, returning inconsistent heart risk levels in a study that involved thousands of simulated chest pain cases.

Dr. Thomas Heston, a researcher at Washington State University’s Elson S. Floyd College of Medicine and the lead author of the research, said,

“ChatGPT was not acting in a consistent manner; given the exact same data, ChatGPT would give a score of low risk, then next time an intermediate risk, and occasionally it would go as far as giving a high risk.”
Source: WSU.

According to the researchers, the issue is probably due to the degree of randomness built into the current version of the software, ChatGPT-4, which helps it vary its answers to mimic natural language. But Heston says that this same randomness does not work well for healthcare use cases, which demand a single, consistent answer, and can be dangerous there.
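To illustrate the kind of randomness described here, the sketch below shows temperature-based sampling, a common way for language models to vary their output. It is a toy illustration, not ChatGPT's actual implementation: the `scores` values, the three-label setup, and the function names are all hypothetical.

```python
import math
import random

RISK_LEVELS = ["low", "intermediate", "high"]


def softmax(scores, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_risk(scores, temperature, rng):
    """Pick a risk label. Temperature 0 always returns the top-scoring label."""
    if temperature == 0:
        return RISK_LEVELS[scores.index(max(scores))]
    probs = softmax(scores, temperature)
    return rng.choices(RISK_LEVELS, weights=probs, k=1)[0]


# Hypothetical model preferences for one simulated patient (assumed values).
scores = [2.0, 1.5, 0.3]
rng = random.Random(0)

greedy = {sample_risk(scores, 0, rng) for _ in range(20)}
sampled = [sample_risk(scores, 1.0, rng) for _ in range(20)]

print("temperature 0 labels:", greedy)
print("temperature 1 labels:", sampled)
```

With the temperature at 0 the same input always maps to the same label, while at a temperature of 1 repeated runs on identical input can land on different risk levels.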

Chest pain is an everyday complaint in hospital emergency rooms, and doctors need to quickly evaluate the urgency of a patient’s condition.

Some of the most serious cases can be easily identified by their symptoms, but the trickier ones are the lower-risk patients, said Dr. Heston, especially when doctors need to decide whether someone’s risk is low enough to be sent home with outpatient care or whether they should be admitted.

Other systems prove more reliable

An AI neural network like ChatGPT, which has a very large number of parameters and is trained on huge datasets, can assess billions of variables in seconds, which in principle lets it understand a complex scenario faster and in much more detail.

Dr. Heston says that medical professionals mostly use two models for heart risk assessment, HEART and TIMI. He prefers these tools because they use only a handful of variables, including age, health history, and symptoms, far fewer than ChatGPT takes into account.
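A fixed scoring tool of this kind can be sketched in a few lines. The example below assumes the commonly published form of the HEART score (five components, each scored 0 to 2, with totals of 0-3, 4-6, and 7-10 read as low, intermediate, and high risk). Those cut-offs and all names in the code are assumptions for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartInputs:
    """Five HEART components, each already graded 0, 1, or 2 (assumed scheme)."""
    history: int
    ecg: int
    age: int
    risk_factors: int
    troponin: int


def heart_score(inputs):
    """Sum the five components; the same inputs always give the same total."""
    parts = (inputs.history, inputs.ecg, inputs.age,
             inputs.risk_factors, inputs.troponin)
    if any(p not in (0, 1, 2) for p in parts):
        raise ValueError("each component must be 0, 1, or 2")
    return sum(parts)


def risk_category(score):
    """Map a total to a risk band using the assumed cut-offs."""
    if score <= 3:
        return "low"
    if score <= 6:
        return "intermediate"
    return "high"


patient = HeartInputs(history=1, ecg=0, age=1, risk_factors=1, troponin=0)
total = heart_score(patient)
print(total, risk_category(total))
```

Because the calculation is plain arithmetic with no sampling step, rerunning it on the same patient cannot change the result.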

For the study, Dr. Heston and his colleague Dr. Lawrence Lewis of Washington University in St. Louis used three datasets of 10,000 randomly simulated cases each. One dataset had the five variables from the HEART scale; another included the seven variables from the TIMI score; and the third had 44 randomly selected variables.

For the first two datasets, ChatGPT gave a risk assessment that differed from the fixed TIMI or HEART score 45% to 48% of the time on individual simulated cases. For the third dataset, the researchers ran the cases multiple times, and ChatGPT returned different results for the same cases.

Dr. Heston thinks that GenAI has great potential in healthcare as the technology advances, despite the study’s unsatisfactory findings. According to him, medical records could be uploaded to such systems, and in an emergency doctors could ask ChatGPT for the most important facts about the patient. It could also be asked to generate possible diagnoses and the reasoning for each one, which would help doctors think through a problem.
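The two kinds of inconsistency mentioned here can be expressed as simple rates. The sketch below is one plausible way to compute them and is not the authors' published method; the function names and the tiny example labels are hypothetical.

```python
def disagreement_rate(reference, candidate):
    """Fraction of cases where the candidate label differs from the fixed score."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("need two non-empty label lists of equal length")
    mismatches = sum(r != c for r, c in zip(reference, candidate))
    return mismatches / len(reference)


def self_inconsistency_rate(runs):
    """Fraction of cases whose label changes across repeated runs.

    `runs` is a list of runs, each holding one label per case, in the same order.
    """
    if not runs or not runs[0]:
        raise ValueError("need at least one non-empty run")
    n_cases = len(runs[0])
    if any(len(run) != n_cases for run in runs):
        raise ValueError("every run must cover the same cases")
    changed = sum(len({run[i] for run in runs}) > 1 for i in range(n_cases))
    return changed / n_cases


# Made-up labels for four simulated cases (illustrative only).
fixed_score = ["low", "low", "high", "intermediate"]
model_run_1 = ["low", "high", "high", "low"]
model_run_2 = ["low", "low", "high", "low"]

print(disagreement_rate(fixed_score, model_run_1))
print(self_inconsistency_rate([model_run_1, model_run_2]))
```

The first rate compares a model against a fixed reference such as TIMI or HEART, while the second needs no reference at all and only checks whether the model agrees with itself.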


Aamir Sheikh

Aamir is a tech journalist with nearly six years of experience in the crypto and tech industries. He graduated from MAJ University with an MBA in Finance and Marketing. He now works with Cryptopolitan, where he reports on the latest developments in the cryptocurrency markets and price predictions.
