Feeding AI bad info can cause catastrophic health guidance – study

A recent investigation injected erroneous information into large language models, and AI passed it along as true, showing backstops are needed.

Feb 24 265 min read

Fred Bazzoli

Editor in Chief

It’s long been said that there are lies, damned lies and statistics. Also, another management or computing truism is this – garbage in, garbage out.

Mix those maxims together, and you could arrive at the potential for highly misleading results from artificial intelligence. And nowhere may the risk be higher than when using AI to develop recommendations for healthcare.

Training a model on bad data can yield bad results, and that can be compounded in using AI with health-based data. That’s because what can be fed into these systems may be of low quality, flawed, biased or incomplete.

While there can be almost blind trust that healthcare information is inviolable, there’s growing risk that AI can return inaccurate and potentially dangerous results if bad information serves as the data source for computation.

That’s been proven out in a recent study by the Icahn School of Medicine at Mount Sinai, posing a frightening dilemma for clinicians and adding new concerns for using AI in clinical decision-making, both now and in the future.

The research project concludes that “current safeguards do not reliably distinguish fact from fabrication once a claim is wrapped in familiar clinical or social-media language.”

The study’s premise

While many hold hope that medical AI can make patient care safer by helping clinicians manage information, a lot of damage can be done when a medical lie enters the system through datasets that are used to train them.

To test the potential for risk, Icahn researchers and collaborators created a study in which large language models were exposed to misinformation. Results of the study were published this week in The Lancet Digital Health.

To study the risk, the team exposed LLMs to three types of content.

Real hospital discharge summaries from the Medical Information Mart for Intensive Care (MIMIC) database, but with a single fabricated recommendation added.

Common health myths collected from Reddit.

A total of 300 short clinical scenarios written and validated by physicians.

“Each case was presented in multiple versions, from neutral wording to emotionally charged or leading phrasing, similar to what circulates on social platforms,” researchers explained.

The study analyzed more than a million prompts across nine leading language models, finding that these systems can repeat false medical claims when they appear in realistic hospital notes or social media health discussions.

For example, one discharge note falsely advised patients with esophagitis-related bleeding to “drink cold milk to soothe the symptoms.” Several models accepted the statement, and they did not flag it as unsafe – they treated it like medically accepted guidance and, outside of a research setting, it could have been passed along to patients as true.

Frightening conclusions

Those and other results from the study indicated that much work needs to be done before AI can be more fully trusted to provide trustworthy results.

“Our findings show that current AI systems can treat confident medical language as true by default, even when it’s clearly wrong,” says co-senior and co-corresponding author Eyal Klang, MD, chief of generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “A fabricated recommendation in a discharge note can slip through. It can be repeated as if it were standard care. For these models, what matters is less whether a claim is correct than how it is written.”

The question for organizations looking to inject AI into healthcare decisions is to critically assess whether existing systems can tease out misinformation and erroneous data before spitting out a conclusion.

Can systems detect bad info?

The study’s authors say the next step, “is to treat ‘Can this system pass on a lie?’ as a measurable property, using large-scale stress tests and external evidence checks before AI is built into clinical tools.”

“Hospitals and developers can use our dataset as a stress test for medical AI,” says physician-scientist and first author Mahmud Omar, MD. “Instead of assuming a model is safe, you can measure how often it passes on a lie and whether that number falls in the next generation.”

“AI has the potential to be a real help for clinicians and patients, offering faster insights and support,” says co-senior and co-corresponding author Girish N. Nadkarni, MD, chair of the Windreich Department of Artificial Intelligence and Human Health and chief AI officer of the Mount Sinai Health System. “But it needs built-in safeguards that check medical claims before they are presented as fact. Our study shows where these systems can still pass on false information, and points to ways we can strengthen them before they are embedded in care.”

There’s wide recognition that data governance and stewardship is important in advancing the use of trusted AI in healthcare – in fact, Julie Shay discussed this need last August in an illuminating article for Health Data Management. In addition, Kevin Ritter noted in a June article that it’s crucial to start with clean, standardized data when using AI.

The Icahn study goes beyond that, however, to show the importance of ensuring that AI systems are better able to screen out false information, whether it’s obviously false or just not consistently true.

Fred Bazzoli is the Editor in Chief of Health Data Management.

More for you

Loading data for hdm_tax_topic #better-outcomes...