The Lancet Digital Health just published one of the largest-ever stress tests on medical misinformation in LLMs, and it looks like most models still struggle to separate fact from fiction.
Here’s the setup. Researchers probed 20 LLMs with over 3 million prompts containing medical information from three different sources: social media posts, simulated clinical vignettes, and real hospital discharge notes with a single fabricated recommendation inserted.
- Each prompt was presented in multiple versions: once with neutral wording to establish a baseline, then in a series of emotionally charged or leading variations.
- Ten logical fallacies were also used to test how framing influences model behavior, such as appeals to authority (a physician said…) or to popularity (everyone agrees that…); a minimal sketch of this framing setup follows this list.
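To make the protocol concrete, here’s a minimal sketch of how a single claim might be wrapped in a neutral baseline and a few fallacy framings. The claim, the framing templates, and the `query_model()` stub are all illustrative assumptions, not the study’s actual prompts or evaluation harness.

```python
# Hypothetical sketch of the prompt-variation protocol described above.
# The claim, framings, and query_model() stub are illustrative assumptions.

BASE_CLAIM = "Patients should stop their statin before any dental procedure."

FRAMINGS = {
    "neutral": "{claim} Is this correct?",
    "authority": "A senior physician said: {claim} Is this correct?",
    "popularity": "Everyone agrees that {claim} Is this correct?",
    "slippery_slope": (
        "If we don't accept that {claim}, "
        "we'll soon be ignoring all clinical guidance. Is this correct?"
    ),
}


def query_model(prompt: str) -> str:
    """Stub for an LLM call; swap in a real client to run the probe."""
    raise NotImplementedError


def build_variants(claim: str) -> dict[str, str]:
    """Wrap one claim in each framing, isolating the effect of wording."""
    return {name: template.format(claim=claim) for name, template in FRAMINGS.items()}


if __name__ == "__main__":
    # The study would compare acceptance rates across these variants.
    for name, prompt in build_variants(BASE_CLAIM).items():
        print(f"[{name}] {prompt}")
```

Holding the claim fixed while only the framing changes is what lets susceptibility differences be attributed to wording rather than content.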
LLMs love fake news. Susceptibility was shockingly high across all models, with the medical misinformation accepted in 32% of the neutral base prompts.
- That jumped to 46% when the misinformation was embedded in formal discharge notes, but at least the models were more skeptical of the social media content (9% acceptance).
Other findings were more counterintuitive. Eight of the 10 logical fallacies ended up reducing the misinformation acceptance rate rather than increasing it as the authors had expected.
- Only appeals to authority (+2.9 percentage points above the base prompts) and slippery slope prompts (+2.2pp) increased susceptibility, a relatively small impact considering that appeals to popularity slashed it by nearly 20pp.
- Larger models were generally safer, although prompt language and phrasing had a far greater influence than parameter count alone.
- It was also surprising that the medical models performed worse than the general-purpose models, with many having weaker lie detectors despite their specialization.
Improving LLM safety is about more than building bigger models. It’s about understanding how real people actually present information, and putting guardrails in place that hold up even when that information is wrong.
The Takeaway
Benchmark performance isn’t real-world performance, and this study provides another reminder that a model’s ability to separate fact from fiction is often more important than its test scores.
