Better reasoning apparently doesn’t prevent LLMs from spewing out false facts.
Independent testing from AI firm Vectara showed that the latest advanced reasoning models from OpenAI and DeepSeek hallucinate even more than previous models.
- OpenAI’s o3 reasoning model scored a 6.8% hallucination rate on Vectara’s test, which asks the AI to summarize various news articles.
- DeepSeek’s R1 fared even worse, with a 14.3% hallucination rate – an especially poor showing considering that the older, non-reasoning DeepSeek-V2.5 clocked in at 2.4%.
- On OpenAI’s more difficult SimpleQA benchmark, o3 and o4-mini hallucinated 51% to 79% of the time, versus just 37% for OpenAI’s non-reasoning GPT-4.5 model.
OpenAI positions o3 as its most powerful model because it’s a “reasoning” model that takes more time to “think” and work out its answers step by step.
- This process produces better answers for many use cases, but reasoning models can also hallucinate at each step of their “thinking,” giving them more chances to slip up along the way (a rough illustration of that compounding effect is below).
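To see why longer chains of reasoning create more opportunities to go wrong, here’s a back-of-the-envelope sketch. The 2% per-step error rate and the step counts are illustrative assumptions, not figures from Vectara or OpenAI:

```python
# Illustrative only (not Vectara's or OpenAI's methodology): if each reasoning
# step independently has a small chance of going wrong, the chance that at
# least one error creeps into the final answer grows quickly with chain length.
per_step_error = 0.02  # assumed 2% chance that any single step introduces an error

for steps in (1, 5, 10, 20):
    chance_of_any_error = 1 - (1 - per_step_error) ** steps
    print(f"{steps:>2} steps -> {chance_of_any_error:.0%} chance of at least one error")
```

Real models don’t fail independently at each step, so treat this as intuition for the “more steps, more chances to slip” point rather than a model of actual hallucination rates.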
The Takeaway
Even though the general-purpose models studied weren’t fine-tuned for healthcare, the results raise concerns about their safety in clinical settings, especially given how many physicians report using them in day-to-day practice.
We’re testing a new format today – let us know if you prefer two shorter Top Stories or one longer Top Story with this quick survey!