Artificial Intelligence

Hidden Flaws Behind High Accuracy of Clinical AI


AI is getting pretty darn good at patient diagnosis challenges… but don’t bother asking it to show its work.

A new study in npj Digital Medicine pitted GPT-4V against human physicians on 207 image challenges designed to test the reader’s ability to diagnose a patient based on a series of pictures and some basic clinical background info.

  • Researchers at the NIH and Weill Cornell Medicine then asked GPT-4V to provide step-by-step reasoning for how it chose the answer.
  • Nine physicians then tackled the same questions in both a closed-book (no outside help) and open-book format (could use outside materials and online resources).

How’d they stack up?

  • GPT-4V and the physicians both scored high marks for accurate diagnoses (81.6% vs. 77.8%), a difference that wasn't statistically significant.
  • GPT-4V bested the physicians on the closed-book test, selecting more correct diagnoses.
  • Physicians bounced back to beat GPT-4V on the open-book test, particularly on the most difficult questions.
  • GPT-4V also performed well in cases where physicians answered incorrectly, maintaining over 78% accuracy.

Good job, AI, but there’s a catch. The rationales that GPT-4V provided were riddled with mistakes, even when the final answer was correct, with error rates as high as 27% for image comprehension.

The Takeaway

There could easily come a day when clinical AI surpasses human physicians on the diagnosis front, but that day isn’t here quite yet. Real care delivery also doesn’t bless physicians with a set of multiple choice options, and hallucinating the rationale behind diagnoses doesn’t cut it with actual patients.
