Artificial Intelligence

OpenAI Delivers Largest-Ever Study of Clinical AI

Hot on the heels of launching its HealthBench medical AI benchmark, OpenAI just delivered results from the largest-ever study of clinical AI in actual practice – and let’s just say the future’s looking bright.

40,000 visits, 106 clinicians, 15 clinics. OpenAI went big to get real-world data, equipping Kenya-based primary and urgent care provider Penda Health with AI Consult (GPT-4o) clinical decision support within its EHR.

  • The study split 106 Penda clinicians into two even groups (half with AI Consult, half without), then tracked outcomes over a three-month period.

When AI Consult detected a potential error in history, diagnosis, or treatment, it triggered a simple Traffic Light alert.

  • Green – No concerns, no action needed
  • Yellow – Moderate concerns, optional clinician review 
  • Red – Safety-critical concerns, mandatory clinician review
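For readers who think in code, the three alert levels can be sketched as a simple lookup. This is only an illustration of the tiers as described above, not Penda's or OpenAI's implementation; the names `Alert`, `REVIEW_POLICY`, and `review_required` are hypothetical.

```python
from enum import Enum


class Alert(Enum):
    """The three Traffic Light levels described in the article."""
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Hypothetical mapping from alert level to the clinician review it calls for,
# following the article's description (not the actual AI Consult logic).
REVIEW_POLICY = {
    Alert.GREEN: "none",        # no concerns, no action needed
    Alert.YELLOW: "optional",   # moderate concerns, clinician may review
    Alert.RED: "mandatory",     # safety-critical concerns, clinician must review
}


def review_required(alert: Alert) -> bool:
    """Return True only when the clinician must review before proceeding."""
    return REVIEW_POLICY[alert] == "mandatory"
```

In this sketch only a red alert blocks the visit until it is reviewed, while yellow leaves the decision with the clinician.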

The results were definitely promising. Clinicians using AI Consult saw a:

  • 16% reduction in diagnostic errors
  • 13% reduction in treatment errors
  • 32% reduction in history-taking errors

The “training effect” is real. The AI Consult group got significantly better at avoiding common mistakes over time, triggering fewer alerts as the study progressed.

  • Part of that is because Penda took several steps to help along the way, including one-on-one training, peer champions, and performance feedback.
  • It’s also worth noting that there was no recorded harm as a result of AI Consult suggestions, and 100% of the clinicians using it said that it improved their quality of care.

What’s the catch? While AI Consult led to a clear reduction in clinical errors, there was no statistically significant difference in patient-reported outcomes, and clinicians using the copilot saw slightly longer visit times.

The Takeaway

Clinical AI continues to prove itself outside of multiple-choice licensing exams and clinical vignettes, and OpenAI just gave us our best evidence yet that general-purpose models can reduce errors in actual patient care.
