Anthropic and OpenAI Set Sights on Providers

Digital health has some fresh competition. Less than a week after OpenAI launched ChatGPT Health, Anthropic crashed the party with the grand debut of Claude for Healthcare.

Player 2 has entered the fight. Anthropic’s headlining feature for consumers is identical to ChatGPT Health – the answers are grounded in the patient’s own medical history.

  • Claude for Healthcare lets patients securely upload their health records and app data to unlock the same wide-ranging benefits as ChatGPT Health, such as spotting trends, preparing for visits, interpreting lab results, and so on.
  • The two even share some overlapping partner apps like Function and Apple Health, but the similarities end there. 

Claude for Healthcare gets providers in on the action. Unlike OpenAI’s shiny new patient-facing solution, Claude for Healthcare comes with a suite of “Connectors” that enable it to support previously out-of-reach workflows. The list includes:

  • Prior auth reviews and coverage verifications [CMS Coverage Database]
  • Medical coding and billing accuracy [ICD-10]
  • Provider verification and credentialing [NPI Registry]

OpenAI hasn’t taken any days off. It followed up last week’s big ChatGPT Health news with the launch of ChatGPT for Healthcare – similar names, very different products.

  • ChatGPT for Healthcare is OpenAI’s enterprise solution to the Anthropic problem. It brings new provider-facing capabilities like care path management, referral letter generation, and clinical search (tough break for Doximity and Wolters Kluwer).

The fun doesn’t end there. OpenAI added to its hot streak by picking up Torch, a four-person startup building “a medical memory for AI.” The Information pinned the price tag at $100M. 

  • Torch feeds scattered records into a context engine that connects the dots between visit notes, lab results, wearable data, and any other medical info you can think of. 
  • That pitch rhymes perfectly with ChatGPT Health’s value prop, and the Torch team will now be helping boost the new solution’s medical memory across its inaugural cohort of partner apps.

The Takeaway

What a week for our little corner of the industry. OpenAI and Anthropic are diving in head first, and their tech, ambition, and pockets might even be deeper than the choppy legal waters.

OpenAI Jumps Into Healthcare Arena With ChatGPT Health

If OpenAI wasn’t already a major healthcare player, the launch of ChatGPT Health definitely just made it one.

It’s the game-changer everyone saw coming. OpenAI even teed up the launch with a report showing that 40M people are already using ChatGPT for healthcare advice on a daily basis. 

ChatGPT Health is about to take that a massive step further. 

Here’s a look at the core features:

  • ChatGPT Health operates inside a dedicated health environment with additional privacy layers (conversations aren’t used for model training, optional two-factor authentication).
  • Users can securely upload their complete medical records (courtesy of b.well).
  • Users can connect apps to inform answers (Apple Health, Function, MyFitnessPal).
  • The model uses longitudinal health data, labs, and visit summaries to help spot trends.

OpenAI is moving beyond general health advice. The extra clinical context lets ChatGPT Health deliver better answers at scale, and that’s good news for patients.

A few of the most obvious benefits for patients include:

  • Empowering them to take a more active role in their care.
  • Helping them uncover trends in their overall health.
  • Reducing confusion around test results.
  • Reinforcing care plans between visits.
  • The list could go on for a while.

ChatGPT Health isn’t actually HIPAA compliant. Then again, it doesn’t need to be.

  • Consumer health apps like ChatGPT Health aren’t covered by HIPAA, and to OpenAI’s credit, it appears to have done a great job with the necessary disclaimers.
  • The dedicated health environment was also developed with input from 260+ physicians, and it leverages a physician-authored framework for safety, clarity, and escalation.

The question now is, who’s accountable when things go wrong? Millions of patients are about to start showing up to visits armed with advice from ChatGPT Health, which means its AI fingerprints will be all over their questions, concerns, and even clinical decisions. The tech might be ready. The governance isn’t.

  • When ChatGPT Health mentions an unproven treatment and a patient follows through, or interprets a worrying lab value as benign, who carries the liability?
  • OpenAI? The physicians who authored the safety framework? The patient who followed the advice? It’s tough to say, but providers – and their patients – still need a clear answer.

The Takeaway

Everyone wants a doctor in their pocket, and ChatGPT Health just filled that role for millions of patients… even if OpenAI explicitly told them it wasn’t up for the job.

OpenAI Delivers Largest-Ever Study of Clinical AI

Hot on the heels of launching its HealthBench medical AI benchmark, OpenAI just delivered results from the largest-ever study of clinical AI in actual practice – and let’s just say the future’s looking bright.

40,000 visits, 106 clinicians, 15 clinics. OpenAI went big to get real-world data, equipping Kenya-based primary and urgent care provider Penda Health with AI Consult (GPT-4o) clinical decision support within its EHR.

  • The study split 106 Penda clinicians into two even groups (half with AI Consult, half without), then tracked outcomes over a three-month period. 

When AI Consult detected a potential error in history, diagnosis, or treatment, it triggered a simple Traffic Light alert.

  • Green – No concerns, no action needed
  • Yellow – Moderate concerns, optional clinician review 
  • Red – Safety-critical concerns, mandatory clinician review
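The tiered alert scheme above maps naturally to a small piece of triage logic. Here's a minimal sketch of how it could be wired up; the three severity labels come from the article, while the function name, input shape, and severity tags are all hypothetical (Penda's actual AI Consult implementation isn't public):

```python
from enum import Enum

class Alert(Enum):
    GREEN = "No concerns, no action needed"
    YELLOW = "Moderate concerns, optional clinician review"
    RED = "Safety-critical concerns, mandatory clinician review"

def triage(concerns: list[dict]) -> Alert:
    """Map model-flagged concerns to a traffic-light alert.

    `concerns` is a hypothetical list of findings in history, diagnosis,
    or treatment, each tagged with a severity. The most severe finding
    wins: any safety-critical concern forces a red (mandatory review).
    """
    severities = {c["severity"] for c in concerns}
    if "safety_critical" in severities:
        return Alert.RED
    if "moderate" in severities:
        return Alert.YELLOW
    return Alert.GREEN
```

The key design choice is that escalation is monotonic: a single safety-critical finding overrides any number of mild ones, which is what makes the red tier's mandatory review enforceable.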

The results were definitely promising. Clinicians using AI Consult saw a:

  • 16% reduction in diagnostic errors
  • 13% reduction in treatment errors
  • 32% reduction in history-taking errors

The “training effect” is real. The AI Consult group got significantly better at avoiding common mistakes over time, triggering fewer alerts as the study progressed.

  • Part of that is because Penda took several steps to help along the way, including one-on-one training, peer champions, and performance feedback.
  • It’s also worth noting that there was no recorded harm as a result of AI Consult suggestions, and 100% of the clinicians using it said that it improved their quality of care.

What’s the catch? While AI Consult led to a clear reduction in clinical errors, there was no statistically significant difference in patient-reported outcomes, and clinicians using the copilot saw slightly longer visit times.

The Takeaway

Clinical AI continues to prove itself outside of multiple choice licensing exams / clinical vignettes, and OpenAI just gave us our best evidence yet that general-purpose models can reduce errors in actual patient care.

OpenAI Dives Into Healthcare With HealthBench

OpenAI is officially setting its sights on healthcare with the launch of HealthBench, a new benchmark for evaluating AI performance in realistic medical scenarios.

HealthBench marks the first time the ChatGPT developer has taken a direct step into the industry without a partner to hold its hand.

  • Developed with 262 physicians from 60 countries, HealthBench includes 5,000 simulated health conversations, each with a custom rubric to grade the responses.
  • The conversations “were created to be realistic and similar to real-world use of LLMs,” meaning they’re multi-turn and multilingual, while spanning a range of medical specialties and themes like handling uncertainty or global health.
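The rubric-per-conversation design described above can be sketched as a simple scoring function. This is an illustrative assumption-laden version: criteria carry point values (some negative, e.g. penalizing fabricated details), and a response's score is points earned divided by the maximum achievable points, clipped to [0, 1]. In the actual benchmark, a model-based grader decides whether each criterion is met; the data shapes and field names here are made up:

```python
def rubric_score(criteria_met: dict[str, bool], rubric: list[dict]) -> float:
    """Score one model response against a rubric for one conversation.

    criteria_met: hypothetical map of criterion id -> whether the grader
                  judged the response to satisfy that criterion.
    rubric:       list of {"id": str, "points": int}; negative points
                  penalize undesirable behavior.
    Returns earned points / max positive points, clipped to [0, 1].
    """
    earned = sum(c["points"] for c in rubric if criteria_met.get(c["id"], False))
    max_points = sum(c["points"] for c in rubric if c["points"] > 0)
    if max_points == 0:
        return 0.0
    return max(0.0, min(1.0, earned / max_points))
```

Averaging this per-conversation score across all 5,000 conversations is what produces a single headline number like the model scores below.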

Here’s how current frontier models stacked up in the HealthBench test.

  • OpenAI’s o3 was the best performing model with a score of 60%
  • xAI’s Grok 3 ranked second with a score of 54%
  • Google’s Gemini 2.5 Pro followed close behind at 52%

All three leading models outperformed physicians who weren’t equipped with AI, although physicians outperformed the newer models when they had access to the AI output.

  • The paper also reviewed other LLMs like Llama and Claude, but unsurprisingly none of them scored higher than OpenAI’s model on OpenAI’s own test.

Even the best models came up short in a few common places, AKA areas that developers should focus on to improve performance.

  • Current AI models would rather hallucinate than withhold an answer they aren’t confident in, which is obviously not a good trait to bring into a clinical setting.
  • None of the leading LLMs were great at asking for additional context or more information when the input was vague.
  • When AI misses, it misses badly, as seen in the sharp quality dropoff among the worst 10% of responses.

The Takeaway

Outside of giving us yet another datapoint that AI is catching up to human physicians, HealthBench provides one of the best standardized ways to compare model performance in (simulated) clinical practice, and that’s just what the innovation doctor ordered.
