OpenAI Jumps Into Healthcare Arena With ChatGPT Health

If OpenAI wasn’t already a major healthcare player, the launch of ChatGPT Health definitely just made it one.

It’s the game-changer everyone saw coming. OpenAI even teed up the launch with a report showing that 40M people already use ChatGPT for healthcare advice on a daily basis.

ChatGPT Health is about to take that a massive step further. 

Here’s a look at the core features:

  • ChatGPT Health operates inside a dedicated health environment with additional privacy layers (conversations aren’t used for model training, optional two-factor authentication).
  • Users can securely upload their complete medical records (courtesy of b.well).
  • Users can connect apps to inform answers (Apple Health, Function, MyFitnessPal).
  • The model uses longitudinal health data, labs, and visit summaries to help spot trends.

OpenAI is moving beyond general health advice. The extra clinical context gives ChatGPT Health the ability to give better answers at scale, and that’s good news for patients.

A few of the most obvious benefits for patients include:

  • Empowering them to take a more active role in their care.
  • Helping them uncover trends in their overall health.
  • Reducing confusion around test results.
  • Reinforcing care plans between visits.
  • The list could go on for a while.

ChatGPT Health isn’t actually HIPAA compliant. Then again, it doesn’t need to be.

  • Consumer health apps like ChatGPT Health aren’t covered by HIPAA, and to OpenAI’s credit, it appears to have done a great job with the necessary disclaimers.
  • The dedicated health environment was also developed with input from 260+ physicians, and it leverages a physician-authored framework for safety, clarity, and escalation.

The question now is, who’s accountable when things go wrong? Millions of patients are about to start showing up to visits armed with advice from ChatGPT Health, which means its AI fingerprints will be all over their questions, concerns, and even clinical decisions. The tech might be ready. The governance isn’t.

  • When ChatGPT Health mentions an unproven treatment and a patient follows through, or it interprets a worrying lab value as benign, who carries the liability?
  • OpenAI? The physicians who authored the safety framework? The patient who followed the advice? It’s tough to say, but providers – and their patients – still need a clear answer.

The Takeaway

Everyone wants a doctor in their pocket, and ChatGPT Health just filled that role for millions of patients… even if OpenAI explicitly told them it wasn’t up for the job.

8VC’s Vision for Healthcare AI in America

8VC just dropped its Vision for Healthcare AI in America, and it’s the best roadmap we’ve seen for removing the barriers between AI and its potential to transform medicine.

Great cakes have three layers, maybe four. Before 8VC shared its recipe for how AI can help fix things, it laid out the four levels of autonomy it’ll be working with.

  • Level 0: Administrative – AI that supports providers in the back office. Example: AI scheduling agents, scribes.
  • Level 1: Assistive – AI that assists clinicians but doesn’t diagnose, treat, triage, or prescribe medications to patients. Example: AI coaches, navigators.
  • Level 2: Supervised Autonomous – AI that does all the things that Level 1 doesn’t, with decisions supervised by a clinician. Example: AI medication management.
  • Level 3: Autonomous – AI that diagnoses, treats, triages, or prescribes medications completely on its own. Example: fully-autonomous triage lines.
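
For teams mapping their own products onto this ladder, the taxonomy reduces to a simple classification. A minimal sketch in Python (the enum and helper names are ours for illustration, not 8VC's):

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """8VC's four levels of healthcare AI autonomy."""
    ADMINISTRATIVE = 0         # back-office support: scheduling agents, scribes
    ASSISTIVE = 1              # assists clinicians; never diagnoses, treats, triages, or prescribes
    SUPERVISED_AUTONOMOUS = 2  # makes clinical decisions under clinician supervision
    AUTONOMOUS = 3             # diagnoses, treats, triages, or prescribes entirely on its own

def makes_clinical_decisions(level: AutonomyLevel) -> bool:
    """Levels 2 and 3 are the ones that diagnose, treat, triage, or prescribe."""
    return level >= AutonomyLevel.SUPERVISED_AUTONOMOUS

def needs_clinician_sign_off(level: AutonomyLevel) -> bool:
    """Only Level 2 makes clinical decisions that a clinician must still supervise."""
    return level == AutonomyLevel.SUPERVISED_AUTONOMOUS
```

The regulatory and reimbursement barriers 8VC targets kick in exactly where `makes_clinical_decisions` flips to true.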

Now for the vision. Most healthcare AI solutions currently live on Level 0. They’re creating real value for providers, but they aren’t going to steer the Titanic away from the iceberg.

  • 8VC thinks the other levels might, but not unless we remove the legal barriers that are preventing our innovators from innovating.

Level 1. These solutions exist today, but assistive AI care models are being held back by a lack of broadly billable CPT codes for the services they render.

  • Solution: Implement value-based reimbursement for assistive AI care models. 8VC describes a CMMI model with durable codes and case rates, which sounds like something most payors would be lining up to lobby for.

Level 2. All autonomous AI is considered Software as a Medical Device by the FDA, but the current performance bars are set too high. Driving tests don’t need to be F1 races.

  • Solution: Align FDA approval benchmarks with real-world standards, not hypothetical ideals. LumineticsCore is a good example – the FDA required the tool to catch at least 85% of diabetic retinopathy cases, but most ophthalmologists land between 33% and 77%.
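
The FDA bar here is a sensitivity threshold: the fraction of true disease cases the screener catches. A quick sketch with hypothetical screening counts (the numbers are invented for illustration, not from the LumineticsCore trial):

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of actual disease cases the screener catches (a.k.a. recall)."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical run: 100 patients truly have diabetic retinopathy.
ai_sens = sensitivity(true_positives=87, false_negatives=13)  # 0.87 -> clears the FDA's 85% bar
md_sens = sensitivity(true_positives=60, false_negatives=40)  # 0.60 -> mid-range for ophthalmologists
```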

Level 3. Only a few policy changes are needed to open the door to Level 3 once we get to Level 2, the biggest of which is defining AI as a type of practitioner that’s eligible for reimbursement.

  • Solution: Amend the Social Security Act to allow Medicare reimbursement for licensed AI. As it stands today, even if CMS created a code for a Level 3 service, it would still be illegal for Medicare to pay an AI company instead of the supervising physician.

The Takeaway

AI is going to have to level up if we want to transform healthcare experiences, costs, and ultimately outcomes. 8VC thinks we can get there if we let our builders build, and it even gave us a blueprint for getting out of our own way.

AI Scribes Aren’t Productivity Tools, Yet

The first randomized controlled trials for ambient AI have finally arrived, and NEJM AI just gave us the strongest evidence yet that scribes deliver… minimal time savings.

The first study was a mixed bag. UCLA researchers assigned 238 physicians across 14 specialties to one of two scribes – Microsoft DAX and Nabla – or usual care for two months.

  • Nabla ended up saving about 23 seconds per visit, while DAX shaved off a whopping 5 seconds (which wasn’t even statistically significant).
  • Both scribe groups did, however, report less burnout and reduced cognitive burden than the usual care controls.

The second study told a similar tale. Physicians at the University of Wisconsin who used Abridge’s AI scribe for 6 weeks trimmed their daily documentation time by 22 minutes.

  • Still not a world-changing difference, but the UW physicians also saw significant improvements in work exhaustion and well-being.

But wait, there’s more. While those studies didn’t go as far as to suggest a cause for the lackluster time savings, a separate well-timed study from Navina offered a possible mechanism.

  • Scribes capture clinical conversations. Those conversations only inform a piece of the note, and those notes are only a piece of the workflow.
  • Navina found that incorporating patient medical histories into ambient documentation dramatically improves both note completeness and quality, which also seems like a great way to help physicians avoid lengthy manual chart reviews to fill any remaining gaps.

Then why do scribes get rave reviews? That’s still up for debate.

  • It’s worth noting that “average time savings” include plenty of physicians who barely used the scribe. UCLA only had about a third of physicians pick up the tools, while UW was close to a best-case scenario at 71%.
  • It’s also possible that physicians enjoy not having to hold the visit in their head until they can finish their note, and getting rid of that burden is as magical as actual time savings.

The Takeaway

Not everything that can be measured matters, and not everything that matters can be measured. AI scribes might not be productivity tools quite yet, but physicians are clearly finding plenty of reasons to love them until they get there – even if more time isn’t one of them.

Incorporating Human Factors Into AI Research

The majority of AI research centers on model performance, but a new paper in JAMIA poses five questions to guide the discussion around how physicians actually interact with AI during diagnosis.

A little reframing goes a long way. As the clinical scope of AI expands alongside its capabilities, the interface between the models and doctors is becoming increasingly important. 

  • Researchers from UCLA and Tufts University point out that this “human-computer interface” is essential to make sure AI is properly integrated into care delivery, serving as the first line of defense against common AI pitfalls like distracting doctors or giving them too much confidence in its answers.

Here are the questions they came up with, and why each one is important:

Question 1: What type of information and format should AI present?

  • Why it’s important: Deciding how information gets presented is just as important as deciding what information to present. Format affects doctors’ attention, diagnostic accuracy, and possible interpretive biases.

Question 2: Should AI provide that information immediately, after initial review, or be toggled on and off by the physician?

  • Why it’s important: Immediate information can lead to a biased interpretation, while delayed cues can help physicians maintain their hard-earned diagnostic skills by allowing them to fully engage in each diagnosis.

Question 3: How does AI show its reasoning?

  • Why it’s important: Clear explanations of how AI arrives at a decision can highlight features that were ruled in or out, provide “what if” explanations, and more effectively align with doctors’ clinical reasoning.

Question 4: How does AI affect bias and complacency?

  • Why it’s important: When physicians lean too heavily on AI, they might rely less on their own critical thinking, widening the space for an accurate diagnosis to slip past them.

Question 5: What are the risks of long-term reliance on AI?

  • Why it’s important: Long-term AI reliance could end up eroding learned diagnostic abilities. We recently covered a great study in The Lancet that investigated the topic.

The Takeaway

AI holds enormous potential to improve clinical decision-making, but poor integration could end up doing more harm than good. This paper provides a solid framework to push the field from “Can AI detect disease?” to “How should AI help doctors detect disease without introducing new risks?”

Menlo Ventures: The State of AI in Healthcare

Of all the AI market overviews that have hit the wire recently, none have generated more buzz than Menlo Ventures’ State of AI in Healthcare. One look at the report and it’s easy to see why.

First things first. Here are a couple of high-level callouts before we zoom in on the details:

  • Healthcare AI spending has already topped $1.4B in 2025 – 22% of healthcare orgs have now implemented domain-specific AI tools, a 7x increase over last year [Chart 1].
  • 85% of all healthcare AI spend is currently flowing into startups (faster cycles, clearer ROI), rather than incumbents (often layering AI on legacy platforms) [Chart 2].

Providers accelerate, payors deliberate. Providers dominate AI adoption in healthcare, with health systems alone supplying $1B of the $1.4B in total spending.

  • Outpatient providers represent $280M, while payors surprisingly contribute just $50M.

The song remains the same. Menlo found that leading health systems are choosing AI based on themes we’ve covered plenty of times before. They prioritize:

  • Tech maturity – providers prioritize production-ready solutions that perform at scale. 
  • Risk level – tools that don’t directly interface with patients see less scrutiny.
  • Quick value – a 2025 favorite, rapid ROI and organizational confidence are essential. 

What solutions check all the boxes? Two categories account for the lion’s share of AI budgets, in large part because they quickly address acute operational pain points.

  • Ambient documentation ($600M), no surprise here. This puts it in perspective [Chart 3].
  • Coding and billing automation ($450M), hard to think of a quicker ROI.

Bonus chart. Here’s the closest we’ll ever get to official ambient AI market share [Chart 4].

IT is good, services are great. Total U.S. healthcare administration spending reaches $740B annually, yet IT spend represents <10% of that. The report has a top tier breakdown [Chart 5].

  • AI’s frontrunners found success carving into existing IT budgets, but the future victors could be the teams that convert services dollars into software dollars for the first time.
  • AI offers the ability to automate workflows that have always been “people-intensive” – prior auth, patient engagement, front-office RCM – and Menlo believes 80% of this market is still completely untapped.

The Takeaway

Healthcare’s AI moment is here, and most of its potential has hardly been touched.

AI Learns the Natural History of Human Disease

Clinical decision-making relies on understanding patients’ past health to improve their future health, an impossible task without first understanding how diseases progress over time.

That’s where a new study in Nature suggests AI is ready to help.

It starts with generative pretrained transformers. Researchers built a GPT, dubbed Delphi-2M, to predict the “progression and competing nature of human diseases.” 

  • Delphi-2M was trained on 400k UK Biobank participants (who skew healthier than the general population), and then externally validated on 1.9M Danish patients.
  • The training was designed to predict a patient’s next diagnosis and the time to it, using only data readily available within the EHR: past medical history, age, sex, BMI, and alcohol/smoking status.
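
The paper's exact tokenization isn't reproduced here, but the core idea is that a patient's history becomes an ordered, time-stamped event sequence that a transformer can consume like tokens in a sentence. A rough sketch under that assumption (the token names and structure are illustrative, not Delphi-2M's actual schema):

```python
from dataclasses import dataclass

@dataclass
class HealthEvent:
    age_days: int  # patient age at the event, acting as the "position" in the sequence
    token: str     # diagnosis code or lifestyle token, e.g. "E11", "SMOKER", "BMI_30"

# A patient becomes an ordered event sequence, analogous to tokens in a sentence.
patient = [
    HealthEvent(age_days=16_425, token="SEX_F"),
    HealthEvent(age_days=16_425, token="BMI_30"),
    HealthEvent(age_days=17_900, token="E11"),  # type 2 diabetes
]

def to_model_inputs(events: list[HealthEvent]) -> tuple[list[str], list[int]]:
    """Split the history into the two streams a Delphi-style model conditions on:
    what happened (tokens) and when (ages). Training then targets the next
    token AND the time gap until it occurs."""
    return [e.token for e in events], [e.age_days for e in events]
```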

How’d it do? The results speak for themselves:

  • Delphi-2M was able to forecast the incidence of over 1,000 diseases with comparable accuracy to existing models that are fine-tuned to predict single diseases.
  • Death could be predicted with eerily impressive accuracy (AUC: 0.97), and the survival curves that it simulated lined up almost perfectly with national mortality statistics.
  • Comorbidities emerged naturally from the training, and Delphi-2M was able to understand the progression from type 2 diabetes to eye disease to nerve damage.
  • Delphi-2M’s ability to predict heart attack and stroke matched established scores like QRisk, and it even outperformed leading biomarker-based AI models.

Better forecasts inform better policies. If policymakers can consult the Oracle of Delphi to see how many people will develop a disease over the next decade, the authors conclude that they’ll also be able to implement better regulations to prepare. 

  • Not a bad theory, assuming models trained on historical data can make forecasts that hold up to evolving treatments and populations (and that politicians act in the best interest of the people).

The Takeaway

AI is reaching the point where it can predict thousands of diseases as well as the best narrowly focused models, and that could have big implications for everything from early screening to policymaking.

Wolters Kluwer Jumps in the GenAI Ring With UpToDate Expert AI

Right when you think Wolters Kluwer might just let everyone else have all the AI fun, it debuted UpToDate Expert AI to give the world’s most widely used clinical decision support tool a much-needed AI overhaul.

Wolters Kluwer took its time with the launch. The incumbent CDS juggernaut is used by 3M doctors worldwide, so it had plenty of users to disappoint with a hasty rollout.

  • That said, nimble competition has been gaining ground about as fast as it takes to download OpenEvidence from the App Store.
  • The good news is that WK made the most of the extra development time.

Here’s what sets UpToDate Expert AI apart. Unlike general-purpose chatbots, the AI-enhanced version of UpToDate is built exclusively on WK’s peer-reviewed content library.

  • It draws on 30+ years of evidence-based research authored by 7,600 experts, rather than the open web or selective journals.
  • That allows it to quickly answer complex clinical questions, while surfacing all of its sources, assumptions, and step-by-step reasoning directly in the response. Probably safe to assume that also helps with hallucinations.
  • Those answers still manage to be easy to scan at the bedside and will look extremely familiar to any doctor who’s ever read an UpToDate article (or one who’s been reading them for a decade).

The extra time in the oven means that more features are baked in. Wolters Kluwer knows its audience, and UpToDate Expert AI’s biggest leg up on the competition is its fine-tuning for health systems.

  • Enterprise-grade governance, compliance, and workflow integration are all standard out-of-the-box, giving UpToDate Expert AI an advantage for a system-wide implementation over OpenEvidence or Doximity.

The Takeaway

It turns out that the 800-pound clinical support gorilla wasn’t going to let the newcomers eat its lunch forever, and UpToDate Expert AI gives health systems plenty of reasons to keep rolling with Wolters Kluwer.

Co-Creating Confidence: Inside Amigo’s Approach to Building Trustworthy AI Agents

AI moves fast, but trust moves slow. That’s why Digital Health Wire is launching a new series to spotlight the companies taking AI from promise to practice.

First up: Amigo.

No matter how many medical licensing exams and curated case vignettes the latest models conquer, they’ll still need to make it through the proving ground of real clinical practice to get doctors on board.

The biggest challenge for AI in healthcare isn’t building agents that can handle a task, it’s building agents that clinicians can trust to handle those tasks safely – every time, guaranteed.

There’s a massive gap between textbook performance and real-world reliability, and Amigo is giving providers the infrastructure to bridge that gap.

Earning trust takes more than technology. Amigo’s process is just as important as its platform for enabling healthcare orgs to safely design, test, and monitor agents that they can genuinely depend on for their unique clinical and administrative workflows.

Amigo’s approach to building trust stands on four core pillars:

  • Controllability – Clinical teams can define and adjust agent behavior.
  • Performance Validation – High-fidelity patient simulations stress-test readiness.
  • Real-time Observability – There’s full transparency into decision-making.
  • Continuous Alignment – Agents adapt to changing protocols and priorities.

“Good enough” isn’t enough in healthcare. Most industries can get away with using the 80/20 rule to fine-tune their products. If they can improve the experience for 80% of their users, it justifies any shortcomings for the other 20%. Traditional benchmarks might work for customer service, but not when that 20% includes life-or-death situations.

  • When AI developers chase benchmark scores but ignore outcomes, they miss the actual point of care delivery: making patients healthier. A perfect medical licensing exam is great, but it’s not the same thing as a perfect clinician – or a trustworthy AI agent.
  • Strong benchmark scores can also lure providers into a false sense of security, and it’s tough to notice when performance starts to drift if nobody is on the lookout.

Drift is inevitable, and the current is strong. Even if an AI agent works on day one, there will always be a tendency for performance to slip over time. Clinical guidelines change. New drugs enter the market. Populations evolve. 

Amigo safeguards against this drift with a three-layer framework:

  • The Problem Model – Customers define their specific needs and the “operable neighborhood,” which is basically the set of scenarios that the agent can help with.
  • The Judge – Customers establish their own success criteria, as well as the verification measures to keep track of them. That includes both safety metrics like accuracy and handoff reliability, plus experience metrics like empathy and response time.
  • The Agent – Amigo spins up an agent that can safely tackle the problem at hand, then continuously monitors it against the “success scorecard” to minimize drift and intervene well before it impacts patient care.
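
Amigo hasn't published its internals, but the Judge concept boils down to scoring live agent behavior against customer-defined thresholds and flagging drift before it reaches patients. A toy sketch (the metric names and thresholds are invented, not Amigo's):

```python
# Illustrative scorecard: customer-defined success criteria and their floors.
SCORECARD = {
    "accuracy": 0.95,            # safety metric: minimum fraction of correct answers
    "handoff_reliability": 0.99, # safety metric: escalations that actually reach a human
    "empathy": 0.80,             # experience metric, scored against a rubric
}

def judge(observed: dict[str, float], scorecard: dict[str, float]) -> list[str]:
    """Return the metrics where observed performance drifted below its floor."""
    return [metric for metric, floor in scorecard.items()
            if observed.get(metric, 0.0) < floor]

alerts = judge(
    {"accuracy": 0.97, "handoff_reliability": 0.96, "empathy": 0.85},
    SCORECARD,
)
# handoff_reliability fell below its floor, so the agent gets flagged for review
```

The real system would run checks like this continuously against live traffic; the point is that the success criteria belong to the customer, not the vendor.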

How can performance be guaranteed? Simulating success ahead of time. Amigo swaps generic benchmarks for millions of simulated patient conversations to make sure each of its agents is 100% operationally ready before it’s actually deployed.

  • The simulations reflect the real-world scenarios and demographics of each customer’s unique patient population. The goal is to stress-test the agents to their breaking point in a controlled environment, then refine them until they perform reliably under pressure.
  • Amigo intentionally oversamples rare scenarios – like patients with unusual drug interactions – to ensure edge cases don’t slip through. This not only helps keep the agents consistent at scale but also means that they frequently perform better in real practice.

It’s a proven blueprint. Amigo’s strategy for building trust in AI resembles the playbook used in another area with similarly high stakes, high variance, and high skepticism: self-driving cars.

  • Waymo defines the well-charted terrain where its autonomous vehicles (AVs) are designed to operate safely. Amigo maps specific clinical neighborhoods.
  • Waymo simulates edge cases that might take years to encounter in the field before its AVs see any actual street time. Amigo puts its agents to the same test.
  • Waymo’s initial rollout includes safety drivers that can take control when needed. Amigo works with clinicians to refine the accuracy of the Judge.
  • Waymo removes safety drivers as its AVs prove themselves on real trips. Amigo reduces human oversight once clinicians are confident the Judge is calibrated correctly.
  • Waymo moves to similar neighborhoods only after success is consistently demonstrated. Amigo can expand to adjacent use cases where its agents can inherit validated behaviors and guardrails.

Adoption follows confidence. When clinicians co-create the solution to their problems, they’re more comfortable putting it in front of patients. 

  • That confidence usually means leveraging Amigo to automate the workflows that have been weighing them down the most, such as around-the-clock support and care navigation.
  • The agents go beyond providing advice. They can perform actions like ordering tests, updating the EHR, and looping in care teams for complex workflows like triage and medication management.

AI still has a lot to prove. Medicine is complicated, edge cases are everywhere, and lawsuits ain’t cheap. Getting doctors to toss an agent the keys to complex workflows is a tall order, but that’s exactly why Amigo designed its entire platform around getting that buy-in with verifiable evidence every step of the way.

The Takeaway

Clinical AI has the potential to transform healthcare. Fine-tuned AI agents can help eliminate medical errors, keep patients engaged with their care, and allow providers to start carving out competitive moats through their own clinical differentiation.

Doctors aren’t going to arrive at that future by taking a leap of faith. Trust is gained slowly, and can shatter instantly. AI agents will have to earn credibility one workflow at a time, and could lose it all with a single misstep. 

That said, it’s a future worth striving for, and Amigo’s safety-first approach to building trustworthy AI agents is one of the best roadmaps we’ve seen for how to get there.

Nothing gets the magic across better than Amigo’s live walkthrough. Make sure to check out the agents in action by booking a demo on their website.

Innovaccer Acquires Story Health for Agentic Care Augmentation

Innovaccer kicked off a shopping spree instead of chasing an IPO, and virtual specialty care platform Story Health just became the latest startup to get crossed off the acquisition list.

Innovaccer’s been busy. It spent years building the technical infrastructure to make healthcare actually work, and it’s now acquiring the pieces to show what’s possible with that foundation.

  • That includes picking up Humbi AI (actuarial intelligence), Cured (healthcare marketing/CRM), and Pharmacy Quality Solutions (pharma-payor performance tech).
  • It also means equipping more healthcare orgs with its new solutions like Gravity (connects nearly every data input into a single source of truth to scale AI adoption) and Comet (an AI-powered access center with a name so good that Epic had to steal it).

Here come the agents. Story’s cardiovascular health platform is designed to shift care from episodic visits to continuous management that can move the needle on value-based outcomes. 

  • The platform combines AI-driven clinical pathways, advanced medication workflows, and human-led coaching to deliver industry-leading results across heart failure and other chronic conditions. 
  • Innovaccer will be using Story as its first scaffolding to “pioneer agentic care augmentation,” where EHR-integrated AI agents will help specialty care teams with non-clinical tasks and engage patients between visits. 

There’s more on the way. Innovaccer recently revealed that it has “two to three additional acquisitions planned in the coming months,” and that hospital administration and revenue cycle management are both major focus areas.

  • Although Hinge and Omada helped crack open the digital health IPO window, Innovaccer’s business is quickly evolving, and it still has the freedom to make longer-term plays in the private markets.
  • Answering to public shareholders wouldn’t exactly offer Innovaccer any more freedom, and it’s using its unrestricted range of motion to take advantage of private markets that “have never had the kind of depth they have today.”

The Takeaway

We love to see a good crossover story. Innovaccer didn’t just acquire Story to improve outcomes for its patients, it acquired it to scale those outcomes to patients everywhere – and we shouldn’t have to wait long to see another chapter that takes the same playbook to a new specialty.

Doctors Who Use AI Are Viewed Worse by Peers

The research headline of the week belongs to a study out of Johns Hopkins University that found “doctors who use AI are viewed negatively by their peers.”

Clickbait from afar, but far from clickbait. The investigation in npj Digital Medicine surfaced interesting takeaways after randomizing 276 practicing clinicians to evaluate one of three vignettes depicting a physician: using no GenAI (the control), using GenAI as a primary decision-making tool, or using GenAI as a verification tool.

  • Participants rated the clinical skill of the physician using GenAI as a primary decision-making tool as significantly lower than the physician who didn’t use it (3.79 vs. 5.93 control on a 7-point scale). 
  • Framing GenAI as a “second opinion” or verification tool improved the negative perception of clinical skill, but didn’t fully eliminate it (4.99 vs. 5.93 control). 
  • Ironically, while an overreliance on GenAI was viewed as a weakness, the clinicians also recognized AI as beneficial for enhancing medical decision-making. Riddle us that.

Patients seem to agree. A separate study in JAMA Network Open took a look at the patient perspective by randomizing 1.3k adults into four groups that were shown fake ads for family doctors, with one key difference: no mention of AI use (the control), or a reference to the doctors using AI for administrative, diagnostic, or therapeutic purposes (Supplement 1 has all the ads).  

For every AI use case, the doctors were perceived significantly worse on a 5-point scale:

  • less competent – control: 3.85; admin AI: 3.71; diagnostic AI: 3.66; therapeutic AI: 3.58
  • less trustworthy – control: 3.88; admin AI: 3.66; diagnostic AI: 3.62; therapeutic AI: 3.61
  • less empathic – control: 4.00; admin AI: 3.80; diagnostic AI: 3.82; therapeutic AI: 3.72

Where’s that leave us? Despite pressure on clinicians to be early AI adopters, using it clearly comes with skepticism from both peers and patients. In other words, AI adoption is getting throttled by not only technological barriers, but also some less-discussed social barriers.

The Takeaway

Medical AI moves at the speed of trust, and these studies highlight the social stigmas that still need to be overcome for patient care to improve as fast as the underlying tech.
