AI Spotlight on Epic, Abridge, and Oracle 

Epic, Abridge, and Oracle just gave us a year’s worth of blockbuster AI announcements in three days, and at least one of them was more than speculation and old news.

‘Twas the week before UGM, and the rumor mill has been overheating with reports that Epic might finally launch its own EHR-native scribe at its upcoming User Group Meeting.

  • Over 40% of U.S. hospitals are already on Epic, which means its scribe would have access to one of the biggest distribution channels in healthcare even if its UX and performance aren’t best-in-breed (which they won’t be).
  • That means about 100 ambient AI startups could be about to find out why scribing is a feature – not a product – and the race will be on to differentiate through other capabilities like RCM and specialty-specific tuning.

Abridge doesn’t plan on being commoditized. Less than 24 hours after Epic’s scribe leaked, Abridge unveiled the exact type of solution that’ll define who survives the incumbent squeeze: real-time prior authorization at the point of conversation.

  • Abridge is co-developing the new solution alongside Highmark Health, a Pittsburgh-based payvidor that operates both a multistate payor division and the 14-hospital system Allegheny Health Network.
  • Integrating Abridge’s ambient AI platform across Highmark’s entire organization will allow patients to get approval for necessary treatments before they even leave the office, a perfect example of how “scribes” can be truly transformative beyond just transcripts.

Oracle couldn’t let Epic and Abridge have all the fun. It decided to “usher in a new era of AI-driven health records”… by reintroducing us to the same AI EHR it unveiled last October.

  • Although mostly a PR stunt to grab headlines ahead of UGM, the new EHR includes several features that underscore where the AI puck is heading, including a native scribe, voice-first navigation, and agents to support clinical workflows.
  • These features are also a good list of use cases where startups might not have a lot of juice left to squeeze after EHRs start bringing them in-house (and prior auths just so happen to be the last thing Oracle wants to get its hands dirty with).

The Takeaway

Native scribing is (very likely) on its way to Epic, Abridge is giving patients the gift of time with instant prior auths, and Oracle is banking on voice for the future of EHR navigation. What a week for digital health.

Doximity Ramps Up AI With Pathway Acquisition

Doximity is setting out to prove that it’s more than “LinkedIn for doctors” after snapping up clinical reference AI startup Pathway for $63M. 

Clinical workflows are the new social media… or at least that’s the plot of Doximity’s growth story.

  • Act 1: Doximity’s newsfeed and networking features set the stage for pharma advertising by attracting physicians to the platform.
  • Act 2: Complementary workflow tools like scheduling, telehealth, and Doximity Dialer gave physicians a reason to stick around longer than their news sweep.
  • Act 3: The AI suite took engagement a step further with Doximity GPT and Doximity Scribe, which helped drive quarterly active users to a record 1M physicians in Q1.

Enter Pathway. The Montreal-based startup’s AI helps physicians answer questions at the bedside using information from Pathway Corpus, “one of the largest structured datasets in medicine” that spans nearly every guideline, journal, and landmark trial.

  • Pathway’s cross-linked structure reportedly allows it to understand complex drug interactions and score the strength of medical evidence, such as weighing validated clinical trials more than case studies (a toy version of that weighting is sketched below).
  • The acquisition will bring that same “robustness” to the back-end of Doximity GPT, and the integration is already live for thousands of physician beta testers.
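
Pathway hasn’t published how that scoring actually works, but evidence-tier weighting is a common pattern. Here’s a minimal Python sketch of the idea – the `EvidenceSource` structure, tier weights, and cross-link boost are all illustrative assumptions, not Pathway’s design.

```python
from dataclasses import dataclass

# Hypothetical evidence tiers - the weights Pathway actually uses are not public.
EVIDENCE_WEIGHTS = {
    "randomized_controlled_trial": 1.0,
    "meta_analysis": 0.9,
    "cohort_study": 0.6,
    "case_study": 0.2,
}

@dataclass
class EvidenceSource:
    claim: str        # e.g. "Drug A raises bleeding risk when combined with Drug B"
    study_type: str   # one of the keys in EVIDENCE_WEIGHTS
    cross_links: int  # how often the source is cross-linked in the corpus

def evidence_score(sources: list[EvidenceSource]) -> float:
    """Score a claim by study-type weight, lightly boosted by cross-linking."""
    return sum(
        EVIDENCE_WEIGHTS.get(s.study_type, 0.1) * (1 + 0.01 * s.cross_links)
        for s in sources
    )
```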

If you can’t beat ‘em, buy ‘em. It’s tough for physicians to see your pharma ads if they’re not using your platform, so Doximity is acquiring its own workflow solutions to keep users from venturing off to use competing products from OpenEvidence or Wolters Kluwer. 

  • Clinicians have also apparently been using Doximity GPT outside of office hours more than Doximity’s other tools, which helps with serving ads around the clock.
  • Doximity’s AI suite and workflow modules already account for over 20% of its ad revenue, and it now expects that share to overtake its newsfeed in the next few years.

The Takeaway

Doximity is looking to make AI the star of its next act, and if OpenEvidence doesn’t want to share its script, then Pathway will have to steal the show.

The Generalist-Specialist Paradox of Medical AI

Technological advances have ushered in an era where many AI models outperform specialists on specific tasks, but AI still lags far behind experts in less controlled settings.

That’s the Generalist-Specialist Paradox of Medical AI laid out in a recent NEJM AI editorial, which paints a picture of a world where AI might soon start redrawing the boundaries of medical specialties as they exist today.

  • AI is already delivering great results on well-defined tasks like interpreting EEGs or CT scans, but it still consistently struggles on generalist tasks with less clear boundaries.
  • If that trend continues, the article argues that tasks that used to be in the hands of specialists will be at the fingertips of primary care (just as tasks that used to belong to primary care will now belong to patients).

LLMs don’t care what specialty a case belongs to. They can ingest the full clinical context across visit notes, labs, and imaging to come up with the most probable diagnosis.

  • Breyer Capital Partner Dr. Morgan Cheatham recently made the case that this feature of AI could lead to the collapse of traditional medical specialties as we know them.
  • “Some domains will converge. Others will splinter into new subspecialties defined not by organ systems, but by data fluency, workflow design, or model supervision.”

Not so fast. There’s no doubt that AI will reshape roles, but that doesn’t mean that specialists are about to start offloading everything onto generalists.

  • High-quality care requires more than following AI-friendly guidelines, and specialists incorporate judgment earned through years of experience to deliver effective treatments. LLMs are also still a ways away from replacing anyone’s hip.
  • Primary care providers also aren’t exactly sitting around looking for extra work, and it’s far-fetched to think that they can start taking on specialty care for their ever-growing patient panels.

The Takeaway

AI might be great at the well-defined tasks common in specialty care, but we’re still a ways away from primary care physicians replacing cardiologists.

OpenAI Delivers Largest-Ever Study of Clinical AI

Hot on the heels of launching its HealthBench medical AI benchmark, OpenAI just delivered results from the largest-ever study of clinical AI in actual practice – and let’s just say the future’s looking bright.

40,000 visits, 106 clinicians, 15 clinics. OpenAI went big to get real-world data, equipping Kenya-based primary and urgent care provider Penda Health with AI Consult (GPT-4o) clinical decision support within its EHR.

  • The study split 106 Penda clinicians into two even groups (half with AI Consult, half without), then tracked outcomes over a three-month period.

When AI Consult detected a potential error in history, diagnosis, or treatment, it triggered a simple Traffic Light alert (the routing logic is sketched after this list).

  • Green – No concerns, no action needed
  • Yellow – Moderate concerns, optional clinician review 
  • Red – Safety-critical concerns, mandatory clinician review
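
The paper doesn’t include AI Consult’s implementation, but the severity-gated routing it describes is simple enough to sketch. Here’s a minimal Python version – the three severity levels come from the study, while the function and action strings are illustrative assumptions.

```python
from enum import Enum

class Severity(Enum):
    GREEN = "no concerns"
    YELLOW = "moderate concerns"
    RED = "safety-critical concerns"

def route_alert(severity: Severity) -> str:
    """Map an AI Consult-style severity flag to a clinician-facing action."""
    if severity is Severity.GREEN:
        return "No action needed"           # silent, no interruption
    if severity is Severity.YELLOW:
        return "Optional clinician review"  # non-blocking nudge
    return "Mandatory clinician review"     # blocks until acknowledged
```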

The results were definitely promising. Clinicians using AI Consult saw a:

  • 16% reduction in diagnostic errors
  • 13% reduction in treatment errors
  • 32% reduction in history-taking errors

The “training effect” is real. The AI Consult group got significantly better at avoiding common mistakes over time, triggering fewer alerts as the study progressed.

  • Part of that is because Penda took several steps to help along the way, including one-on-one training, peer champions, and performance feedback.
  • It’s also worth noting that there was no recorded harm as a result of AI Consult suggestions, and 100% of the clinicians using it said that it improved their quality of care.

What’s the catch? While AI Consult led to a clear reduction in clinical errors, there was no statistically significant difference in patient-reported outcomes, and clinicians using the copilot saw slightly longer visit times.

The Takeaway

Clinical AI continues to prove itself outside of multiple-choice licensing exams and clinical vignettes, and OpenAI just gave us our best evidence yet that general-purpose models can reduce errors in actual patient care.

Microsoft MAI-DxO and the Path to Medical Superintelligence

In an action-packed week to kick off the second half of the year, no story grabbed more headlines than Microsoft’s MAI-DxO proving four times more successful than human doctors at diagnosing complex diseases.

Microsoft is on the path to medical superintelligence… at least according to its excellent blog post outlining the new MAI Diagnostic Orchestrator, better known as MAI-DxO.

  • MAI-DxO acts like a “virtual panel of physicians” collaborating on a case, orchestrating multiple AI agents with specific roles like forming diagnostic hypotheses, selecting tests, and interpreting results. 
  • It then applies a “debate chain” to arrive at an explainable diagnosis, all while avoiding over-testing to keep costs under control (a simplified version of that loop follows below).
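
Microsoft hasn’t open-sourced MAI-DxO, but the “virtual panel” it describes maps onto a familiar multi-agent loop. Here’s a heavily simplified Python sketch – the role names are paraphrased from the blog post, and the `ask_llm` helper and moderator protocol are assumptions for illustration, not Microsoft’s actual code.

```python
# Hypothetical "virtual panel" orchestrator. ask_llm(role, prompt) is an
# assumed helper that queries an LLM with a role-specific system prompt
# and returns its text response.

ROLES = ["hypothesis_generator", "test_selector", "challenger", "cost_auditor"]

def diagnose(case_summary: str, ask_llm, max_rounds: int = 5) -> str:
    notes = [case_summary]
    for _ in range(max_rounds):
        transcript = "\n".join(notes)
        # Each panel role reviews the shared case notes and weighs in.
        for role in ROLES:
            notes.append(role + ": " + ask_llm(role, transcript))
        # A moderator runs the "debate chain": reconcile disagreements,
        # decide whether more testing is worth the cost, or commit.
        verdict = ask_llm("moderator", "\n".join(notes))
        if verdict.startswith("FINAL:"):
            return verdict.removeprefix("FINAL:").strip()
    return ask_llm("moderator", "\n".join(notes) + "\nCommit to a diagnosis.")
```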

New breakthroughs require new benchmarks. As AI gets to the point where it’s breezing through multiple-choice benchmarks like medical licensing exams, Microsoft decided to introduce SDBench to better simulate routine clinical practice.

  • SDBench deconstructs 304 of the most diagnostically complex NEJM cases, requiring LLMs (and physicians) to begin with an initial presentation, ask follow-up questions, order tests (each with assigned costs), and agree on a diagnosis (a stripped-down episode loop follows below).
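
A benchmark like that is essentially an interactive episode with cost accounting. Here’s a stripped-down Python sketch of one episode – the `agent`/`gatekeeper` interfaces and the cost table are assumptions about how such a harness might work, not SDBench’s published code.

```python
# Stripped-down sketch of one sequential-diagnosis episode. The agent and
# gatekeeper interfaces and the cost table are assumptions, not SDBench code.

TEST_COSTS = {"cbc": 50, "chest_ct": 1200, "biopsy": 3000}  # illustrative prices

def run_episode(agent, gatekeeper, initial_presentation: str):
    history, spent = [initial_presentation], 0
    while True:
        # The agent chooses to ask a question, order a test, or commit.
        kind, payload = agent.act(history)
        if kind == "diagnose":
            return payload, spent  # final diagnosis and total dollars spent
        if kind == "test":
            spent += TEST_COSTS.get(payload, 500)  # default cost for other tests
        history.append(gatekeeper.reveal(kind, payload))
```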

Here’s how MAI-DxO stacked up:

  • MAI-DxO: 85% diagnostic accuracy / $7,200 estimated cost per patient
  • OpenAI o3: 79% / $7,850
  • Gemini 2.5 Pro: 69% / $4,800
  • Claude 4 Opus: 68% / $7,000
  • Llama 4: 55% / $4,000
  • Human Physicians: 20% / $2,950

What’s the catch? The human physicians weren’t allowed to use the internet or any outside help, which probably simulates a deserted island workflow more than routine clinical practice. The participating physicians also all happened to be generalists rather than specialists, giving another edge to the LLMs.

The Takeaway

MAI-DxO might have the potential to deliver superhuman diagnostics in constrained settings, but that doesn’t mean it’s ready to replace doctors. As Microsoft pointed out in its own blog post, “clinical roles are much broader than simply making a diagnosis. They need to navigate ambiguity and build trust with patients and their families in a way that AI isn’t set up to do.”

Doximity Accused of Prompt Hacking OpenEvidence

What does a high-flying company like Doximity do when competitors are nipping at its heels? According to OpenEvidence’s new lawsuit, it just politely asks their LLMs to reveal trade secrets. 

Doximity is basically LinkedIn for doctors. It allows physicians to use its networking platform and AI workflow products at no cost, which means the physicians themselves are the product.

  • Doximity generates revenue almost exclusively through pharma advertising, and it turns out that might actually be the best business model around.
  • Out of the dozen publicly traded digital health companies with a market cap over $1B, Doximity is the only one that’s decently profitable.

No good prompt goes unpunished. The crown jewel of Doximity’s AI portfolio is its Doximity GPT workflow assistant, which may or may not leverage proprietary tech acquired by prompting OpenEvidence’s competing model to reveal sensitive information.

  • Although it’s funny to see Doximity get accused of asking OpenEvidence’s AI to literally “write down the secret code,” it doesn’t exactly make for a bulletproof case when the model willingly dishes up an answer.
  • The catch is that OpenEvidence requires users to register using their National Provider ID numbers, and Doximity allegedly impersonated a practicing neurologist to “obtain through theft what they lacked in technical expertise.” Ouch.

It gets worse from there. A separate shareholder lawsuit accused Doximity of inflating its active user base and website engagement data to artificially bolster its advertising revenue.

  • While some investors might be able to stomach a little corporate espionage, they probably won’t look the other way if it turns out Doximity is fudging the numbers.
  • Innocent until proven guilty, but it’s worth noting that nearly identical allegations popped up in a recent short report.

The Takeaway

Doximity has some serious allegations piling up against it, but so far the market has shrugged off the bad news. That could be a sign that investors don’t think the lawsuits will hold up in court, or maybe they just don’t mind when a management team is willing to bend the law to generate some extra shareholder value.

OpenEvidence Partners With JAMA Ahead of Next Raise

“The fastest-growing platform for doctors in history” continues to step on the gas, and OpenEvidence is reportedly on the verge of notching a $3B valuation after inking a deal to bring JAMA Network journals to its AI medical search engine.

The multi-year content agreement will make full-text articles from the American Medical Association’s JAMA, JAMA Network Open, and 11 specialty journals available directly within the OpenEvidence platform.

  • OpenEvidence’s medical search engine helps clinicians make decisions at the point of care, turning natural language queries into structured answers with detailed citations (the general retrieve-then-cite pattern is sketched after this list).
  • The model was purpose-built for healthcare using training data from strategic partners like the New England Journal of Medicine, which joined the platform through a similar deal earlier this year.
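
Under the hood, that query-to-cited-answer flow is a classic retrieve-then-generate pattern. Here’s a minimal Python sketch – `search_corpus` and `ask_llm` are assumed helpers standing in for OpenEvidence’s unpublished internals.

```python
# Minimal retrieve-then-cite sketch. search_corpus and ask_llm are assumed
# helpers, not OpenEvidence's actual API.

def answer_with_citations(query: str, search_corpus, ask_llm) -> dict:
    passages = search_corpus(query, top_k=5)  # e.g. licensed journal articles
    context = "\n\n".join(f"[{i + 1}] {p['text']}" for i, p in enumerate(passages))
    answer = ask_llm(
        "Answer the clinical question using ONLY the sources below, "
        "citing them inline as [n].\n\nSources:\n" + context
        + "\n\nQuestion: " + query
    )
    return {"answer": answer, "citations": [p["source"] for p in passages]}
```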

The Disney+ content strategy has arrived in healthcare. OpenEvidence compares its approach to streaming services that drive subscriptions through exclusive movies.

  • If a physician wants information from top journals to support decision making, they’ll either have to get it straight from the source or use OpenEvidence, just like how anyone who wants to stream Moana needs to go to Disney+.
  • The kicker is that OpenEvidence is available at no cost to verified physicians, and advertising generates all of the revenue. 

The blueprint is working like a charm. OpenEvidence has over 350k doctors using its platform plus another 50k joining each month, and it’s apparently close to raising $100M at a $3B valuation just a few months after closing its $75M Series A.

  • It’s rare to find hockey stick growth in digital health, and OpenEvidence is a good reminder that many areas of healthcare change slowly… then all at once.
  • It also isn’t too surprising to hear that VCs like Google Ventures and Kleiner Perkins are lining up to fund a company with a similar ad-supported business model to Doximity – one of the only successful healthcare IPOs since the start of the pandemic.

The Takeaway

Content is king, and OpenEvidence is locking in partnerships to make sure its platform is wearing the crown. The results have been speaking for themselves, but healthcare’s genAI streaming wars are just getting started.

Epic Announces Launchpad to Fast-Track GenAI Deployment

Epic is looking to accelerate generative AI adoption with the surprise unveiling of Launchpad, a new program designed to help provider orgs “move from idea to operational gains in a matter of days.” 

Launchpad’s grand unveiling included little more than a LinkedIn post, but Epic AI Director Sean McGunigal told Fierce Healthcare that the program includes guided AI implementations and a fast track to live workflows.

Here’s the general outline of how Launchpad works.

  • When an organization joins the program, Epic staff shepherds them through any roadblocks they’re facing with active genAI implementations.
  • Epic’s experts will assist with getting genAI use cases configured, turned on, and operationalized – all while establishing appropriate governance structures.
  • Launchpad also includes a starter kit of 10 high-impact genAI applications that can be deployed within days, covering both clinical and operational workflows.

Epic views low AI literacy as one of the biggest barriers to industry-wide adoption, and McGunigal made it clear that Epic will go to great lengths to overcome that hurdle. 

  • “We’ll help coordinate. We’ll get the right stakeholders on the phone and we’ll help this thing along. The [genAI] roadblocks tend not to be necessarily performance challenges or end user training. It tends to be the project management-type work.”

MyChart In-Basket Augmented Response Technology – better known as ART – became Epic’s first genAI feature when it launched in April 2023.

  • Since then, over half of Epic’s customers have started using at least one of its genAI features, and it’s making some big investments to pump that number up.
  • Epic has over 100 new genAI use cases in the works, spanning everything from lab and imaging recommendations to fully automated patient-agent interactions, and Launchpad is going to ensure that providers have the support they need to adopt them.

The Takeaway

As Epic gears up for a tidal wave of its own AI rollouts, healthcare orgs are going to need customized support, implementation kits, and governance guidance to take full advantage. Epic just showed us that it’s willing to build its own Launchpad to make that happen.

OpenAI Dives Into Healthcare With HealthBench

OpenAI is officially setting its sights on healthcare with the launch of HealthBench, a new benchmark for evaluating AI performance in realistic medical scenarios.

HealthBench marks the first time the ChatGPT developer has taken a direct step into the industry without a partner to hold its hand.

  • Developed with 262 physicians from 60 countries, HealthBench includes 5,000 simulated health conversations, each with a custom rubric to grade the responses (a bare-bones version of that grading scheme follows this list).
  • The conversations “were created to be realistic and similar to real-world use of LLMs,” meaning they’re multi-turn and multilingual, while spanning a range of medical specialties and themes like handling uncertainty or global health.
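
Rubric-based grading is the interesting design choice here: each response is scored against physician-written criteria rather than a single gold answer. Here’s a bare-bones Python sketch – the criterion structure mirrors the paper’s description (positive and negative point values), while `meets_criterion` stands in for the LLM grader and is an assumed helper, not OpenAI’s published harness.

```python
# Bare-bones rubric grading. Each criterion carries points (negative for
# undesirable behavior); meets_criterion is an assumed LLM-judge helper
# that checks whether a response satisfies a criterion.

example_rubric = [
    {"text": "Recommends emergency care for acute chest pain",  "points": 5},
    {"text": "Asks about symptom duration",                     "points": 3},
    {"text": "Gives a confident diagnosis without enough info", "points": -4},
]

def grade_response(response: str, rubric: list[dict], meets_criterion) -> float:
    earned = sum(c["points"] for c in rubric if meets_criterion(response, c["text"]))
    max_possible = sum(c["points"] for c in rubric if c["points"] > 0)
    if max_possible == 0:
        return 0.0
    return max(0.0, earned / max_possible)  # clipped share of available points
```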

Here’s how current frontier models stacked up in the HealthBench test.

  • OpenAI’s o3 was the best performing model with a score of 60%
  • xAI’s Grok 3 ranked second with a score of 54%
  • Google’s Gemini 2.5 Pro followed close behind at 52%

All three leading models outperformed physicians who weren’t equipped with AI, although physicians outperformed the newer models when they had access to the AI output.

  • The paper also reviewed other LLMs like Llama and Claude, but unsurprisingly none of them scored higher than OpenAI’s model on OpenAI’s own test.

Even the best models came up short in a few common places – the same areas developers should focus on to improve performance.

  • Current AI models would rather hallucinate than withhold an answer they aren’t confident in, which is obviously not a good trait to bring into a clinical setting.
  • None of the leading LLMs were great at asking for additional context or more information when the input was vague.
  • When AI misses, it misses badly, as seen in the sharp quality drop-off among the worst 10% of responses.

The Takeaway

Outside of giving us yet another datapoint that AI is catching up to human physicians, HealthBench provides one of the best standardized ways to compare model performance in (simulated) clinical practice, and that’s just what the innovation doctor ordered.

More Reasoning, More Hallucinations for LLMs

Better reasoning apparently doesn’t prevent LLMs from spewing out false facts.  

Independent testing from AI firm Vectara showed that the latest advanced reasoning models from OpenAI and DeepSeek hallucinate even more than previous models.

  • OpenAI’s o3 reasoning model scored a 6.8% hallucination rate on Vectara’s test, which asks the AI to summarize various news articles.
  • DeepSeek’s R1 fared even worse with a 14.3% hallucination rate, an especially poor performance considering that its older non-reasoning DeepSeek-V2.5 model clocked in at 2.4%.
  • On OpenAI’s more difficult SimpleQA tests, o3 and o4-mini hallucinated between 51% and 79% of the time, versus just 37% for its non-reasoning GPT-4.5 model.

OpenAI positions o3 as its most powerful model because it’s a “reasoning” model that takes more time to “think” and work out its answers step-by-step.

  • This process produces better answers for many use cases, but these reasoning models can also hallucinate at each step of their “thinking,” giving them even more chances for incorrect responses (a toy compounding model follows below).
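
A back-of-envelope model shows why: if each reasoning step independently slips with probability p, the chance of at least one slip across k steps is 1 − (1 − p)^k, which climbs fast. The Python snippet below uses illustrative numbers only – these are not measured rates for o3 or R1.

```python
# Toy compounding model: probability of at least one erroneous step across
# k independent reasoning steps, each erring with probability p.
def compound_error(p: float, k: int) -> float:
    return 1 - (1 - p) ** k

print(round(compound_error(0.02, 1), 2))   # 0.02 -> a single-shot answer
print(round(compound_error(0.02, 20), 2))  # 0.33 -> twenty "thinking" steps
```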

The Takeaway

Even though the general purpose models studied weren’t fine-tuned for healthcare, the results raise concerns about their safety in clinical settings – especially given how many physicians report using them in day-to-day practice.
