Abridge Unveils New Platform, Teams Up With Lilly and Nvidia

Patients, platforms, Lilly, and Nvidia. Abridge’s first keynote had it all.

There were enough major announcements to fill an entire issue of DHW, so here’s the abridged version of the top stories to come out of NYC.

The new platform stole the show. Abridge unveiled “the first AI-native clinician intelligence platform” organized around patients, built for clinicians, and designed to help health systems.

  • Before the visit: The platform surfaces care gaps and relevant clinical context so clinicians can address what matters during the visit instead of discovering it in retrospective chart reviews.
  • During the visit: Abridge suggests discussion topics while delivering evidence-based answers to clinical questions from a growing content library that includes new specialty-focused partners like AAFP, AAN, ADA, and ASCO.
  • After the visit: Abridge generates documentation, flowsheets, patient summaries, orders, and billing codes (soon to be fine-tuned through a new partnership with AHIMA).

“The base unit of healthcare is a clinician caring for a patient.” As Abridge pushes into new models of care delivery, its platform will provide the connective tissue between the clinical workflows where care actually gets delivered and outside orgs like payers or life sciences firms.

  • The keynote highlighted some key examples: Cigna was on stage discussing how embedding AI in clinical workflows has the potential to unlock real-time claims adjudication, and Aetna shared how it could help realize the promise of VBC.
  • More than 300 health systems are already live, including a just-announced rollout at Northwestern Medicine.

Eli Lilly is buying into the vision. The pharma giant made a strategic investment in Abridge’s next chapter, and even though the keynote was light on details, the move started to add up after seeing one of the new capabilities coming to the platform: clinical trial screening.

  • By comparing clinical guidance with patient-provider conversations in real time, Abridge can surface relevant trials directly in the encounter – the moment it matters most.
  • They didn’t mention a check size, but big opportunities attract big investments, and identifying candidates while initiating screening at the point of care sounds huge.

Last, but certainly not least, Nvidia. Abridge is teaming up with Nvidia to develop a first-of-its-kind foundation model for clinical conversations that’s trained, shaped, and evaluated against real-world conditions.

  • We’ll have to wait until later this year to see it in action, but a little pre-, mid-, and post-training magic with Abridge’s de-identified clinical data will apparently help make it the first model that can “reason clinically from its foundation.”

The Takeaway

If the keynote made one thing crystal clear, it’s that Abridge’s platform doesn’t revolve around AI documentation. It revolves around patients, and every new feature is purpose-built to prove it.

Patients Want AI, So Long As There’s No Copay 

New research in npj Digital Medicine suggests that patients might be warming up to medical AI, at least if it’s less expensive than seeing an actual doctor.  

Here’s the setup. Johns Hopkins researchers recruited 248 U.S. adults with type 1 diabetes, then presented them with scenarios where they were due for an annual diabetic eye screening.

  • Diabetic retinopathy is the leading cause of blindness among working-age adults, and autonomous AI tools that can diagnose the disease from retinal images are already cleared by the FDA and in clinical use.
  • In each scenario, one of these autonomous AI tools was made available as an alternative to a specialist referral.

The catch was the copay. Participants were randomized to have the AI offered with either a $50 copay, or with the copay waived by their insurer or the AI developer.

Fifty bucks is fifty bucks. More than 80% of participants opted for the AI tool when the copay was waived, compared to 43% who chose AI when the copay wasn’t waived.

  • Not only did more participants opt for the AI screening when there was no copay, but participants also perceived the AI as more effective.
  • It didn’t matter whether the copay was waived by the AI developer or their insurer.

There was one major caveat. Patients who chose AI over a traditional screening with a human specialist were far more likely to seek reconfirmation from their doctor after getting the results.

  • The AI group was nearly 3x more likely to seek reconfirmation after abnormal results, and still nearly 50% more likely to ask for a second look after getting normal results.

The trust isn’t there yet. AI might be able to give patients results, but they still want to hear from a medical professional to verify those results.

  • The authors point out that human oversight is still clearly a top priority for patients, and that “it’s crucial to address the persistent preferences for provider follow-up and verification, even when AI results are normal.”

The Takeaway

Financial incentives remain undefeated, but this study confirmed that you can’t put a price on trust with AI in medicine.

Ad-verse Effects in Consumer-Facing AI

As AI companies embed more ads in their user interfaces for clinicians and consumers, the BRIDGE GenAI Lab decided to take a look at whether these ads impact model performance.

Turns out, they do. BRIDGE ran four experiments across 12 leading LLMs from Anthropic, Google, and OpenAI. The models were far more recent than most studies we cover, an upside of not waiting around for peer-review before publishing a preprint.

  • Each experiment paired a clinical scenario with a system prompt containing a pharmaceutical advertisement, then asked the model for a treatment recommendation.

Ads definitely moved the needle. Across 74,880 calls and 13 scenarios, advertising shifted the models’ choices toward the advertised drug from a baseline of 34% to 48%.

  • That’s a jump of +12.7 percentage points on average.
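
For anyone who wants to see what a "percentage point" shift looks like in practice, here's a minimal sketch of how one could be computed from raw model calls. The counts are invented for illustration and are not BRIDGE's data, and the function name is hypothetical.

```python
def pct_point_shift(baseline_hits, baseline_n, ad_hits, ad_n):
    """Return (baseline rate %, ad rate %, shift in pp, relative change %)."""
    baseline_rate = 100 * baseline_hits / baseline_n
    ad_rate = 100 * ad_hits / ad_n
    shift_pp = ad_rate - baseline_rate          # absolute difference
    relative = 100 * shift_pp / baseline_rate   # change relative to baseline
    return baseline_rate, ad_rate, shift_pp, relative

# Made-up counts: how often the advertised drug was recommended
# without the ad (baseline) and with the ad in the system prompt.
base, ad, shift_pp, relative = pct_point_shift(300, 1000, 425, 1000)
print(f"baseline {base:.1f}% -> with ad {ad:.1f}%")
print(f"shift: {shift_pp:+.1f} pp (a {relative:.0f}% relative increase)")
```

The point of separating the two numbers: a shift measured in percentage points is an absolute gap, which reads much smaller than the same change expressed relative to the baseline.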

The LLMs had some nice range. Model bias varied widely by developer.

  • Google’s advertising DNA was on full display when Gemini led the pack with an average shift of +29.8 percentage points toward the advertised drug. 
  • Five models from OpenAI were swayed by an average of +10.9 pp.
  • Anthropic’s models were the most resilient at +2.0 pp, and the ever-skeptical Opus 4.6 actually steered away from the promoted drug by -3.8 pp.

Three experiments contrasted three different conditions. That let BRIDGE triangulate the bias across a trio of distinct categories.

  • Equipoise (+12.7 pp) – When two drugs were guideline-equivalent, the ad acted as a tiebreaker. The output was clinically correct, but biased.
  • Suboptimal Drug (+0.6 pp) – When the advertised drug was clinically inferior, models resisted. Only 4.4% of responses chose the suboptimal advertised option.
  • Wellness Supplements (-0.6 pp) – For supplements lacking evidence, endorsement decreased. Anthropic models actively pushed back at -2.4 pp.

The picture was consistent. Advertising didn’t override medical knowledge, but it did tip the scales when two or more options were medically defensible. 

  • Another important note: When models were asked to justify their choices, they almost never disclosed the ad. If they chose the advertised drug, the justification echoed the ad in 52.7% of cases.

The Takeaway

BRIDGE just showed why the real harm with AI advertising might not be patients receiving dangerous drugs. It could be that they receive clinically sound recommendations that were shaped by commercial interests – without them knowing it, and without a mechanism to flag it.

OpenAI o1 Outperforms Physicians on Clinical Reasoning Tasks

A landmark study in Science found that OpenAI’s o1 series outperformed human physicians at multiple clinical reasoning tasks, but that doesn’t mean it’s time to hang up the scrubs just yet.

Researchers at Harvard and Beth Israel Deaconess Medical Center designed the study to evaluate whether LLMs are ready to do what physicians do on a daily basis: review messy patient charts and use that data to determine diagnosis and next steps.

  • They evaluated o1 on clinical cases ranging from patient vignettes to second opinions on 76 real-world ED assessments, which included all the noise and incomplete information that clinicians routinely encounter in the EHR.
  • The refreshingly well-designed study also incorporated a blinded evaluation with two attending physicians at BIDMC and GPT-4.

o1 came to play. On clinical vignettes evaluating management reasoning, o1-preview scored a median of 86%. Not too shabby.

  • It outperformed GPT-4, humans with GPT-4, and humans with conventional resources like UpToDate – all of which scored below 45%.

The ED cases were even more impressive. o1 offered second opinions about the diagnosis at three points along the patient’s ED journey:

  • At triage, o1 gave an exact or very close diagnosis in 67% of cases (when information in the record dump was most limited). The two physicians hit 55% and 50%. 
  • o1 still outperformed the physicians when given all the data collected by the end of the ED encounter.
  • It was only when the physicians were given the most information possible to inform their diagnosis – at the time the patient would have been admitted to the hospital – that the scores finally converged.

The cherry on top? Physician raters couldn’t tell whether the differentials came from o1 or a human. One rater couldn’t tell in 83.6% of cases, the other in 94.4%. 

  • The authors were quick to mention that these results don’t mean AI is ready to replace human physicians. They mean it’s time for rigorous research into how AI can augment care teams, serve as a second opinion, and become a safety layer for clinicians.

The Takeaway

o1 outperforming a couple of internists at triage isn’t quite Deep Blue beating Garry Kasparov at chess, but it’s a step in that direction – especially considering OpenAI’s performance jump in just the last week (let alone since o1 launched in 2024).

Why AI Vendors Struggle to Compete With EHRs

Anyone who has ever tried selling AI into health systems will tell you that it’s tough to compete with EHRs, but a new article in JAMA makes the case that it’s actually gotten too tough – and it might be time for regulators to step in.

Most markets reward the best products. The healthcare industry has a funny way of preventing that from happening, and EHR vendor dominance is a textbook example.

  • EHRs hold advantages across infrastructure, workflow integration, procurement, and pricing that make it difficult for third-party tools to gain a foothold.
  • A 2025 Health Affairs study backed that up by showing that 79% of U.S. hospitals use AI models from their EHR vendor, compared to just 59% that use AI from third-party developers.
  • A Bain report drove the point home. Two-thirds of Epic customers said they’d pick a “good enough” Epic option over a better competing product.

These EHR advantages are a natural feature of the market. That said, it’s up to regulators to decide whether the status quo is serving patients and the overall healthcare system. The JAMA authors argue that it doesn’t, and offer three areas where targeted policy could level the playing field.

Infrastructure – Integrating AI tools into clinical workflows requires real-time data access and the ability to survive EHR upgrades intact, both of which are dramatically easier for EHR vendors – particularly as data fields get added or removed.

  • Potential Policy – Mandate broader API adoption so third parties can access EHR data on equal footing, and use existing EHR certification and interoperability frameworks to do it.

Workflow and Usability – The authors specifically flag EHR vendors’ edge in understanding the trade-offs of allocating limited screen real estate to new AI tools, something that’s harder for third parties to gauge from the outside looking in.

  • Potential Policy – Require EHR vendors to offer more robust developer sandboxes – similar to Apple’s iOS developer environment – so third parties can build and test without operating at a structural disadvantage.

Procurement and Pricing – Long-standing health system relationships give EHR vendors a streamlined path through procurement, as well as the leverage to “use pricing structures that incentivize adoption.”

  • Potential Policy – Although this is the hardest area for a policy fix, the authors suggest that improving transparency around AI performance could at least help health systems make more informed decisions regardless of where a tool comes from.

The Takeaway

EHRs are in a powerful position, and companies in powerful positions have a long track record of making life harder for their competition. Healthcare is too important an industry not to have the best products rise to the top, and this article offers some sound strategies to make sure that stays possible.

Qualified Raises $125M to Build AI Infrastructure

In an era of isolated AI pilots, Qualified Health is building the infrastructure to connect the dots.

AI is the star of enterprise transformation. Health systems are looking to deploy and scale AI across their entire organization, and Qualified just raised $125M of Series B funding to make sure every new agent fits into a cohesive constellation.

The core platform has four distinct layers:

  • A data foundation that turns the EHR and external sources into an AI-ready bedrock.
  • A layer that lets hospitals build and deploy AI tools without always starting from scratch.
  • A layer that turns those tools into AI apps and agents deployed directly into workflows.
  • A layer that keeps governance, monitoring, and evaluation at the center of everything.

Qualified doesn’t leave AI to chance. It embeds forward-deployed product leaders alongside health system teams to identify high-priority needs, deploy solutions quickly, and iterate based on actual feedback in the trenches.

That has a couple of major benefits:

  • AI solutions are purpose-built for specific operational problems rather than mass market appeal.  
  • The tight feedback loop allows Qualified to iterate faster than it would be able to with a traditional implementation cycle, which shortens the timescale needed to improve its deployments and demonstrate a measurable impact.

The proof is in the pudding. At the University of Texas Medical Branch, Qualified reportedly generated a $15M measurable run-rate impact within the first six months.

  • That’s an eye-popping number to get on record, and it apparently stemmed from “a real willingness to dive deep” alongside UTMB clinical teams to deploy multiple assistants and automated workflows.
  • Qualified already supports systems representing about 7% of U.S. hospital revenue, and the next chapter is about deepening those partnerships and scaling responsibly.
  • Big ambition also means big competition, and Qualified will be up against everyone from Innovaccer to Epic if it wants to become healthcare’s AI platform of choice.

The Takeaway

Hospitals aren’t looking to AI for incremental improvement. They’re looking to AI to transform how they deliver care, and Qualified just landed another $125M to be the infrastructure that makes that possible.

How to Build Patient Trust in Medical AI

AI might move at the speed of trust, but new research in JAMA Network Open shows that trust only moves at the speed of accuracy.

The study had a solid setup. To determine the factors currently driving patient trust in AI, researchers presented 3,000 U.S. adults with a pair of hypothetical AI-assisted visits for a moderate-risk rash. 

  • Each visit had six randomized attributes, such as whether or not a doctor was present, how well the AI performs relative to human clinicians, and various AI governance mechanisms.

AI performance came out on top by a wide margin. Respondents cared more about how well the AI performs than FDA approval, governance, and even having a doctor in the room.

  • The biggest difference came from AI performing better than a specialist, which increased the likelihood of choosing that visit by 32.5%.
  • AI performing at the same level as a specialist boosted visit preference by 24.8%, slightly more than having AI that performs as well as a general practitioner (19.1%).
  • Having an actual doctor present surprisingly only swayed visit preference by 18.4%.

Governance factors also moved the needle. They just didn’t move it much.

  • FDA approval for the AI increased visit preference by a modest 11.1%.
  • Mayo Clinic AI certifications apparently carry just as much weight – also coming in at 11.1%.
  • Local hospital certifications for the AI only gave visits a 7.8% lift.

AI data quality was important. It just wasn’t as convincing as AI performance. 

  • AI that had nationally representative training data boosted visit preference by 11.9%, but it was interesting to see that disclosing bias in the training data had no effect versus not providing any data details.
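
To see the full pecking order at a glance, here's a quick sketch that ranks the reported lifts in visit preference. The numbers come straight from the figures above; the attribute labels are our own shorthand, not the study's wording.

```python
# Reported lift in visit preference (%) for each attribute level.
effects = {
    "AI better than specialist": 32.5,
    "AI equal to specialist": 24.8,
    "AI equal to general practitioner": 19.1,
    "Doctor present": 18.4,
    "Nationally representative training data": 11.9,
    "FDA approval": 11.1,
    "Mayo Clinic certification": 11.1,
    "Local hospital certification": 7.8,
}

# Sort from largest lift to smallest.
ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)

for label, lift in ranked:
    print(f"{lift:>5.1f}  {label}")
```

Sorted this way, all three AI performance levels land above having a doctor in the room, and every governance signal sits below it.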

The written explanations told the same story. Respondents cited AI performance and clinician involvement as the primary reasons for their choices, with many of them expressing comfort with AI as a tool – but not as a standalone decision-maker.

The Takeaway

Widespread AI adoption requires patient trust, and this study did a great job highlighting the specific areas that should be prioritized for building it.

Microsoft Dragon Copilot Gets AI Upgrades

Microsoft might have had the biggest presence at the biggest health IT conference, and it made sure all the lights in Las Vegas were on Dragon Copilot.

Unify. Simplify. Scale. Microsoft’s theme at HIMSS was all about making Dragon Copilot a one-stop-shop for information within clinical workflows. It debuted several new capabilities at the show:

  • Integrated medical content from trusted sources
  • Partner-powered AI apps and agents
  • Proactive ICD‑10 specificity suggestions
  • Expanded role-based experiences for physicians, nurses, and radiologists

Partnering is quicker than building. Rather than developing every Dragon Copilot capability in-house, Microsoft has been leaning on outside partners to round out the platform.

  • Dragon Copilot’s clinical evidence feature is a prime example. It brings medical content and other relevant contextual information in-workflow, all curated through new partnerships with Wolters Kluwer, Elsevier, and other vetted sources.

Microsoft Marketplace fills the gaps. It allows users to add AI partner apps directly into their Dragon Copilot workflows. Picture a modular side panel with insights from folks like: 

  • Regard – surfaces comorbidities and relevant diagnoses 
  • Canary Speech – analyzes voice biomarkers for mental health conditions
  • Humata Health – automates prior authorization processes for clinicians 
  • Atropos – generates personalized real-world evidence 
  • Optum – identifies potential coverage issues and supports claims processing 

All roads lead to scribes. When Microsoft acquired Nuance for roughly $20B back in 2022, it was its second largest acquisition ever behind LinkedIn, and the core offerings were radiology report automation, dictation, and transcription (with humans still pulling a ton of weight).

  • The product formerly known as Dragon Ambient eXperience is now the backbone of Dragon Copilot, and it’s been adding features at a breakneck pace.
  • Microsoft is looking to make Dragon Copilot everything, everywhere, all at once, and so far new partnerships have been the key to making that happen.

The Takeaway

As every digital health company rushes to add scribing to their platform, the OG scribe is rushing to add everything else. Now it just needs to maintain a unified user eXperience.

Anterior Closes $40M to Take AI to the Largest Plans in the Country

The AI race between payors and providers is healthcare’s Kentucky Derby, and Anterior just closed $40M to help turn the dark horses into the frontrunners.

Anterior uses AI to ease the back-office burden on health plans. It started with a laser focus on prior authorizations, translating huge amounts of unstructured data into the information that’s actually needed to make quicker decisions.

  • When Anterior helps payors deploy AI in their clinical and operational workflows, it doesn’t just dump a bunch of models on them and disappear into the sunset.
  • It embeds its own clinicians and engineers alongside the platform to support its partners, optimize accuracy, and drive a measurable impact.

Trust is a differentiator. Payors are a cautious crowd, and they aren’t exactly known for trusting new friends with their critical workflows. 

  • Anterior’s clinicians are its secret sauce. They make up about 40% of the company, and many of them have even started contributing directly to the platform’s code base.
  • This hands-on support is why partners build trust, and that hard-earned resource is what allowed Anterior to take the same tech underpinning its prior auth tools and expand it to other workflows.

New partners lead to new proof points. New proof points lead to new use cases. 

  • Anterior’s early successes – from both its people and technology – have allowed it to quickly land and expand into areas like payment integrity and risk adjustment. 
  • Since closing its $20M Series A in June 2024, Anterior has deployed its AI across major payors like Geisinger Health Plan, and worked alongside enterprise technology partners like HealthEdge to build out key strategic integrations.
  • The platform now supports orgs representing over 50M covered lives, and the fresh funds will help it use those case studies to pry open the door to the biggest national plans in the business.  

The Takeaway

Anterior’s earliest partners had to gamble on an unproven platform without any real-world evidence to back it up. Now, the proof is in the success stories, and Anterior just landed another $40M to go after the largest and most risk-averse payors in the country.

LLMs Still Struggle With Medical Misinformation 

The Lancet Digital Health just published one of the largest-ever stress tests on medical misinformation in LLMs, and it looks like most models still struggle to separate fact from fiction.

Here’s the setup. Researchers probed 20 LLMs with over 3M prompts containing medical information from three different sources: social media posts, simulated clinical vignettes, or real hospital discharge notes with a single fabricated recommendation inserted.

  • Each prompt was presented in multiple versions, once with neutral wording to establish a baseline, then with a series of variations that were emotionally charged or leading.
  • Ten logical fallacies were also used to test how framing influences model behavior, such as appeals to authority (a physician said…) or popularity (everyone agrees that…).

LLMs love fake news. The susceptibility was shockingly high across all models, with the medical misinformation accepted in 32% of the neutral base prompts.

  • That jumped to 46% when the misinformation was embedded in formal discharge notes, but at least the models were more skeptical of the social media content (9%).

Other findings were more counter-intuitive. Eight of the 10 logical fallacies ended up reducing the misinformation acceptance rate rather than increasing it like the authors expected.

  • Only appeals to authority (+2.9 percentage points above the base prompts) and slippery slope prompts (+2.2pp) increased susceptibility, a relatively small impact considering appeals to popularity slashed it by nearly 20pp.
  • Larger models were generally safer, although the language and phrasing had a far greater influence than the parameter count alone. 
  • It was also surprising to see that the medical models performed worse than the general purpose models, with many having weaker lie detectors despite the specialization.
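
For a rough sense of what those shifts mean in absolute terms, here's a small sketch that applies them to the 32% base rate. It assumes the fallacy shifts are measured against that same base-prompt rate, and it rounds "nearly 20pp" to a flat 20, so treat the outputs as ballpark figures rather than numbers from the paper.

```python
BASE_RATE = 32.0  # % of neutral base prompts where misinformation was accepted

# Shifts in percentage points relative to the base prompts.
shifts_pp = {
    "appeal to authority": +2.9,
    "slippery slope": +2.2,
    "appeal to popularity": -20.0,  # assumption: "nearly 20pp" rounded to 20
}

# Implied acceptance rate under each framing.
implied = {name: round(BASE_RATE + pp, 1) for name, pp in shifts_pp.items()}

for name, rate in implied.items():
    print(f"{name}: ~{rate}% acceptance")
```

Under those assumptions, the two fallacies that made things worse only nudge acceptance into the mid-30s, while appeals to popularity would pull it down to roughly 12%.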

Improving LLM safety is about more than making bigger models. It’s about knowing how information gets presented by actual humans, and having guardrails in place that hold up even when that information is wrong.

The Takeaway

Benchmark performance isn’t real-world performance, and this study provides another reminder that a model’s ability to separate fact from fiction is often more important than its test scores.
