Scribes Show Modest Impact at Major Academics

Ambient scribes are back in the spotlight after a new study in JAMA confirmed that they move the needle on productivity metrics, but the jury’s still out on whether that’s the best yardstick for success.

This was a big one. The study examined the impact of AI scribe use on over 1,800 clinicians at five major academic medical centers from 2023 to 2025.

  • The academics: MGB, YNHH, UCSD, UCSF, UC Davis 
  • The scribes: Abridge, Ambience, Microsoft DAX Copilot

Here’s what they found. Clinicians who used AI scribes:

  • Saved 16 minutes of documentation time per eight hours of patient care 
  • Saved 13 minutes of EHR time 
  • Could see one additional patient every two weeks
  • Saw no significant impact on EHR time outside of working hours

Usage patterns helped color in the story. While 1,800 AI scribe adopters is one of the largest samples out there, the 6,770 control clinicians were also offered scribes and opted not to use them.

  • The biggest gains went to the biggest users. Clinicians who used the AI scribe for over 50% of visits experienced twice the reduction in EHR time and 3x the reduction in documentation time, yet only 32% of adopters fell into this bucket.

What’s counted? What matters? This isn’t the first study we’ve covered that scores AI scribes based on metrics that researchers can easily measure (EHR time, visits), which isn’t necessarily the same as the metrics that matter most to patients or clinicians.

  • Although this study solidifies that scribes can cut documentation time, the question now is if that time gets reinvested in ways that improve care and outcomes for patients.
  • The results also suggest that time savings aren’t the mechanism of action for scribes reducing burnout, though it’s still unclear whether the relief comes from having a couple more moments to take a deep breath throughout the day or from reallocating the extra minutes to things that feel valuable.

The Takeaway

This study offers the most definitive real-world data yet that AI scribes have a modest impact on productivity metrics, but it also confirms that cleaner notes aren’t the only key to improving healthcare experiences.

Qualified Raises $125M to Build AI Infrastructure

In an era of isolated AI pilots, Qualified Health is building the infrastructure to connect the dots.

AI is the star of enterprise transformation. Health systems are looking to deploy and scale AI across their entire organization, and Qualified just raised $125M of Series B funding to make sure every new agent fits into a cohesive constellation.

The core platform has four distinct layers:

  • A data foundation that turns the EHR and external sources into an AI-ready bedrock.
  • A layer that lets hospitals build and deploy AI tools without always starting from scratch.
  • A layer that turns those tools into AI apps and agents deployed directly into workflows.
  • A layer that keeps governance, monitoring, and evaluation at the center of everything.

Qualified doesn’t leave AI to chance. It embeds forward-deployed product leaders alongside health system teams to identify high-priority needs, deploy solutions quickly, and iterate based on actual feedback in the trenches.

That has a couple of major benefits:

  • AI solutions are purpose-built for specific operational problems rather than mass market appeal.  
  • The tight feedback loop allows Qualified to iterate faster than it would be able to with a traditional implementation cycle, which shortens the timescale needed to improve its deployments and demonstrate a measurable impact.

The proof is in the pudding. At the University of Texas Medical Branch, Qualified reportedly generated a $15M measurable run-rate impact within the first six months.

  • That’s an eye-popping number to get on record, and it apparently stemmed from “a real willingness to dive deep” alongside UTMB clinical teams to deploy multiple assistants and automated workflows.
  • Qualified already supports systems representing about 7% of U.S. hospital revenue, and the next chapter is about deepening those partnerships and scaling responsibly.
  • Big ambition also means big competition, and Qualified will be up against everyone from Innovaccer to Epic if it wants to become healthcare’s AI platform of choice.

The Takeaway

Hospitals aren’t looking to AI for incremental improvement. They’re looking to AI to transform how they deliver care, and Qualified just landed another $125M to be the infrastructure that makes that possible.

Google AMIE Shines in First Real-World Study

The gap between benchmark scores and real-world performance has been the theme of the year in AI research, so Google was right on cue with its first prospective clinical trial for AMIE using actual patients. 

Meet the Articulate Medical Intelligence Explorer. AMIE is Google’s flagship “medical AI researcher,” and it teamed up with Beth Israel Deaconess Medical Center to gauge performance in real clinical workflows.

  • 100 patients completed an AMIE interaction before their primary care visit, with AMIE taking medical histories and equipping patients with potential diagnoses to discuss with their PCP.
  • PCPs received the transcript, summary, and AMIE’s management plan prior to the visit. All interactions were monitored live by physicians trained to intervene if safety criteria weren’t met.

AMIE got a gold star. Not only were there zero safety stops across all 100 interactions, but patients also reported that their attitudes toward AI significantly improved after chatting with AMIE.

  • AMIE’s differential included the correct final diagnosis in 90% of cases (per chart review 8 weeks post-encounter), with 75% top-3 accuracy.
  • PCPs using AMIE reported increased visit preparedness in 75% of cases, as well as potential behavior change in nearly 60%.
  • The quality of AMIE’s differential diagnosis and the appropriateness of its management plans were similar to the PCPs’, although PCPs won on management plan practicality and cost-effectiveness.

Other findings were less obvious. PCPs had the chart, the physical exam, and the pre-visit transcript, yet AMIE still matched them on differential quality and management safety without taking a single peek at the EHR.

  • That speaks to the ceiling (or lack thereof) for structured AI history-taking, and shows that AI is gearing up to improve patient care in more ways than just making predictions.
  • The fact that PCPs reported better visit preparedness and potential behavior change in over half of cases also highlights how AI can augment – not just replace – clinical reasoning.

The Takeaway

The distance between the bench and bedside is getting shorter, and Google’s AMIE results suggest that conversational AI in primary care is closer to reality than most people might think.

How to Build Patient Trust in Medical AI

AI might move at the speed of trust, but new research in JAMA Network Open shows that trust only moves at the speed of accuracy.

The study had a solid setup. To determine the factors currently driving patient trust in AI, researchers presented 3,000 U.S. adults with a pair of hypothetical AI-assisted visits for a moderate-risk rash. 

  • Each visit had six randomized attributes, such as whether or not a doctor was present, how well the AI performs relative to human clinicians, and various AI governance mechanisms.

AI performance came out on top by a wide margin. Respondents cared more about how well the AI performs than FDA approval, governance, and even having a doctor in the room.

  • The biggest difference came from AI performing better than a specialist, which increased the likelihood of choosing that visit by 32.5%.
  • AI performing at the same level as a specialist boosted visit preference by 24.8%, slightly more than having AI that performs as well as a general practitioner (19.1%).
  • Surprisingly, having an actual doctor present only swayed visit preference by 18.4%.

Governance factors also moved the needle. They just didn’t move it much.

  • FDA approval for the AI increased visit preference by a modest 11.1%.
  • Mayo Clinic AI certifications apparently carry just as much weight – also coming in at 11.1%.
  • Local hospital certifications for the AI only gave visits a 7.8% lift.

AI data quality was important. It just wasn’t as convincing as AI performance. 

  • AI that had nationally representative training data boosted visit preference by 11.9%, but it was interesting to see that disclosing bias in the training data had no effect versus not providing any data details.

The written explanations told the same story. Respondents cited AI performance and clinician involvement as the primary reasons for their choices, with many of them expressing comfort with AI as a tool – but not as a standalone decision-maker.

The Takeaway

Widespread AI adoption requires patient trust, and this study did a great job highlighting the specific areas that should be prioritized for building it.

Microsoft Dragon Copilot Gets AI Upgrades

Microsoft might have had the biggest presence at the biggest health IT conference, and it made sure all the lights in Las Vegas were on Dragon Copilot.

Unify. Simplify. Scale. Microsoft’s theme at HIMSS was all about making Dragon Copilot a one-stop-shop for information within clinical workflows. It debuted several new capabilities at the show:

  • Integrated medical content from trusted sources
  • Partner-powered AI apps and agents
  • Proactive ICD‑10 specificity suggestions
  • Expanded role-based experiences for physicians, nurses, and radiologists

Partnering is quicker than building. Rather than developing every Dragon Copilot capability in-house, Microsoft has been leaning on outside partners to round out the platform.

  • Dragon Copilot’s clinical evidence feature is a prime example. It brings medical content and other relevant contextual information in-workflow, all curated through new partnerships with Wolters Kluwer, Elsevier, and other vetted sources.

Microsoft Marketplace fills the gaps. It allows users to add AI partner apps directly into their Dragon Copilot workflows. Picture a modular side panel with insights from folks like: 

  • Regard – surfaces comorbidities and relevant diagnoses 
  • Canary Speech – analyzes voice biomarkers for mental health conditions
  • Humata Health – automates prior authorization processes for clinicians 
  • Atropos – generates personalized real-world evidence 
  • Optum – identifies potential coverage issues and supports claims processing 

All roads lead to scribes. When Microsoft first acquired Nuance for $20B back in 2022, it was its second largest acquisition ever behind LinkedIn, and the core offerings were radiology report automation, dictation, and transcription (with humans still pulling a ton of weight).

  • The product formerly known as Dragon Ambient eXperience is now the backbone of Dragon Copilot, and it’s been adding features at a breakneck pace.
  • Microsoft is looking to make Dragon Copilot everything, everywhere, all at once, and so far new partnerships have been the key to making that happen.

The Takeaway

As every digital health company rushes to add scribing to their platform, the OG scribe is rushing to add everything else. Now it just needs to maintain a unified user eXperience.

Infinite Healthcare, What’s It Worth?

Healthcare is one of the few industries where rising usage is treated as a failure, and a16z just published some solid arguments for why that framing might be completely backwards.

Everybody wants to be healthy. The demand for services that help people get and stay healthy is almost limitless, but the supply has always been limited by clinician time and cost.

  • AI balances the equation. It expands our capacity to provide care and drives down its marginal cost, and a16z makes the case that AI opens the door for us to consume an effectively unlimited amount of proactive care – consistent coaching, continuous monitoring, and earlier interventions.

Health is invaluable. As it stands today, when a payor sets reimbursement for a medical service, the rate assumes a certain volume to assess the overall budget for that service.

  • Price x Quantity = Total Medical Expense
  • If AI sends the quantity of the service through the roof while holding the price constant, the total medical expense would skyrocket.
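A quick sketch makes the budget pressure concrete – the price and volumes below are illustrative placeholders, not figures from the a16z piece:

```python
# Total medical expense = price x quantity. If AI multiplies utilization
# while reimbursement rates stay fixed, spend scales linearly with use.
def total_medical_expense(price_per_unit: float, quantity: int) -> float:
    return price_per_unit * quantity

baseline = total_medical_expense(price_per_unit=100.0, quantity=1_000_000)
# Suppose proactive AI-driven care triples how often the service is used.
with_ai = total_medical_expense(price_per_unit=100.0, quantity=3_000_000)

print(baseline, with_ai)  # $100M of spend becomes $300M at the same price
```

Holding price constant, any multiple on quantity passes straight through to total spend – which is exactly the payor’s concern.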

The question isn’t how to avoid this. It’s “what do we get for it?” 

  • Half of all U.S. health expenditures go to 5% of the population, and AI that helps avoid hospitalizations or acute events can generate huge savings from a few patients.
  • Healthier people are also more productive. If AI can help just 1% of the 160M workers in the U.S. work an additional year because they’re healthy, that’s worth $260B in GDP.
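The $260B claim is easy to sanity-check – it implies roughly $162,500 of GDP per worker-year, a figure inferred here from the piece’s own numbers rather than stated in it:

```python
workers = 160_000_000     # U.S. workers cited above
helped = workers // 100   # 1% gain one additional healthy working year

# Implied GDP contribution per worker-year needed to reach $260B
gdp_per_worker_year = 260e9 / helped
print(gdp_per_worker_year)  # 162500.0
```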

How do you price AI for abundant consumption? In a world with truly proactive AI-driven care, delivering more care earlier is what actually bends the cost curve. Pricing shouldn’t punish usage.

a16z looks to other industries as good examples for healthcare:

  • Telecom used to charge for voice and data by the minute because network capacity was scarce, but pricing shifted to unlimited plans as infrastructure improved. Usage went up significantly, but the total market value grew alongside consumption.
  • Music followed the same arc. iTunes sold songs one at a time. Spotify sold access instead. People started listening to more songs, and consumer surplus expanded.

The Takeaway

As AI expands care capacity and access, consumption naturally increases. Affordable access leads to explosions in usage, and business models shift to subscriptions over per-unit pricing. Other industries have made the transition before, and a16z thinks it might be healthcare’s turn.

Amazon Health Connect Sends AI to the Back Office

If the competition for the back office was already hot, it’s a certified wildfire after last week’s debut of Amazon Health Connect.

Amazon is pitching Amazon Connect Health as a purpose-built agentic AI solution for the administrative work that gets in the way of care. That’s definitely not fun to read for all the companies that had the same tagline on their booth at ViVE.

It comes with five capabilities straight out of the box: 

  • Patient verification
  • Appointment scheduling 
  • Pre-visit summaries
  • Ambient documentation
  • Medical coding 

What’s the core use case? AWS Director of Healthcare AI Naji Shafi says it’s the entire patient journey.

  • When a patient calls to book an appointment, Amazon Connect Health answers immediately, confirms their identity, checks their coverage, and lines up the visit while they’re still on the line.
  • Before the visit, it reviews their complete medical history across care settings, then surfaces previsit insights like active conditions or trends that might be relevant to closing care gaps.
  • During the visit, it drafts clinical notes for provider review in real-time, with every detail linked back to the moment in the conversation where it was discussed.
  • After the visit, it generates patient-friendly summaries and the medical codes needed for billing, allowing the visit to be payor-ready and submitted within minutes.

But wait, there’s more. Amazon Connect Health integrates natively with Epic, and connects to 100+ EHRs and 35+ HIEs through data integration partners like Redox.

  • It’s also built entirely on AWS HealthLake, the cloud giant’s FHIR data repository that’s now getting new agentic capabilities to help convert records into standard formats.

Early users love it. Amazon One Medical was the perfect sandbox for polishing Amazon Connect Health in clinical settings before opening it to outside partners. It shows in the results.

  • UC San Diego Health is saving a minute per call, diverting 630 hours a week from patient verification to direct support, and has cut call abandonment by 30%.
  • Netsmart’s EHR supports more than 1,300 community provider orgs, and it saw ambient documentation adoption skyrocket 275% – and better staff retention as a result.

The Takeaway

There were already tons of agentic AI solutions competing to automate healthcare’s administrative waste, and now there’s one that’s bankrolled by the biggest bookstore in human history. It’s a crowded space, but $1 trillion per year is also enough bloat to go around.

PHTI Breaks Down Barriers to Clinical AI

PHTI’s new Clinical AI report delivered exactly what we’ve come to expect from their research: top-tier industry analysis through the lens of actual stakeholders.

They assembled the A Team for this one. The report was built from an in-person workshop that PHTI convened with senior industry leaders – from health systems and health plans to tech firms and federal agencies – to explore what’s needed to safely scale clinical AI.

  • The workshop underscored the policy, reimbursement, and evidence gaps holding back adoption, with several key themes emerging from the discussion around their example use cases (hypertension management and mental health chatbots).

Theme 1: Evidence standards should compare AI to current standards of care and scale with risk.

  • That means comparing AI to the care that patients actually receive today rather than idealized care, then having different standards that align with the clinical risk of using the tool.
  • Highlight: Evidence should assess whether the full workflow (including multiple models, devices, and human oversight) improves outcomes, not merely model performance.

Theme 2: Performance benchmarks should be based on clinical outcomes, and safety standards should adapt as the evidence grows.

  • Ambiguity around what constitutes “good” performance is a persistent barrier. Metrics need to be anchored to specific clinical outcomes instead of vague process measures.
  • Highlight: Across both use cases, participants emphasized the need not only to set benchmarks but to set minimum safety floors, which could adjust dynamically over time on the basis of observed outcomes, changing patient risk profiles, and emerging evidence.

Theme 3: New technologies may be initially tested in lower-risk populations, but should scale quickly to high-risk populations to maximize impact.

  • Low-risk patients are tempting on-ramps, but AI’s greatest benefits come from reaching the high-need patients, and reaching them carries higher evidence expectations and more clinical risk.
  • Highlight: For mental health, engagement and retention are huge barriers to treatment. Participants cautioned that overly restrictive AI deployments risk limiting access and instead emphasized the need for appropriate care routing following LLM engagement.

The Takeaway

Even the most effective clinical AI tools still have plenty of questions to address before adoption can scale, and PHTI just crowdsourced some promising answers straight from the boots-on-the-ground in the healthcare trenches.

LLMs Still Struggle With Medical Misinformation 

The Lancet Digital Health just published one of the largest-ever stress tests on medical misinformation in LLMs, and it looks like most models still struggle to separate fact from fiction.

Here’s the setup. Researchers probed 20 LLMs with over 3M prompts containing medical information from three different sources: social media posts, simulated clinical vignettes, or real hospital discharge notes with a single fabricated recommendation inserted.

  • Each prompt was presented in multiple versions, once with neutral wording to establish a baseline, then with a series of variations that were emotionally charged or leading.
  • Ten logical fallacies were also used to test how framing influences model behavior, such as appeals to authority (a physician said…) or popularity (everyone agrees that…).

LLMs love fake news. The susceptibility was shockingly high across all models, with the medical misinformation accepted in 32% of the neutral base prompts.

  • That jumped to 46% when the misinformation was embedded in formal discharge notes, but at least the models were more skeptical of the social media content (9%).

Other findings were more counter-intuitive. Eight of the 10 logical fallacies ended up reducing the misinformation acceptance rate rather than increasing it like the authors expected.

  • Only appeals to authority (+2.9 percentage points above the base prompts) and slippery slope prompts (+2.2pp) increased susceptibility, a relatively small impact considering appeals to popularity slashed it by nearly 20pp.
  • Larger models were generally safer, although the language and phrasing had a far greater influence than the parameter count alone. 
  • It was also surprising to see that the medical models performed worse than the general purpose models, with many having weaker lie detectors despite the specialization.
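Since these effects are reported in percentage points, they shift the 32% neutral baseline additively – a quick illustrative calculation (the popularity figure is approximated from the “nearly 20pp” above):

```python
# Percentage points (pp) shift an acceptance *rate* additively,
# unlike percentages, which would scale it multiplicatively.
baseline = 32.0  # % of neutral base prompts where misinformation was accepted

authority = baseline + 2.9       # appeal to authority lifted acceptance
slippery_slope = baseline + 2.2  # slippery slope framing
popularity = baseline - 20.0     # popularity appeals cut it by ~20pp (approx.)

print(f"{authority:.1f}% / {slippery_slope:.1f}% / {popularity:.1f}%")
```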

Improving LLM safety is about more than making bigger models. It’s about knowing how information gets presented by actual humans, and having guardrails in place that hold up even when that information is wrong.

The Takeaway

Benchmark performance isn’t real-world performance, and this study provides another reminder that a model’s ability to separate fact from fiction is often more important than its test scores.

The Patient You Lost Before They Ever Walked In

Thousands of patients are referred for procedures but vanish into the void because no one called them back within 48 hours.

By Shani Fargun, VP Healthcare at StackAI
Sponsored by StackAI

While the headlines at major cardiology conferences focus on AI that can read angiograms or predict arrhythmias, a quieter, unsexy revolution is happening in the back office, and it might be the key to actually using those advanced clinical tools.

The biggest bottleneck in modern cardiology is administrative friction. It’s the death by 1,000 faxes that occurs when a patient is referred for a TAVR, but the pre-op workup is trapped in a PDF from an external hospital. It’s the prior authorization that sits in a queue for weeks because a specific keyword was missing from the submission.

  • According to the AMA, 94% of physicians report that these administrative hurdles lead to delays in accessing necessary care.

Healthcare has a data problem. The industry runs on unstructured data. Referral letters, handwritten call notes, faxed labs, and denial letters make up the bulk of cardiac operations.

  • Nearly 80% of all healthcare data is unstructured and inaccessible to traditional automation. This forces highly trained clinical staff to spend hours acting as data entry clerks rather than treating patients.

Agentic AI is the solution. Agentic AI isn’t a chatbot or a diagnostic model – it’s a digital worker.

  • Unlike traditional software that waits for a human to input data, Agentic AI can autonomously perform tasks across different systems.

How can agentic workflows change modern practices?

  • Patient Scheduling & Follow-Up – Agents autonomously handle the last mile of care coordination, reaching out to patients to schedule diagnostic testing, confirming procedure dates, and answering routine logistical questions without burdening clinical staff. This directly combats referral leakage, which costs health systems an estimated $971,000 per physician annually.
  • Automated Prior Auth – Agents cross-reference patient charts against payer-specific guidelines to draft authorization requests that minimize technical denials. Download the free whitepaper of use cases for healthcare here.
  • Referral Velocity – Agents ingest incoming faxes and emails, extract clinical criteria, and draft the patient chart for review, reducing time-to-appointment from weeks to days.

The Takeaway

The future of healthcare starts with better flows. By automating the administrative burden, we allow interventionalists to focus on what they do best: treating patients.

Request a demo to see customized use cases for your organization here.
