UpDoc Lands First FDA Clearance for Patient-Facing AI 

UpDoc just landed the first FDA clearance for a patient-facing AI model, which acts as a “concierge doctor” to support patients between visits. Good news for the human docs reading this – it isn’t going after your job just yet.

What’s UpDoc? It’s a clinical AI platform that unifies clinical guidelines, longitudinal patient context, and physician governance to safely execute real-world care workflows.

What isn’t UpDoc? An AI doctor.

  • The 510(k) clearance had a narrow scope. It allows the AI to call or message patients between visits and adjust their insulin doses within parameters set by human clinicians.
  • UpDoc says its AI will ease doctors’ workloads and help patients better manage illnesses like Type 2 diabetes. 

The data backs that up. A study in JAMA Network Open randomized 32 patients with T2D to receive either support from UpDoc (daily voice AI check-ins to record blood glucose and adjust insulin) or standard care (i.e., logging their own data until they see their doctor in person).

  • The AI group reached its target blood glucose in 15 days, while fewer than half of the standard care group got there at all during the 8-week study period.
  • That trial provided the clinical foundation for the now-cleared solution, which is set to be piloted at Cleveland Clinic, UCSF Health, and Allegheny Health Network.

UpDoc is taking the road less traveled. It’s not the only AI startup in this wheelhouse, but so far it’s one of the only ones that doesn’t seem to be actively avoiding FDA regulation. 

  • The most notable example is Doctronic, which has been testing its AI prescription tech through a state-run program in Utah rather than seeking a full-fledged authorization.
  • That’s an easier path to market than vaulting over FDA hurdles, but it doesn’t get you the “world first” feather in your cap that now belongs to UpDoc.

So, now what? The FDA has long debated how to regulate AI, and UpDoc could be the first sign that the agency is getting comfortable enough with the tech to give the green light to more models.

  • With the first clearance out of the way, other AI developers also have an established precedent and a blueprint to follow suit.

Don’t forget about the docs. Besides the regulatory shakeout, it’ll be equally interesting to see how this new breed of AI ends up in the hands of clinicians.

  • We saw OpenEvidence fold a new biomarker for heart disease into its platform just last week, and the most direct path to a wide distribution for many soon-to-be-cleared AI tools could be similar licensing partnerships.
  • Plenty of companies already have a massive user base and are actively expanding the clinical scope of their platforms – Abridge, OE, Doximity, the list goes on for a while. Now that FDA clearance is part of the picture, licensing models from the UpDocs of the world feels like a natural next step after all the journal partnerships we’ve been seeing.

The Takeaway

The FDA finally cleared its first patient-facing clinical AI model, and UpDoc might have been the first domino, but it definitely won’t be the last.

Assort Closes $120M to Scale Voice AI Across Healthcare

If you needed any more proof that communication friction is one of the biggest pain points for patients and providers, look no further than Assort Health’s just-closed $120M Series C – its third funding round in 18 months.

Assort started with a simple thesis. Unlock the front door of healthcare, and the rest will follow. Assort originally aimed its voice AI agents at scheduling because doing so meant solving for the two key ingredients needed for everything else downstream:

  • The care protocols required to handle that first interaction.
  • The patient communication data that flows into the rest of the journey.

The first call is an important moment. Mistakes here mean the patient never comes back, and Assort’s edge in preventing that is its Synapse agentic model.

  • Synapse learns specialty workflows across every deployment, then simulates the edge cases to stress test them before any agents go live.
  • That allows even non-technical teams to safely implement Assort’s agents at scale, which fuels an AI development flywheel that’s already learning from 190M patient interactions, 62M care protocols, and 1.6M decision pathways.

Assort covers the entire patient journey. What began as the first voice AI agent to schedule a specialty appointment has grown into a full-fledged voice AI platform that includes:

  • Concierge – handles inbound calls, triage, lab requests, med refills, scheduling, eligibility checks, and intake.
  • Activate – reaches out to patients proactively to close referral loops, act on care gaps, recover no-shows, and resolve payments.
  • Orchestrate – runs the operational work behind each visit and writes every detail back to the EHR.
  • Empower – equips staff with an AI copilot to manage complex patient access needs in real time.

Patient Journey Memory ties it all together. The capability is built on three pillars.

  • Each patient gets a personal AI agent that knows their context and preferences so they don’t have to keep repeating the same story every time they interact with their provider.
  • The agents share the same data and talk to each other, so care gaps surface wherever the patient happens to engage.
  • Having a continuous journey across every interaction allows the platform to activate patients at high-intent moments.

Next stop: everywhere. Every new tech generation sees a flood of new solutions, then only a few survive. Voice AI is about to hit that same shakeout, and Assort plans on sticking around.

  • The funding is earmarked for bringing on veteran C-suite executives to make that happen and for expanding into organizations ranging from community-based providers to the biggest academic medical centers in the country.
  • Major systems like John Muir Health are already signed on as demand grows for platforms that can support increasingly complex ambulatory operations – the exact kind Assort is uniquely tuned to solve.

The Takeaway

Assort is looking to become the voice AI transformation partner for every healthcare provider in the country, and if its funding tempo is any indication, it’s moving with enough urgency to actually pull it off.

New Studies Show AI Outperforms Physicians, Just Not at Medicine

In case last week’s AI drama wasn’t hot enough, a pair of new studies in Nature cranked up the heat by finding that AI agents beat physicians on ER and care management tasks – just not real ones.

“Towards autonomous medical artificial intelligence agents.” The first study took a look at MIRA, an AI agent developed in Germany that operates inside a sandboxed EHR environment.

  • Using 574 real emergency department cases, researchers had MIRA chat with a separate AI agent playing the patient and execute entire care workflows, such as investigating diagnoses, ordering labs, and triaging for hospital admission.

The headline: MIRA significantly outperformed four board-certified physicians. The agent had higher overall diagnostic accuracy (87.8% vs. 78.1%), was better at ordering correct procedures like laparoscopic appendectomy (53.5% vs 38.3%), and had 35% better guideline alignment.

The reality: ER doc Graham Walker, MD, put it perfectly on LinkedIn: “There is no way in hell that humans mismanaged almost 30% of appendicitis cases, the most common ‘surgical emergency’ that we’ve all seen hundreds of in our career.”

  • It turns out the EHR sandbox required 21 keystrokes to order this correctly, and physicians were scored as wrong unless they explicitly searched for and entered “laparoscopic appendectomy.” AI is built for that, humans not so much.

“Towards conversational AI for disease management.” The second study explored whether Google’s AMIE agent could expand from pure diagnostics to longitudinal care management.

  • The blinded study pitted AMIE against 21 primary care physicians on 100 multi-visit cases, with the agent pulling live guidelines and drug references to produce structured management plans.

The headline: AMIE’s care plans were better than PCPs across the board. The agent notched higher marks on management reasoning, precision of investigations, and guideline alignment.

The reality: AMIE operated in a world without prior auths, without formulary restrictions, and without social needs that patients didn’t want to bring up. The authors didn’t pretend otherwise.

The Takeaway

This might sound familiar, but these studies show that MIRA and AMIE performed well in ideal scenarios, not in the messy trenches of real-world medicine. That said, the results aren’t important because AI beat a benchmark; they’re important because AI took another big step toward “delivering actions” instead of just “delivering answers.”

General-Purpose LLMs Outperform Healthcare-Specific Models

We might have just gotten our spiciest study of the year after new findings in Nature Medicine showed that general-purpose LLMs outperform specialized healthcare models straight out of the box.

It was a battle of the bots. Researchers pitted OpenEvidence and UpToDate Expert AI against three frontier models that anyone with a web browser can pull up in two seconds: GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6.

The models were tested across three domains:

  • medical knowledge (MedQA)
  • expert clinician alignment (HealthBench)
  • 100 real physician queries (RCQ) scored by 12 blinded clinicians  

It was a clean sweep. The general-purpose LLMs outperformed the specialized models on all three evals, and by a healthy margin. This chart gets the point across.

  • On MedQA, Gemini led the pack with 97.4% accuracy (vs. 89.6% for OE and 88.4% for UTD). Fun fact, the frontier models had a huge advantage here since their training data included these exact questions (and answers).
  • On HealthBench, GPT-5.2 dominated with an 88%. It’s almost like OpenAI invented the benchmark.
  • The RCQs were probably the most clinically meaningful component, and all three frontier models took the podium here as well. It was a bit odd that the researchers didn’t share the specific questions, and OE definitely thought so too.

OpenEvidence hit back hard and fast. It went straight to its socials to let the world know that the study was not only poorly designed and biased, but that the authors had reached out for API access to help build a competing product. Request denied.

  • OE also pointed out the training data contamination issue with MedQA, and critiqued HealthBench for scoring responses on subjective stylistic choices (in one example, OE scored 20% “worse” because it didn’t use a specific email header).
  • The cherry on top was OE revealing that the real-world clinician queries were only added after peer reviewers flagged the study for having weak evidence. Big if true.

Obligatory disclaimer: the models were evaluated back in February, and the performance gap could easily be even wider today. 

The Takeaway

OpenEvidence and UpToDate didn’t become successful by being better AI developers than OpenAI and Anthropic. They did it by doing the things that don’t show up in benchmarks – curating sources of verifiable evidence, wrapping them in an interface that docs actually enjoy using, and earning their trust one question at a time. If anything, this study confirmed that those matter now more than ever.

Abridge Unveils New Platform, Teams Up With Lilly and Nvidia

Patients, platforms, Lilly, and Nvidia. Abridge’s first keynote had it all.

There were enough major announcements to fill an entire issue of DHW, so here’s the abridged version of the top stories to come out of NYC.

The new platform stole the show. Abridge unveiled “the first AI-native clinician intelligence platform” organized around patients, built for clinicians, and designed to help health systems.

  • Before the visit: The platform surfaces care gaps and relevant clinical context so clinicians can address what matters during the visit instead of discovering it in retrospective chart reviews.
  • During the visit: Abridge suggests discussion topics while delivering evidence-based answers to clinical questions from a growing content library that includes new specialty-focused partners like AAFP, AAN, ADA, and ASCO.
  • After the visit: Abridge generates documentation, flowsheets, patient summaries, orders, and billing codes (soon to be fine-tuned through a new partnership with AHIMA).

“The base unit of healthcare is a clinician caring for a patient.” As Abridge pushes into new models of care delivery, its platform will provide the connective tissue between the clinical workflows where care actually gets delivered and outside orgs like payers or life sciences firms.

  • The keynote highlighted some key examples: Cigna was on stage discussing how embedding AI in clinical workflows has the potential to unlock real-time claims adjudication, and Aetna shared how it could help realize the promise of VBC.
  • More than 300 health systems are already live, including a just-announced rollout at Northwestern Medicine.

Eli Lilly is buying into the vision. The pharma giant made a strategic investment in Abridge’s next chapter, and even though the keynote was light on details, the move started to add up after seeing one of the new capabilities coming to the platform: clinical trial screening.

  • By comparing clinical guidance with patient-provider conversations in real time, Abridge can surface relevant trials directly in the encounter – the moment it matters most.
  • They didn’t mention a check size, but big opportunities attract big investments, and identifying candidates while initiating screening at the point of care sounds huge.

Last, but certainly not least, Nvidia. Abridge is teaming up with Nvidia to develop a first-of-its-kind foundation model for clinical conversations that’s trained, shaped, and evaluated against real-world conditions.

  • We’ll have to wait until later this year to see it in action, but a little pre-, mid-, and post-training magic with Abridge’s de-identified clinical data will apparently help make it the first model that can “reason clinically from its foundation.”

The Takeaway

If the keynote made one thing crystal clear, it’s that Abridge’s platform doesn’t revolve around AI documentation. It revolves around patients, and every new feature is purpose-built to prove it.

Patients Want AI, So Long As There’s No Copay 

New research in npj Digital Medicine suggests that patients might be warming up to medical AI, at least if it’s less expensive than seeing an actual doctor.  

Here’s the setup. Johns Hopkins researchers recruited 248 U.S. adults with type 1 diabetes, then presented them with scenarios where they were due for an annual diabetic eye screening.

  • Diabetic retinopathy is the leading cause of blindness among working-age adults, and autonomous AI tools that can diagnose the disease from retinal images are already cleared by the FDA and in clinical use.
  • In each scenario, one of these autonomous AI tools was made available as an alternative to a specialist referral.

The catch was the copay. Participants were randomized to have the AI offered with either a $50 copay, or with the copay waived by their insurer or the AI developer.

Fifty bucks is fifty bucks. More than 80% of participants opted for the AI tool when the copay was waived, compared with 43% when the $50 copay applied.

  • Not only did more participants opt for the AI screening when there was no copay, but participants also perceived the AI as more effective.
  • It didn’t matter whether the copay was waived by the AI developer or their insurer.

There was one major caveat. Patients who chose AI over a traditional screening with a human specialist were far more likely to seek reconfirmation from their doctor after getting the results.

  • The AI group was nearly 3x more likely to seek reconfirmation after abnormal results, and still nearly 50% more likely to ask for a second look after getting normal results.

The trust isn’t there yet. AI might be able to give patients results, but they still want to hear from a medical professional to verify those results.

  • The authors point out that human oversight is still clearly a top priority for patients, and that “it’s crucial to address the persistent preferences for provider follow-up and verification, even when AI results are normal.”

The Takeaway

Financial incentives remain undefeated, but this study confirmed that you can’t put a price on trust with AI in medicine.

Ad-verse Effects in Consumer-Facing AI

As AI companies embed more ads in their user interfaces for clinicians and consumers, the BRIDGE GenAI Lab decided to take a look at whether these ads impact model performance.

Turns out, they do. BRIDGE ran four experiments across 12 leading LLMs from Anthropic, Google, and OpenAI. The models were far more recent than those in most studies we cover, an upside of not waiting around for peer review before publishing a preprint.

  • Each experiment paired a clinical scenario with a system prompt containing a pharmaceutical advertisement, then asked the model for a treatment recommendation.

Ads definitely moved the needle. Across 74,880 calls and 13 scenarios, advertising shifted the model’s choice toward the advertised drug from a baseline of 34% to 48%. 

  • That’s a jump of +12.7 percentage points on average.
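To make the percentage-point arithmetic concrete, here’s a minimal sketch assuming each model call is logged with its condition (no ad vs. ad in the system prompt) and whether the recommendation matched the advertised drug. The `Call` record, the function names, and the toy data are all hypothetical illustrations, not BRIDGE’s code or data.

```python
from dataclasses import dataclass


@dataclass
class Call:
    # Hypothetical record for one model call in an ad-bias experiment
    condition: str          # "baseline" (no ad) or "ad" (ad in the system prompt)
    chose_advertised: bool  # True if the recommendation was the advertised drug


def choice_rate(calls: list[Call], condition: str) -> float:
    """Share of calls in one condition that recommended the advertised drug."""
    subset = [c for c in calls if c.condition == condition]
    if not subset:
        raise ValueError(f"no calls logged for condition {condition!r}")
    return sum(c.chose_advertised for c in subset) / len(subset)


def pp_shift(calls: list[Call]) -> float:
    """Shift toward the advertised drug in percentage points: ad rate minus baseline rate."""
    return 100 * (choice_rate(calls, "ad") - choice_rate(calls, "baseline"))


# Toy data (made up): 34 of 100 baseline calls and 48 of 100 ad calls pick the advertised drug
calls = [Call("baseline", i < 34) for i in range(100)] + [Call("ad", i < 48) for i in range(100)]
print(round(pp_shift(calls), 1))  # 14.0
```

A single pooled difference like this won’t necessarily equal an average of per-scenario or per-model shifts, so headline figures computed different ways need not line up exactly.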

The LLMs had some nice range. Model bias varied widely by developer.

  • Google’s advertising DNA was on full display when Gemini led the pack with an average shift of +29.8 percentage points toward the advertised drug. 
  • Five models from OpenAI were swayed by an average of +10.9 pp.
  • Anthropic’s models were the most resilient at +2.0 pp, and the ever-skeptical Opus 4.6 actually steered away from the promoted drug by -3.8 pp.

Three experiments contrasted three different conditions. That let BRIDGE triangulate the bias across a trio of distinct categories.

  • Equipoise (+12.7 pp) – When two drugs were guideline-equivalent, the ad acted as a tiebreaker. The output was clinically correct, but biased.
  • Suboptimal Drug (+0.6 pp) – When the advertised drug was clinically inferior, models resisted. Only 4.4% of responses chose the suboptimal advertised option.
  • Wellness Supplements (-0.6 pp) – For supplements lacking evidence, endorsement decreased. Anthropic models actively pushed back at -2.4 pp.

The picture was consistent. Advertising didn’t override medical knowledge, but it did tip the scales when two or more options were medically defensible. 

  • Another important note: When models were asked to justify their choices, they almost never disclosed the ad. If they chose the advertised drug, the justification echoed the ad in 52.7% of cases.

The Takeaway

BRIDGE just showed why the real harm with AI advertising might not be patients receiving dangerous drugs. It could be that they receive clinically sound recommendations that were shaped by commercial interests – without them knowing it, and without a mechanism to flag it.

OpenAI o1 Outperforms Physicians on Clinical Reasoning Tasks

A landmark study in Science found that OpenAI’s o1 series outperformed human physicians at multiple clinical reasoning tasks, but that doesn’t mean it’s time to hang up the scrubs just yet.

Researchers at Harvard and Beth Israel Deaconess Medical Center designed the study to evaluate whether LLMs are ready to do what physicians do on a daily basis: review messy patient charts and use that data to determine diagnosis and next steps.

  • They evaluated o1 on clinical cases ranging from patient vignettes to second opinions on 76 real-world ED assessments, which included all the noise and incomplete information that clinicians routinely encounter in the EHR.
  • The refreshingly well-designed study also compared o1 against two attending physicians at BIDMC and GPT-4, with blinded physician raters scoring the results.

o1 came to play. On clinical vignettes evaluating management reasoning, o1-preview scored a median of 86%. Not too shabby.

  • It outperformed GPT-4, humans with GPT-4, and humans with conventional resources like UpToDate – all of which scored below 45%.

The ED cases were even more impressive. o1 offered second opinions about the diagnosis at three points along the patient’s ED journey:

  • At triage, when information in the record was most limited, o1 gave an exact or very close diagnosis in 67% of cases. The two physicians hit 55% and 50%.
  • o1 still outperformed the physicians when given all the data collected by the end of the ED encounter.
  • It was only when the physicians were given the most information possible to inform their diagnosis – at the time the patient would have been admitted to the hospital – that the scores finally converged.

The cherry on top? Physician raters couldn’t tell whether the differentials came from o1 or a human. One rater couldn’t tell in 83.6% of cases, the other in 94.4%. 

  • The authors were quick to mention that these results don’t mean AI is ready to replace human physicians. They mean it’s time for rigorous research into how AI can augment care teams, serve as a second opinion, and become a safety layer for clinicians.

The Takeaway

o1 outperforming a couple of internists at triage isn’t quite Deep Blue beating Garry Kasparov at chess, but it’s a step in that direction – especially considering OpenAI’s performance jump in just the last week (let alone since o1 launched in 2024).

AI Moves From Proof-of-Concept to Proof-of-Return

Healthcare can cover a lot of ground when it’s moving at the speed of AI, and a new report from McKinsey found that the AI conversation is quickly shifting from proof-of-concept to proof-of-return.

The analysis was based on a survey of U.S. healthcare execs spanning payers, providers, and health services/technology groups.

AI adoption is skyrocketing at all of them. For the first time since McKinsey began tracking the metric in 2023, the orgs that have already implemented GenAI outnumbered those that haven’t.

  • Half of respondents have deployed at least one GenAI use case at their organization, up from just 25% two years ago. Here’s a nice graphic on AI adoption by org type.
  • McKinsey found that leadership teams are no longer questioning whether and where GenAI is relevant; they’re focusing on how it can be used responsibly at scale.

Agents are also building momentum. Despite being the new kid on the AI block, 19% of orgs reported that they’ve deployed agentic AI capabilities.

  • That’s not a huge percentage considering all the new agents we’ve been covering, but another 51% of orgs are actively pursuing agentic AI proofs-of-concept.

Administrative efficiency is the priority. This chart breaks down the areas where respondents see the most potential for GenAI and multiagent systems.

  • 87% ranked administrative efficiency as their leading GenAI use case.
  • 76% said it was also their top priority for multiagent systems.
  • Software infrastructure and engagement trailed as distant contenders for both categories.

Adoption varies by org type. Here’s the overview.

  • Providers are leaning in on clinical productivity (54% are using GenAI to help).
  • Payers are prioritizing administrative efficiency (34%).
  • Health services and tech firms are using GenAI as software infrastructure (52%).

Adoption barriers had more overlap. Across all org types, the chief concerns with GenAI were difficulty integrating with existing workflows, risk/liability, and inaccuracies/bias.

The other shared belief? Nobody implements AI for fun. Everyone expects an ROI.

The Takeaway

AI has arrived in a big way, and McKinsey’s report confirmed that ROI is now the name of the game in every corner of the industry.

Scribes Show Modest Impact at Major Academics

Ambient scribes are back in the spotlight after a new study in JAMA confirmed that they move the needle on productivity metrics, but the jury’s still out on whether that’s the best yardstick for success.

This was a big one. The study examined the impact of AI scribe use on over 1,800 clinicians at five major academic medical centers from 2023 to 2025.

  • The academics: MGB, YNHH, UCSD, UCSF, UC Davis 
  • The scribes: Abridge, Ambience, Microsoft DAX Copilot

Here’s what they found. Clinicians who used AI scribes:

  • Saved 16 minutes of documentation time per eight hours of patient care 
  • Saved 13 minutes of EHR time 
  • Could see one additional patient every two weeks
  • Saw no significant impact on EHR time outside of working hours

Usage patterns helped color in the story. While 1,800 AI scribe adopters makes this one of the largest samples out there, the 6,770 control clinicians had also been offered scribes and chose not to use them.

  • The biggest gains went to the biggest users. Clinicians who used the AI scribe for over 50% of visits experienced twice the reduction in EHR time and 3x the reduction in documentation time, yet only 32% of adopters fell into this bucket.

What’s counted? What matters? This isn’t the first study we’ve covered that scores AI scribes on metrics researchers can easily measure (EHR time, visits), which aren’t necessarily the metrics that matter most to patients or clinicians.

  • Although this study solidifies that scribes can cut documentation time, the question now is whether that time gets reinvested in ways that improve care and outcomes for patients.
  • The results also confirm that time savings aren’t the mechanism by which scribes reduce burnout, but it’s still unclear whether the benefit comes from having a couple more moments to take a deep breath throughout the day or from reallocating the extra minutes to things that feel valuable.

The Takeaway

This study offers the most definitive real-world data yet that AI scribes have a modest impact on productivity metrics, but it also confirms that cleaner notes aren’t the only key to improving healthcare experiences.
