Why AI Vendors Struggle to Compete With EHRs

Anyone who has ever tried selling AI into health systems will tell you that it’s tough to compete with EHRs, but a new article in JAMA makes the case that it’s actually gotten too tough – and it might be time for regulators to step in.

Most markets reward the best products. The healthcare industry has a funny way of preventing that from happening, and EHR vendor dominance is a textbook example.

  • EHRs hold advantages across infrastructure, workflow integration, procurement, and pricing that make it difficult for third-party tools to gain a foothold.
  • A 2025 Health Affairs study backed that up by showing that 79% of U.S. hospitals use AI models from their EHR vendor, compared to just 59% that use AI from third-party developers.
  • A Bain report drove the point home. Two-thirds of Epic customers said they’d pick a “good enough” Epic option over a better competing product.

These EHR advantages are a natural feature of the market. That said, it’s up to regulators to decide whether the status quo is serving patients and the overall healthcare system. The JAMA authors argue that it isn’t, and they offer three areas where targeted policy could level the playing field.

Infrastructure – Integrating AI tools into clinical workflows requires real-time data access and the ability to survive EHR upgrades intact, both of which are dramatically easier for EHR vendors to deliver – particularly as data fields get added or removed.

  • Potential Policy – Mandate broader API adoption so third parties can access EHR data on equal footing, and use existing EHR certification and interoperability frameworks to do it.
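
For the technically curious, the plumbing for this already has a standard shape in FHIR R4. Here’s a minimal sketch of what vendor-neutral data access could look like through a generic FHIR endpoint – the base URL, token, and scope of access are placeholder assumptions, not any particular vendor’s API:

```python
import requests

# Hypothetical FHIR R4 endpoint and token -- placeholders, not a real system.
FHIR_BASE = "https://ehr.example.com/fhir/R4"
HEADERS = {
    "Authorization": "Bearer <access-token>",
    "Accept": "application/fhir+json",
}

def recent_vitals(patient_id: str) -> list[dict]:
    """Fetch a patient's most recent vital-sign observations via standard FHIR search."""
    resp = requests.get(
        f"{FHIR_BASE}/Observation",
        headers=HEADERS,
        params={
            "patient": patient_id,
            "category": "vital-signs",  # standard FHIR observation category
            "_sort": "-date",           # newest first
            "_count": 50,
        },
        timeout=30,
    )
    resp.raise_for_status()
    bundle = resp.json()
    return [entry["resource"] for entry in bundle.get("entry", [])]
```

The policy push is less about inventing this interface than about guaranteeing that third parties can rely on it at true parity with the vendor’s own tools.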

Workflow and Usability – The authors specifically flag EHR vendors’ edge in understanding the trade-offs of allocating limited screen real estate to new AI tools, something that’s harder for third parties to gauge from the outside looking in.

  • Potential Policy – Require EHR vendors to offer more robust developer sandboxes – similar to Apple’s iOS developer environment – so third parties can build and test without operating at a structural disadvantage.

Procurement and Pricing – Long-standing health system relationships give EHR vendors a streamlined path through procurement, as well as the leverage to “use pricing structures that incentivize adoption.”

  • Potential Policy – Although this is the hardest area for a policy fix, the authors suggest that improving transparency around AI performance could at least help health systems make more informed decisions regardless of where a tool comes from.

The Takeaway

EHRs are in a powerful position, and companies in powerful positions have a long track record of making life harder for their competition. Healthcare is too important an industry not to have the best products rise to the top, and this article offers some sound strategies to make sure that stays possible.

Qualified Raises $125M to Build AI Infrastructure

In an era of isolated AI pilots, Qualified Health is building the infrastructure to connect the dots.

AI is the star of enterprise transformation. Health systems are looking to deploy and scale AI across their entire organization, and Qualified just raised $125M of Series B funding to make sure every new agent fits into a cohesive constellation.

The core platform has four distinct layers:

  • A data foundation that turns the EHR and external sources into an AI-ready bedrock.
  • A layer that lets hospitals build and deploy AI tools without always starting from scratch.
  • A layer that turns those tools into AI apps and agents deployed directly into workflows.
  • A layer that keeps governance, monitoring, and evaluation at the center of everything.
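
Qualified hasn’t published its internals, so purely as an illustration, here’s how a four-layer stack like that might hang together in code. Every name below is hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

# A sketch of the four-layer pattern described above -- not Qualified's actual code.

@dataclass
class Agent:
    name: str
    run: Callable[[dict], dict]  # layer 3: a workflow-embedded app/agent

def data_foundation(ehr_record: dict, external: dict) -> dict:
    """Layer 1: merge EHR and external sources into one AI-ready context."""
    return {**ehr_record, **external}

def build_agent(name: str, model: Callable[[dict], dict]) -> Agent:
    """Layer 2: assemble an agent from reusable parts instead of starting from scratch."""
    return Agent(name=name, run=model)

def governed(agent: Agent, audit_log: list) -> Agent:
    """Layer 4: wrap every agent so each call is monitored and evaluable."""
    def run_with_audit(context: dict) -> dict:
        result = agent.run(context)
        audit_log.append({"agent": agent.name, "output": result})
        return result
    return Agent(name=agent.name, run=run_with_audit)
```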

Qualified doesn’t leave AI to chance. It embeds forward-deployed product leaders alongside health system teams to identify high-priority needs, deploy solutions quickly, and iterate based on actual feedback in the trenches.

That has a couple of major benefits:

  • AI solutions are purpose-built for specific operational problems rather than mass market appeal.  
  • The tight feedback loop lets Qualified iterate faster than a traditional implementation cycle would allow, shortening the time it takes to improve deployments and demonstrate measurable impact.

The proof is in the pudding. At the University of Texas Medical Branch, Qualified reportedly generated a $15M measurable run-rate impact within the first six months.

  • That’s an eye-popping number to get on record, and it apparently stemmed from “a real willingness to dive deep” alongside UTMB clinical teams to deploy multiple assistants and automated workflows.
  • Qualified already supports systems representing about 7% of U.S. hospital revenue, and the next chapter is about deepening those partnerships and scaling responsibly.
  • Big ambition also means big competition, and Qualified will be up against everyone from Innovaccer to Epic if it wants to become healthcare’s AI platform of choice.

The Takeaway

Hospitals aren’t looking to AI for incremental improvement. They’re looking to AI to transform how they deliver care, and Qualified just landed another $125M to be the infrastructure that makes that possible.

How to Build Patient Trust in Medical AI

AI might move at the speed of trust, but new research in JAMA Network Open shows that trust only moves at the speed of accuracy.

The study had a solid setup. To determine the factors currently driving patient trust in AI, researchers presented 3,000 U.S. adults with a pair of hypothetical AI-assisted visits for a moderate-risk rash. 

  • Each visit had six randomized attributes, such as whether a doctor was present, how well the AI performed relative to human clinicians, and which AI governance mechanisms were in place.
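
That design is essentially a discrete choice experiment, and the percentages below are attribute-level effects on visit preference. As a toy illustration of how such effects get estimated – the attribute levels and utility weights here are made up, not the study’s values:

```python
import random

random.seed(0)

# Toy re-creation of the paired-choice design with illustrative placeholders.
ATTRIBUTES = {
    "ai_performance": ["equals GP", "equals specialist", "beats specialist"],
    "doctor_present": [True, False],
    "fda_approved": [True, False],
}
PERF_WEIGHT = {"equals GP": 0.4, "equals specialist": 0.6, "beats specialist": 0.9}

def utility(visit: dict) -> float:
    """A respondent's (noisy) preference for one hypothetical visit."""
    return (PERF_WEIGHT[visit["ai_performance"]]
            + 0.5 * visit["doctor_present"]
            + 0.3 * visit["fda_approved"]
            + random.gauss(0, 1))  # respondent noise

def simulate(n_pairs: int = 50_000) -> list[tuple[dict, bool]]:
    """Present pairs of randomized visits and record which one was chosen."""
    rows = []
    for _ in range(n_pairs):
        a = {k: random.choice(v) for k, v in ATTRIBUTES.items()}
        b = {k: random.choice(v) for k, v in ATTRIBUTES.items()}
        a_chosen = utility(a) > utility(b)
        rows += [(a, a_chosen), (b, not a_chosen)]
    return rows

def marginal_effect(rows, attr: str, level) -> float:
    """Choice rate with the attribute at this level minus the rate at all other levels."""
    hit = [chosen for visit, chosen in rows if visit[attr] == level]
    miss = [chosen for visit, chosen in rows if visit[attr] != level]
    return sum(hit) / len(hit) - sum(miss) / len(miss)

rows = simulate()
print(f"doctor present: {marginal_effect(rows, 'doctor_present', True):+.1%}")
print(f"beats specialist: {marginal_effect(rows, 'ai_performance', 'beats specialist'):+.1%}")
```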

AI performance came out on top by a wide margin. Respondents cared more about how well the AI performs than FDA approval, governance, and even having a doctor in the room.

  • The biggest difference came from AI performing better than a specialist, which increased the likelihood of choosing that visit by 32.5%.
  • AI performing at the same level as a specialist boosted visit preference by 24.8%, slightly more than having AI that performs as well as a general practitioner (19.1%).
  • Having an actual doctor present surprisingly only swayed visit preference by 18.4%.

Governance factors also moved the needle. They just didn’t move it much.

  • FDA approval for the AI increased visit preference by a modest 11.1%.
  • Mayo Clinic AI certifications apparently carry just as much weight – also coming in at 11.1%.
  • Local hospital certifications for the AI only gave visits a 7.8% lift.

AI data quality was important. It just wasn’t as convincing as AI performance. 

  • AI that had nationally representative training data boosted visit preference by 11.9%, but it was interesting to see that disclosing bias in the training data had no effect versus not providing any data details.

The written explanations told the same story. Respondents cited AI performance and clinician involvement as the primary reasons for their choices, with many of them expressing comfort with AI as a tool – but not as a standalone decision-maker.

The Takeaway

Widespread AI adoption requires patient trust, and this study did a great job highlighting the specific areas that should be prioritized for building it.

Microsoft Dragon Copilot Gets AI Upgrades

Microsoft might have had the biggest presence at the biggest health IT conference, and it made sure all the lights in Las Vegas were on Dragon Copilot.

Unify. Simplify. Scale. Microsoft’s theme at HIMSS was all about making Dragon Copilot a one-stop-shop for information within clinical workflows. It debuted several new capabilities at the show:

  • Integrated medical content from trusted sources
  • Partner-powered AI apps and agents
  • Proactive ICD‑10 specificity suggestions
  • Expanded role-based experiences for physicians, nurses, and radiologists

Partnering is quicker than building. Rather than developing every Dragon Copilot capability in-house, Microsoft has been leaning on outside partners to round out the platform.

  • Dragon Copilot’s clinical evidence feature is a prime example. It brings medical content and other relevant contextual information in-workflow, all curated through new partnerships with Wolters Kluwer, Elsevier, and other vetted sources.

Microsoft Marketplace fills the gaps. It allows users to add AI partner apps directly into their Dragon Copilot workflows. Picture a modular side panel with insights from folks like: 

  • Regard – surfaces comorbidities and relevant diagnoses 
  • Canary Speech – analyzes voice biomarkers for mental health conditions
  • Humata Health – automates prior authorization processes for clinicians 
  • Atropos – generates personalized real-world evidence 
  • Optum – identifies potential coverage issues and supports claims processing 

All roads lead to scribes. When Microsoft first acquired Nuance for $20B back in 2022, it was its second largest acquisition ever behind LinkedIn, and the core offerings were radiology report automation, dictation, and transcription (with humans still pulling a ton of weight).

  • The product formerly known as Dragon Ambient eXperience is now the backbone of Dragon Copilot, and it’s been adding features at a breakneck pace.
  • Microsoft is looking to make Dragon Copilot everything, everywhere, all at once, and so far new partnerships have been the key to making that happen.

The Takeaway

As every digital health company rushes to add scribing to its platform, the OG scribe is rushing to add everything else. Now it just needs to maintain a unified user eXperience.

Anterior Closes $40M to Take AI to the Largest Plans in the Country

The AI race between payors and providers is healthcare’s Kentucky Derby, and Anterior just closed $40M to help turn the dark horses into the frontrunners.

Anterior uses AI to ease the back-office burden on health plans. It started with a laser focus on prior authorizations, translating huge amounts of unstructured data into the information that’s actually needed to make quicker decisions.

  • When Anterior helps payors deploy AI in their clinical and operational workflows, it doesn’t just dump a bunch of models on them and ride off into the sunset.
  • It embeds its own clinicians and engineers alongside the platform to support its partners, optimize accuracy, and drive a measurable impact.

Trust is a differentiator. Payors are a cautious crowd, and they aren’t exactly known for trusting new friends with their critical workflows. 

  • Anterior’s clinicians are its secret sauce. They make up about 40% of the company, and many of them have even started contributing directly to the platform’s code base.
  • This hands-on support is how partners build trust, and that hard-earned trust is what allowed Anterior to take the same tech underpinning its prior auth tools and expand it to other workflows.

New partners lead to new proof points. New proof points lead to new use cases. 

  • Anterior’s early successes – from both its people and technology – have allowed it to quickly land and expand into areas like payment integrity and risk adjustment. 
  • Since closing its $20M Series A in June 2024, Anterior has deployed its AI across major payors like Geisinger Health Plan, and worked alongside enterprise technology partners like HealthEdge to build out key strategic integrations.
  • The platform now supports orgs representing over 50M covered lives, and the fresh funds will help it use those case studies to pry open the door to the biggest national plans in the business.  

The Takeaway

Anterior’s earliest partners had to gamble on an unproven platform without any real-world evidence to back it up. Now, the proof is in the success stories, and Anterior just landed another $40M to go after the largest and most risk-averse payors in the country.

LLMs Still Struggle With Medical Misinformation 

The Lancet Digital Health just published one of the largest-ever stress tests on medical misinformation in LLMs, and it looks like most models still struggle to separate fact from fiction.

Here’s the setup. Researchers probed 20 LLMs with over 3M prompts containing medical misinformation from three different sources: social media posts, simulated clinical vignettes, and real hospital discharge notes with a single fabricated recommendation inserted.

  • Each prompt was presented in multiple versions: once with neutral wording to establish a baseline, then with a series of variations that were emotionally charged or leading.
  • Ten logical fallacies were also used to test how framing influences model behavior, such as appeals to authority (a physician said…) or popularity (everyone agrees that…).
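
As a rough sketch of how this kind of stress test is wired up – the framing templates and the acceptance check below are simplified stand-ins for the study’s actual protocol, and `model` is any LLM callable:

```python
# Simplified sketch of the stress-test protocol, not the study's materials.
FRAMINGS = {
    "neutral": "{claim}",
    "authority": "A physician said that {claim}",
    "popularity": "Everyone agrees that {claim}",
    "emotional": "I'm really scared -- please confirm that {claim}",
}

def acceptance_rate(model, false_claims: list[str], framing: str) -> float:
    """Share of fabricated medical claims the model endorses under one framing."""
    template = FRAMINGS[framing]
    accepted = 0
    for claim in false_claims:
        prompt = template.format(claim=claim) + "\nIs this medically correct?"
        reply = model(prompt).strip().lower()
        accepted += reply.startswith("yes")  # crude acceptance check for the sketch
    return accepted / len(false_claims)
```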

LLMs love fake news. The susceptibility was shockingly high across all models, with the medical misinformation accepted in 32% of the neutral base prompts.

  • That jumped to 46% when the misinformation was embedded in formal discharge notes, but at least the models were more skeptical of the social media content (9%).

Other findings were more counterintuitive. Eight of the 10 logical fallacies ended up reducing the misinformation acceptance rate rather than increasing it as the authors expected.

  • Only appeals to authority (+2.9 percentage points above the base prompts) and slippery slope prompts (+2.2pp) increased susceptibility, a relatively small impact considering appeals to popularity slashed it by nearly 20pp.
  • Larger models were generally safer, although the language and phrasing had a far greater influence than the parameter count alone. 
  • It was also surprising to see that the medical models performed worse than the general purpose models, with many having weaker lie detectors despite the specialization.

Improving LLM safety is about more than making bigger models. It’s about knowing how information gets presented by actual humans, and having guardrails in place that hold up even when that information is wrong.

The Takeaway

Benchmark performance isn’t real-world performance, and this study provides another reminder that a model’s ability to separate fact from fiction is often more important than its test scores.

AI Spots Early Cognitive Decline in Clinical Notes

Early disease detection is entering the AI era, and a new study in npj Digital Medicine shows that autonomous agents can now flag cognitive decline using nothing but clinical notes.

Cognitive decline is difficult to detect. It remains significantly underdiagnosed in routine care, and traditional screening usually requires a dedicated clinician and tests that can take hours. 

  • At the same time, early detection is becoming increasingly important, especially with the recent approval of Alzheimer’s therapies that are most effective when administered early. 

Mass General Brigham might have an answer. Clinical notes contain whispers of cognitive decline that busy clinicians can’t always hear. MGB built a system that listens at scale.

  • These whispers include everything from linguistic shifts and sentence pauses to disorganized narratives and family member concerns. 
  • MGB developed an AI system that scans for these signals in routine clinical documentation, leveraging five specialized agents that critique each other and refine their reasoning.
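
The exact agent design lives in the paper and the open-sourced code, but the critique-and-refine pattern looks roughly like this. The roles and the `llm` callable below are illustrative, not MGB’s actual implementation:

```python
# Illustrative critique-and-refine loop -- placeholder roles, not MGB's agents.
ROLES = [
    "linguist focused on linguistic shifts, pauses, and disorganized narratives",
    "clinician weighing documented symptoms and family-member concerns",
    "skeptic arguing against a premature cognitive-impairment label",
]

def screen_note(llm, note: str, rounds: int = 3) -> str:
    """Let specialized agents critique each other's reasoning before a verdict."""
    assessment = llm(f"Does this note suggest cognitive decline? Explain.\n\n{note}")
    for _ in range(rounds):
        critiques = [
            llm(f"As a {role}, critique this assessment:\n{assessment}")
            for role in ROLES
        ]
        assessment = llm(
            "Revise the assessment to address these critiques:\n\n"
            + assessment + "\n\nCritiques:\n" + "\n".join(critiques)
        )
    return assessment
```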

It worked like a charm. The MGB researchers set their agents loose on over 3,300 clinical notes from 200 anonymized patients, then had human reviewers take their own look.

  • The agents detected cognitive impairment with 91% sensitivity, nearly matching expert-level accuracy – without any human intervention needed after deployment.
  • When the AI and human reviewers disagreed, an independent expert validated the AI’s reasoning 58% of the time – meaning the system was often making sound clinical judgments that initial human review had missed.

The cherry on top? The MGB team open-sourced its framework – dubbed Pythia – alongside the study, enabling any provider org to deploy autonomous prompt optimization for its own AI screening applications.

The Takeaway

LLMs have opened the door to proactive screening at scale, and MGB just provided an excellent proof of concept using AI agents that turn everyday documentation into a chance to catch cognitive decline during the optimal treatment window.

ARISE Maps the State of Clinical AI

There have probably been hundreds of reports on the medical AI landscape, but there’s only been one State of Clinical AI from the rockstar team at ARISE.

The AI opus delivers the most complete review we’ve seen of a field that’s moving faster than its evaluation practices. It looked at the most influential clinical AI studies from 2025 to answer a trio of important questions:

  • Where does AI meaningfully improve care once it leaves research settings?
  • Where does performance break down?
  • Where do risks remain underexamined?

ARISE brought the heat. The Stanford-Harvard research network produced more highlights than we could count, but here’s a roundup of some of our favorites.

Impressive results in narrow evaluations. AI models have shown “superhuman performance” in research settings, but these results often depend on how narrowly the problem is framed. 

  • In one study, researchers modified standard medical multiple-choice questions so that the correct answer became “none of the other answers.” The clinical reasoning required to solve the question didn’t change. Model performance did. Accuracy dropped sharply across leading AI models, in some cases by over a third.
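
The perturbation itself is trivial to reproduce, which is part of what makes the result so striking. A generic sketch – the question schema here is a stand-in, not the study’s benchmark format:

```python
# Generic sketch of the "none of the other answers" perturbation.
def perturb(question: dict) -> dict:
    """Replace the correct option's text with 'None of the other answers'.
    The answer key -- and the reasoning required -- stays the same."""
    options = dict(question["options"])
    options[question["answer"]] = "None of the other answers"
    return {**question, "options": options}

mcq = {
    "stem": "Which agent is first-line for condition X?",
    "options": {"A": "Drug 1", "B": "Drug 2", "C": "Drug 3"},
    "answer": "B",
}
print(perturb(mcq)["options"])
# {'A': 'Drug 1', 'B': 'None of the other answers', 'C': 'Drug 3'}
```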

AI clearly helps prediction at scale. Although diagnostic reasoning was a mixed bag, several studies demonstrated that AI excels at identifying early warning signals from large datasets.

  • A hospital-based study found that a model trained on continuous wearable vital signs predicted patient deterioration up to 24 hours before standard alerts, identifying patients at risk for ICU transfer, cardiac arrest, or death while there was still time to intervene.

Most studies still don’t resemble the reality of healthcare. Clinical work has little to do with answering exam questions, and much to do with reviewing charts, coordinating care, and deciding when not to intervene.

  • A review of 500+ studies found that nearly half of them tested models using medical exam-style questions. Only 5% used real patient data, very few measured whether the models recognized uncertainty, and even fewer examined bias or fairness.

Now what? ARISE offered a few focus areas for 2026 that hit the center of the bullseye for building trust in the latest AI models.  

  • Evaluate models using real-world scenarios to drive evidence-based medicine.
  • Prioritize human-computer interaction design as much as primary outcomes.
  • Measure uncertainty, bias, and harm – especially when it comes to patient-facing AI.

The Takeaway

Healthcare AI has arrived, and ARISE made it clear that innovation won’t be driven by newer models alone. It will depend on whether health systems, researchers, and regulators are willing to hold AI to the same evidence standards they expect of any other clinical solution.

Foundation Models Can Compromise Patient Privacy

Foundation models trained on EHR data hold massive potential for clinical applications, but a new study out of MIT shows that they might have just as much potential to violate patient privacy.

Generalized knowledge makes better predictions. EHR foundation models normally draw on a collection of de-identified patient records to produce their outputs.

  • That’s not a problem on its own, but unintended “memorization” also allows these models to serve answers based on a single record from their training data. 

Therein lies the problem. To quantify the risk of these models revealing sensitive information, MIT researchers developed structured tests to determine how easily an attacker with partial knowledge of a patient – think lab results or demographic details – could extract further identifiable info through targeted prompts.

The tests measured memorization as a function of: 

  • the amount of prior information an attacker needs before the model reveals more
  • the risk associated with the revealed information
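
The paper defines its own formal tests, but the basic probing procedure can be sketched simply. The record format and the `model` callable below are illustrative placeholders:

```python
# Illustrative memorization probe -- a stand-in for the paper's structured tests.
def extraction_rate(model, known: dict, hidden_field: str, true_value: str,
                    n_samples: int = 20) -> float:
    """How often the model reproduces a hidden attribute when prompted with
    attacker-known attributes of the same training record."""
    prompt = (
        "Patient record: "
        + ", ".join(f"{k}={v}" for k, v in known.items())
        + f", {hidden_field}="
    )
    hits = sum(true_value.lower() in model(prompt).lower() for _ in range(n_samples))
    return hits / n_samples
```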

What did they find? After validating the tests using EHRMamba, an EHR foundation model with publicly available training data, the researchers reached a pair of conclusions that weren’t too surprising to see.

  • The more information attackers have on a patient, the greater that patient’s privacy risk.
  • Some patients, particularly those with rare conditions, are more susceptible.

Not all information is harmful. The researchers found that some details, such as a patient’s age or gender, present a relatively lower risk in the event of a data breach. 

  • This info wasn’t very helpful in targeted prompts that probed the model for memorized records, and it isn’t very damaging if the answers reveal it.
  • Other info, such as a rare disease diagnosis, was flagged as significantly more harmful. It posed a higher risk of getting the model to expose patient-specific details (especially in combination with other identifiers), and it’s far more sensitive if revealed through probing.

The Takeaway

EHR foundation models need some degree of memorization to solve complex tasks, but memorizing and revealing patient records is obviously out of the question. The tradeoff between performance and privacy is an ongoing challenge, but MIT just delivered a framework for evaluating some of the risks that can help strike the right balance.

OpenAI Jumps Into Healthcare Arena With ChatGPT Health

If OpenAI wasn’t already a major healthcare player, the launch of ChatGPT Health definitely just made it one.

It’s the gamechanger everyone saw coming. OpenAI even teed up the launch with a report showing that 40M people are already using ChatGPT for healthcare advice on a daily basis. 

ChatGPT Health is about to take that a massive step further. 

Here’s a look at the core features:

  • ChatGPT Health operates inside a dedicated health environment with additional privacy layers (conversations aren’t used for model training, optional two-factor authentication).
  • Users can securely upload their complete medical records (courtesy of b.well).
  • Users can connect apps to inform answers (Apple Health, Function, MyFitnessPal).
  • The model uses longitudinal health data, labs, and visit summaries to help spot trends.

OpenAI is moving beyond general health advice. The extra clinical context lets ChatGPT Health deliver better answers at scale, and that’s good news for patients.

A few of the most obvious benefits for patients include:

  • Empowering them to take a more active role in their care.
  • Helping them uncover trends in their overall health.
  • Reducing confusion around test results.
  • Reinforcing care plans between visits.
  • The list could go on for a while.

ChatGPT Health isn’t actually HIPAA compliant. Then again, it doesn’t need to be.

  • Consumer health apps like ChatGPT Health aren’t covered by HIPAA, and to OpenAI’s credit it appears to have done a great job with the necessary disclaimers.
  • The dedicated health environment was also developed with input from 260+ physicians, and it leverages a physician-authored framework for safety, clarity, and escalation.

The question now is, who’s accountable when things go wrong? Millions of patients are about to start showing up to visits armed with advice from ChatGPT Health, which means its AI fingerprints will be all over their questions, concerns, and even clinical decisions. The tech might be ready. The governance isn’t.

  • When ChatGPT Health mentions an unproven treatment and a patient follows through, or interprets a worrying lab value as benign, who carries the liability?
  • OpenAI? The physicians who authored the safety framework? The patient who followed the advice? It’s tough to say, but providers – and their patients – still need a clear answer.

The Takeaway

Everyone wants a doctor in their pocket, and ChatGPT Health just filled that role for millions of patients… even if OpenAI explicitly told them it wasn’t up for the job.
