PHTI Delivers Mixed Reviews on Ambient Scribes

The Peterson Health Technology Institute’s latest technology review is here, and it had a decidedly mixed report card for the ambient AI scribes sweeping across the industry. 

PHTI’s total count of ambient scribe vendors stands at over 60, but the bulk of its report focuses on the early experiences and lessons learned from the top 10 scribes across leading health systems.

According to PHTI’s conversations with health system execs, the primary driver of ambient scribe adoption has been addressing clinician burnout – and AI’s promise is clear on that front.

  • Mass General Brigham reported a 40% reduction in burnout during a six-week pilot.
  • MultiCare reported a 63% reduction in burnout and a 64% improvement in work-life balance.
  • Another study from the Permanente Medical Group found that 81% of patients felt their physician spent less time looking at their computer when using an ambient scribe.

Despite these dramatic improvements, PHTI concludes that the financial returns and efficiency of ambient scribes remain unclear.

  • On one hand, enhanced documentation quality “could lead to higher reimbursements, potentially offsetting expenses.”
  • On the other hand, the cumulative costs “may be greater than any savings achieved through improved efficiency, reduced administrative burden, or reduced clinician attrition.”

It’s a bold conclusion considering the cost of losing a single provider, let alone the downstream effects of a burned-out workforce.

PHTI’s advice to health systems? Define the outcomes you’re looking for and then measure ambient AI’s performance and financial impacts against those goals. Bit of a no-brainer, but sound advice nonetheless. 

The Takeaway

Ambient scribes are seeing the fastest adoption of any recent healthcare technology that wasn’t accompanied by a regulatory mandate, and that’s mostly because of magic that’s hard to capture in a spreadsheet. That said, health systems will eventually need to justify these solutions beyond their impact on the clinical experience, and PHTI’s report brings a solid framework and standardized methodologies for bridging that gap.

AI Misses the Mark on Detecting Critical Conditions

Most health systems have already begun turning to AI to predict whether patient conditions will deteriorate, but a new study in Nature Communications Medicine suggests that current models aren’t cut out for the task.

Virginia Tech researchers looked at several popular machine learning models cited in medical literature for predicting patient deterioration, then fed them datasets about the health of patients in ICUs or with cancer.

  • They then created test cases by altering patient metrics from the initial datasets and asked the models to predict potential health issues and updated risk scores.

AI missed the mark. For in-hospital mortality prediction, the models tested using the synthesized cases failed to recognize a staggering 66% of relevant patient injuries.

  • In some instances, the models failed to generate adequate mortality risk scores for every single test case.
  • That’s clearly not great news, especially since algorithms that can’t recognize critical patient conditions can’t alert doctors when urgent action is needed.

The study authors point out that it’s extremely important for technology being used in patient care decisions to incorporate medical knowledge, and that “purely data-driven training alone is not sufficient.”

  • Not only did the study unearth “alarming deficiencies” in models being used for in-hospital mortality predictions, but it also turned up similar concerns with models predicting the prognosis of breast and lung cancer over five-year periods.
  • The authors conclude that a significant gap exists between raw data and the complexities of medical reality, so models trained solely on patient data are “grossly insufficient and have many dangerous blind spots.”

The Takeaway

The promise of AI remains just as immense as ever, but studies like this provide constant reminders that we need a diligent approach to adoption – not just for the technology itself but for the lives of the patients it touches. Ensuring that medical knowledge gets incorporated into clinical AI models also seems like a theme that we’re about to start hearing more often.

OpenEvidence Closes $75M in Series A Funding

OpenEvidence might be the new kid on the medical chatbot block, but it’s already “the fastest-growing platform for doctors in history,” and $75M of Series A funding just made it the youngest unicorn in healthcare.

Founder Daniel Nadler describes OpenEvidence as an AI copilot, with an experience that feels similar to ChatGPT yet is actually a “very different organism” due to the data it was trained on.

OpenEvidence functions as a specialized medical search engine that helps clinicians make decisions at the point of care, turning natural language queries into structured answers with detailed citations.

  • The model was purpose-built for healthcare by exclusively using training data from strategic partners like the New England Journal of Medicine – no internet forums or Reddit threads in sight.
  • The kicker? It’s available at no cost to verified physicians and generates its revenue through advertising. 

Happy users are their own growth strategy, and OpenEvidence claims that 25% of doctors in the U.S. have already used the product since its launch in 2023. It’s also adding 40k new doctors each month through word-of-mouth referrals and glowing reviews of its ability to:

  • Handle complex case-based prompts
  • Address clinical cases holistically
  • Provide high-quality references

The 1,000-pound gorilla in this space is Wolters Kluwer and its UpToDate clinical evidence engine.

  • Although Wolters Kluwer has been inking partnerships with companies like Corti and Abridge to bring new AI capabilities to UpToDate, OpenEvidence is built from the ground up as an AI-first solution.
  • If Wolters Kluwer is an encyclopedia, OpenEvidence is ChatGPT, and it’ll be interesting to watch the plays that both sides make as they battle for market share.

The Takeaway

OpenEvidence isn’t a solution in search of a problem; it’s a sleek new tool addressing an immediate need for plenty of providers. It’s rare to see the type of viral adoption that OpenEvidence managed to generate, which is a good reminder that many areas of healthcare change slowly… then all at once.

AI Enthusiasm Heats Up With Doctors

The unstoppable march of AI only seems to be gaining momentum, with an American Medical Association survey noting greater enthusiasm – and less apprehension – among physicians. 

The AMA’s Augmented Intelligence Research survey of 1,183 physicians found that those whose enthusiasm outweighs their concerns with health AI rose to 35% in 2024, up from 30% in 2023. 

  • The lion’s share of doctors recognize AI’s benefits, with 68% reporting at least some advantage in patient care (up from 63% in 2023).
  • In both years, about 40% of doctors were equally excited and concerned about health AI, with almost no change between surveys.

The positive sentiment could stem from more physicians using the tech in practice, with AI use nearly doubling from 38% in 2023 to 66% in 2024.

  • The most common uses now include medical research, clinical documentation, and drafting care plans or discharge summaries.

The dramatic drop in non-users (62% to 33%) over the course of a year is impressive for any new health tech, but doctors in the latest survey called out several needs that have to be addressed for adoption to continue.

  • 88% wanted a designated feedback channel
  • 87% wanted data privacy assurances
  • 84% wanted EHR integration

While physicians are still concerned about the potential of AI to harm data privacy or offer incorrect recommendations (and liability risks), they’re also optimistic about its ability to put a dent in burnout.

  • The biggest area of opportunity for AI, according to 57% of physicians, was “addressing administrative burden through automation,” reclaiming the top spot it reached in 2023.
  • That said, nearly half of physicians (47%) ranked increased AI oversight as the number one regulatory action needed to increase trust in AI enough to drive further adoption.

The Takeaway

It’s encouraging to see the shifting sentiment around health AI, especially as more doctors embrace its potential to cut down on burnout. Although the survey pinpoints better oversight as the key to maximizing trust, AI innovation is moving so quickly that it wouldn’t be surprising if not-too-distant breakthroughs were magical enough to inspire more confidence on their own.

House Task Force AI Policy Recommendations

The House Bipartisan Task Force on Artificial Intelligence closed out the year with a bang, launching 273 pages of AI policy fireworks.

The report includes recommendations to “advance America’s leadership in AI innovation” across multiple industries, and the healthcare section definitely packed a punch.

The task force started by highlighting AI’s potential across a long list of use cases, which could have been the tracklist for healthcare’s greatest hits of 2024:

  • Drug Development – 300+ drug applications contained AI components this year.
  • Ambient AI – Burnout is bad. Patient time is good.
  • Diagnostics – AI can help cut down on $100B in annual costs tied to diagnostic errors.
  • Population Health – Population-level data can feed models to improve various programs.

While many expect the Trump administration’s “AI Czar” David Sacks to take a less-is-more approach to AI regulation, the task force urged Congress to consider guardrails in key areas:

  • Data Availability, Utility, and Quality
  • Privacy and Cybersecurity
  • Interoperability
  • Transparency
  • Liability

Several recommendations were offered to ensure these guardrails are effective, although the task force didn’t go as far as to prescribe specific regulations. 

  • The report suggested that Congress establish clear liability standards given that they can affect clinical decision-making (the risk of penalties may change whether a provider relies on their judgment or defers to an algorithm).
  • Another common theme was to maintain robust support for healthcare research related to AI, which included more NIH funding since it’s “critical to maintaining U.S. leadership.” 

The capstone recommendation – which was naturally well-received by the industry – was to support appropriate AI payment mechanisms without stifling innovation.

  • CMS calculates reimbursements by accounting for physician time, acuity of care, and practice expenses, yet doesn’t adequately account for AI’s impact on those metrics.
  • The task force said there won’t be a “one size fits all” policy, so appropriate payment mechanisms should recognize AI’s impact across multiple technologies and settings (e.g. many AI use cases may fit into existing benefit categories or facility fees).

The Takeaway

AI arrived faster than policy makers could keep up, and it’ll be up to the incoming White House to get AI past its Wild West regulatory era without hobbling the pioneers driving the progress. One way or another, that’s a sign that AI is starting a new chapter, and we’re excited to see where the story goes in 2025.

Patients Ready For GenAI, But Not For Everything

Bain & Company’s US Frontline of Consumer Healthcare Survey turned up the surprising result that patients are more comfortable with generative AI “analyzing their radiology scan and making a diagnosis than answering the phone at their doctor’s office.”

That’s quite the headline, but the authors were quick to point out that it’s probably less of a measure of confidence in GenAI’s medical expertise than a sign that patients aren’t yet comfortable interacting with the technology directly.

Here’s the breakdown of patient comfort with different GenAI use cases:

While it does appear that patients are more prepared to have GenAI supporting their doctor than engaging with it themselves, it’s just as notable that less than half reported feeling comfortable with even a single GenAI application in healthcare.

  • No “comfortable” response was above 37%, and after adding in the “neutral” votes, there was still only one application that broke 50%: note taking during appointments.
  • The fact that only 19% felt comfortable with GenAI answering calls for providers or payors could also just be a sign that patients would far rather talk to a human in either situation, regardless of the tech’s capabilities.

The next chart looks at GenAI perceptions among healthcare workers: 

Physicians and administrators are feeling a similar mix of excitement and apprehension, sharing a generally positive view of GenAI’s potential to alleviate admin burdens and clinician workloads, as well as a concern that it could undermine the patient-provider relationship.

  • Worries over new technology threatening the relationship between patients and providers aren’t new, and we just witnessed them play out at an accelerated pace with telehealth.
  • Despite initial fears, the value of the relationship prevailed, which Bain backed up with the fact that 61% of patients who use telehealth only do so with their own provider.

Whether you’re measuring by patient or provider comfort, GenAI’s progress will be closely tied to trust in the technology on an application-by-application basis. Trust takes time to build and first impressions are key, so this survey underscores the importance of nailing the user experience early on.

The Takeaway

The story of generative AI in healthcare is just getting started, and as we saw with telehealth, the first few pages could take some serious willpower to get through. New technologies mean new workflows, revenue models, and countless other barriers to overcome, but trust will only keep building every step of the way. Plus, the next chapter looks pretty dang good.

Hidden Flaws Behind High Accuracy of Clinical AI

AI is getting pretty darn good at patient diagnosis challenges… but don’t bother asking it to show its work.

A new study in npj Digital Medicine pitted GPT-4V against human physicians on 207 image challenges designed to test the reader’s ability to diagnose a patient based on a series of pictures and some basic clinical background info.

  • Researchers at the NIH and Weill Cornell Medicine then asked GPT-4V to provide step-by-step reasoning for how it chose the answer.
  • Nine physicians then tackled the same questions in both a closed-book (no outside help) and open-book format (could use outside materials and online resources).

How’d they stack up?

  • GPT-4V and the physicians both scored high marks for accurate diagnoses (81.6% vs. 77.8%), with a statistically insignificant difference in performance. 
  • GPT-4V bested the physicians on the closed-book test, selecting more correct diagnoses.
  • Physicians bounced back to beat GPT-4V on the open-book test, particularly on the most difficult questions.
  • GPT-4V also performed well in cases where physicians answered incorrectly, maintaining over 78% accuracy.
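For readers curious what “statistically insignificant” looks like for the 81.6% vs. 77.8% result, here’s a quick two-proportion z-test sketch. The per-group sample size of 207 is an assumption for illustration only; the study’s actual statistical test may differ.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Two-proportion z-test for the difference between two success rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 81.6% (GPT-4V) vs. 77.8% (physicians); assuming 207 challenges per group
z, p = two_proportion_z(0.816, 207, 0.778, 207)
print(f"z = {z:.2f}, p = {p:.3f}")  # p well above 0.05 -> not significant
```

With these assumed sample sizes, a ~4-point gap on roughly 200 questions simply isn’t enough evidence to call one group better than the other.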

Good job AI, but there’s a catch. The rationales that GPT-4V provided were riddled with mistakes – even if the final answer was correct – with error rates as high as 27% for image comprehension.

The Takeaway

There could easily come a day when clinical AI surpasses human physicians on the diagnosis front, but that day isn’t here quite yet. Real care delivery also doesn’t bless physicians with a set of multiple choice options, and hallucinating the rationale behind diagnoses doesn’t cut it with actual patients.

GenAI Still Working Toward Prime Time With Patients

When it rains it pours for AI research, and a trio of studies published just last week suggest that many new generative AI tools might not be ready for prime time with patients.

The research that grabbed the most headlines came out of UCSD, finding that GenAI-drafted replies to patient messages led to more compassionate responses, but didn’t cut down on overall messaging time.

  • Although GenAI reduced the time physicians spent writing replies by 6%, that was more than offset by a 22% increase in read time, while also increasing average reply lengths by 18%.
  • Some of the physicians were also put off by the “overly nice” tone of the GenAI message drafts, and recommended that future research look into “how much empathy is too much empathy” from the patient perspective.
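The time math behind that result is easy to sanity-check. A minimal sketch, with hypothetical baseline minutes per message (only the percentage changes come from the study as reported):

```python
# Back-of-the-envelope check on why faster drafting didn't save time overall.
# Baseline minutes are assumptions; the -6% write / +22% read deltas are
# the figures reported by the study.
baseline_write_min = 3.0   # assumed time spent writing a reply
baseline_read_min = 1.5    # assumed time spent reading the thread and draft

new_write = baseline_write_min * (1 - 0.06)
new_read = baseline_read_min * (1 + 0.22)

baseline_total = baseline_write_min + baseline_read_min
new_total = new_write + new_read
print(f"baseline {baseline_total:.2f} min -> with GenAI {new_total:.2f} min")
```

Under these assumed baselines, the read-time increase roughly cancels (or slightly exceeds) the write-time savings, which matches the study’s finding of no net reduction in messaging time.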

Another study in Lancet Digital Health showed that GPT-4 can effectively generate replies to health questions from cancer patients… as well as replies that might kill them.

  • Mass General Brigham researchers had six radiation oncologists review GPT-4’s responses to simulated questions from cancer patients for 100 scenarios, finding that 58% of its replies were acceptable to send to patients without any editing, 7% could lead to severe harm, and one was potentially lethal.
  • The verdict? Generative AI has the potential to reduce workloads, but it’s still essential to “keep doctors in the loop.”

A team at Mount Sinai took a different path to a similar conclusion, finding that four popular GenAI models have a long way to go until they’re better than humans at matching medical issues to the correct diagnostic codes.

  • After having GPT-3.5, GPT-4, Gemini Pro, and Llama2-70b analyze and code 27,000 unique diagnoses, GPT-4 came out on top in terms of exact matches, achieving an uninspiring accuracy of 49.8%.
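For context, “exact match” is an unforgiving metric: a predicted code that’s one character off, or off by a level of specificity, counts as a miss. A toy sketch of the metric, using real ICD-10-CM codes but an invented mini dataset and hypothetical model outputs:

```python
# Toy illustration of the "exact match" metric used to score diagnostic coding.
# Codes are real ICD-10-CM entries; the dataset and predictions are invented.
gold = {
    "essential (primary) hypertension": "I10",
    "type 2 diabetes mellitus without complications": "E11.9",
    "acute pharyngitis, unspecified": "J02.9",
    "low back pain": "M54.5",
}

predicted = {
    "essential (primary) hypertension": "I10",
    "type 2 diabetes mellitus without complications": "E11.8",  # close, not exact
    "acute pharyngitis, unspecified": "J02.9",
    "low back pain": "M54.50",  # extra specificity -> still a miss
}

exact = sum(predicted[dx] == code for dx, code in gold.items())
accuracy = exact / len(gold)
print(f"exact-match accuracy: {accuracy:.0%}")  # 50%
```

Near misses like these are part of why a ~50% exact-match score can coexist with models that “look right” at a glance.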

The Takeaway

While it isn’t exactly earth-shattering news that GenAI still has room to improve, the underlying theme with each of these studies is more that its impact is far from black and white. GenAI is rarely completely right or completely wrong, and although there’s no doubt we’ll get to the point where it’s working its magic without as many tradeoffs, this research confirms that we’re definitely not there yet.

Scaling Adoption of Medical AI

Medical AI is on the brink of improving outcomes for countless patients, prompting a trio of all-star researchers to pen an NEJM AI article tackling what might be its biggest obstacle: real-world adoption.

What drives real-world adoption? Those who have been around the block as many times as Dr. Michael Abramoff, Dr. Tinglong Dai, and Dr. James Zou are all too familiar with the answer… Reimbursement makes the world go ‘round.

To help medical AI developers get their tools in front of the patients who need them, the authors explore the pros and cons of current paths to reimbursement, while offering novel frameworks that could lead to better financial sustainability.

Traditional Fee-for-Service treats medical AI similarly to how new drugs or medical devices are reimbursed, and is a viable path for AI that can clear the hurdle of demonstrating improvements to clinical outcomes, health equity, clinician productivity, and cost-effectiveness (e.g. AI for diabetic eye exams).

  • Meeting these criteria is a prerequisite for adopting AI in healthcare, yet even among the 692 FDA-authorized AI systems, few have been able to pass the test. The approach carries substantial risk in terms of time and resources for AI developers.
  • Despite those limitations, FFS might be appropriate for AI because health systems are adept at assessing the financial impact of new technologies under it, and reimbursement through a CPT code provides hard-to-match financial sustainability.

Value-based care frameworks provide reimbursement on the basis of patient- or population-related metrics (MIPS, HEDIS, full capitation), and obtaining authorization for medical AI to “count” toward closing care gaps for MIPS and HEDIS has been shown to be considerably more straightforward than attaining a CPT code.

  • That said, if a given measure is not met (e.g. 80% of the population must receive an annual diabetic eye examination), the financial benefit of closing even three quarters of that care gap is typically zero, potentially disincentivizing AI adoption.
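That cliff is easy to see in code. A minimal sketch of the all-or-nothing quality-measure economics described above, with a hypothetical threshold and bonus amount:

```python
# Sketch of the all-or-nothing quality-measure payment described above.
# The 80% threshold mirrors the article's example; the dollar figure is
# hypothetical.
def measure_bonus(compliance_rate: float,
                  threshold: float = 0.80,
                  bonus: float = 100_000.0) -> float:
    """Payment only materializes once the full measure threshold is met."""
    return bonus if compliance_rate >= threshold else 0.0

# Closing three quarters of the care gap still pays nothing, which is
# exactly the disincentive the authors flag.
print(measure_bonus(0.75))  # 0.0
print(measure_bonus(0.80))  # 100000.0
```

A step function like this means AI that closes most, but not all, of a care gap generates zero marginal revenue under the measure.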

Given the limitations of existing pathways, the authors offer a potential new approach that’s derived from the Medicare Part B model, which reimburses drugs administered in an outpatient setting based on a “cost plus” markup.

  • Here, providers could acquire the rights to use AI, then get reimbursed based on the average cost of the service plus a specified margin, contingent upon CMS coverage of a particular CPT code.
  • This model essentially splits revenue between AI creators and users, and would alleviate some of the tensions of both FFS and VBC models.
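A minimal sketch of the proposed mechanics, assuming a hypothetical average service cost and a Part B-style 6% margin (both numbers are illustrative, not from the article):

```python
# Sketch of the "cost plus" reimbursement idea modeled on Medicare Part B
# outpatient drug payment (average cost plus a fixed markup).
# The cost and margin values here are hypothetical.
def cost_plus_payment(avg_service_cost: float, margin: float = 0.06) -> float:
    """Reimburse the provider the average cost plus a specified margin."""
    return avg_service_cost * (1 + margin)

avg_cost = 45.00  # assumed average per-use cost of the AI service
payment = cost_plus_payment(avg_cost)
provider_margin = payment - avg_cost  # what the provider keeps
print(f"payment {payment:.2f}, provider margin {provider_margin:.2f}")
```

The cost pass-through is what effectively splits revenue between the AI developer (who sets the service cost) and the provider (who keeps the margin).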

The Takeaway

Without sustainable reimbursement, widespread medical AI adoption won’t be possible. Although the quest continues for a silver bullet (even the authors’ revenue-sharing model still carries the risk of overutilization and requires the creation of new CPT codes), exploring novel approaches is essential given the challenges of achieving reimbursement through existing FFS and VBC pathways.

How Health Systems Are Approaching AI

The New England Journal of Medicine’s just-released NEJM AI publication is off to the races, with its February issue including a stellar breakdown of how academic medical centers are managing the influx of predictive models and AI tools.

Researchers identified three governance phenotypes for managing the AI deluge:

  • Well-Defined Governance – health systems have explicit, comprehensive procedures for the evaluation of AI and predictive models.
  • Emerging Governance – systems are in the process of adapting previously established approaches for things like EHRs to govern AI.
  • Interpersonal Governance – a small team or single person is tasked with making decisions about model implementation without consistent evaluation requirements. 

Regardless of the phenotype, interviews with AI leadership at 13 academic medical centers revealed that chaotic implementations are hard to avoid, partly due to external factors like vague regulatory standards.

  • Most AI decision makers were aware of how the FDA regulates software, but believed those rules were “broad and loose,” and many thought they only applied to EHRs and third-party vendors rather than health systems.

AI governance teams report better adherence when newly implemented solutions prioritize limiting clicks for providers, so streamlining workflows remains the primary consideration for most implementations. Effective governance of prediction models requires a broader approach, however, and that narrow focus is setting up trouble down the road given predictive models’ impact on patient care, health equity, and care quality.

The Takeaway

Even well-equipped academic medical centers are struggling to effectively identify and mitigate the countless potential pitfalls that come along with predictive AI implementation. Existing AI governance structures within healthcare orgs all seem to be in need of additional guidance, and more guardrails from both the industry and regulators might help turn AI ambitions into AI-improved outcomes.
