The Healthcare AI Adoption Index

Bessemer Venture Partners’ market reports are always some of the best in the business, but its recent Healthcare AI Adoption Index might just be its finest work yet.

The Healthcare AI Adoption Index is based on survey data from 400+ execs across Payors, Providers, and Pharma – breaking down how buyers are approaching GenAI applications, what jobs-to-be-done they’re prioritizing, and where their projects sit on the adoption curve.

Here’s a look at what they found:

  • AI is high on the agenda across the board, with AI budget growth outpacing overall IT spend growth in each of the three segments. Over half of organizations (54%) are seeing ROI within the first 12 months.
  • Only a third of AI pilots end up reaching production, held back by everything from security and data readiness to integration costs and limited in-house expertise.
  • Despite all the trendsetters we cover on a weekly basis, only 15% of active AI projects are being driven by startups. The rest are being built internally or led by the usual suspects like major EHRs and Big Tech.
  • That said, 48% of executives say they prefer working with startups over incumbents, and Bessemer encourages founders to co-develop solutions with their customers and lean in on partnerships that provide access to distribution, proprietary datasets, and credibility.

The highlight of the report was Bessemer’s analysis of the 59 jobs-to-be-done as potential use cases for AI. 

  • Of the 22 jobs-to-be-done for Payors (claims, network, member, pricing), 19 for Pharma (preclinical, clinical, marketing, sales), and 18 for Providers (care delivery, RCM), 45% are still in the ideation or proof-of-concept phase.
  • Providers are ahead in POC experimentation, while most Payor and Pharma use cases remain in the ideation phase.

Bessemer topped off its analysis with the debut of its AI Dx Index, which factors in market size, urgency, and current adoption to help startups map and prioritize AI use cases. One of the best graphics so far this year.
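
Bessemer doesn’t publish the exact formula behind the index, but conceptually it’s a weighted prioritization score across those three inputs. Here’s a minimal sketch of the idea – the weights, field names, and example values are all illustrative assumptions, not Bessemer’s methodology:

```python
# Hypothetical sketch of an AI Dx-style prioritization score.
# Weights, fields, and example values are assumptions, not Bessemer's formula.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    market_size: float  # normalized 0-1
    urgency: float      # normalized 0-1
    adoption: float     # normalized 0-1, how far along the adoption curve

def dx_score(uc: UseCase, weights=(0.4, 0.4, 0.2)) -> float:
    """Big, urgent, *under-adopted* use cases rank highest."""
    w_size, w_urgency, w_gap = weights
    return (w_size * uc.market_size
            + w_urgency * uc.urgency
            + w_gap * (1 - uc.adoption))  # score the whitespace, not the crowd

candidates = [
    UseCase("Prior authorization", market_size=0.8, urgency=0.9, adoption=0.3),
    UseCase("Ambient documentation", market_size=0.7, urgency=0.8, adoption=0.7),
]
for uc in sorted(candidates, key=dx_score, reverse=True):
    print(f"{uc.name}: {dx_score(uc):.2f}")
```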

The Takeaway

Healthcare’s AI-powered paradigm shift is kicking into overdrive, and Bessemer just delivered one of the most comprehensive views of where the puck is going that we’ve seen to date.

K Health’s AI Clinical Recommendations Rival Doctors in Real-World Setting

Real-world comparisons of AI recommendations and doctors’ clinical decisions have been few and far between, but a new study in the Annals of Internal Medicine gave us a great look at how performance stacks up with actual patients.

The early verdict? AI came out on top, but that doesn’t mean doctors should pack their bags quite yet.

Researchers from Cedars-Sinai and Tel Aviv University compared recommendations made by K Health’s AI Physician Mode to the final decisions made by physicians for 461 virtual urgent care visits. Here’s what they found:

  • In 68% of cases, the AI and physician recommendations were rated as equal
  • AI was rated better in 21% of cases, versus just 11% for physicians
  • AI recommendations were rated “optimal” in 77% of cases, versus 67% for physicians

Although AI takes the cake on the top-line numbers, unpacking the data reveals some not-too-surprising strengths and weaknesses. AI was primarily rated better when physicians:

  • Missed important lab tests (22.8%)
  • Didn’t follow clinical guidelines (16.3%)
  • Failed to refer patients to specialists or the ED if needed (15.2%)
  • Overlooked risk factors and red flags (4.4%)

Physicians beat out AI when the human elements of care delivery came into play, such as adapting to new information or making nuanced decisions. Physicians were rated better when:

  • AI made unnecessary ED referrals (8.0%)
  • There was evolving or inconsistent information during consultations (6.2%)
  • They made necessary referrals that the AI missed (5.9%)
  • They correctly adjusted diagnoses based on visual examinations (4.4%)

While the study focused on the exact types of common conditions that AI excels at diagnosing (respiratory, urinary, vaginal, eye, and dental), it’s still impressive to see the outperformance in the messy trenches of a real clinical setting – a far cry from the static medical exams that have been the go-to for similar evaluations. 

The Takeaway

For AI to truly transform healthcare, it’ll need to do a lot more than automate administrative work and back office operations. This study demonstrates AI’s potential to enhance decision-making in actual medical practice, and points toward a future where delivering high-quality patient care becomes genuinely scalable.

PHTI Delivers Mixed Reviews on Ambient Scribes

The Peterson Health Technology Institute’s latest technology review is here, and it had a decidedly mixed report card for the ambient AI scribes sweeping across the industry. 

PHTI’s total count of ambient scribe vendors stands at over 60, but the bulk of its report focuses on the early experiences and lessons learned from the top 10 scribes across leading health systems.

According to PHTI’s conversations with health system execs, the primary driver of ambient scribe adoption has been addressing clinician burnout – and AI’s promise is clear on that front.

  • Mass General Brigham reported a 40% reduction in burnout during a six-week pilot.
  • MultiCare reported a 63% reduction in burnout and a 64% improvement in work-life balance.
  • Another study from the Permanente Medical Group found that 81% of patients felt their physician spent less time looking at their computer when using an ambient scribe.

Despite these drastic improvements, PHTI concludes that the financial returns and efficiency of ambient scribes remain unclear.

  • On one hand, enhanced documentation quality “could lead to higher reimbursements, potentially offsetting expenses.”
  • On the other hand, the cumulative costs “may be greater than any savings achieved through improved efficiency, reduced administrative burden, or reduced clinician attrition.”

It’s a bold conclusion considering the cost of losing a single provider, let alone the downstream effects of a burned-out workforce.

PHTI’s advice to health systems? Define the outcomes you’re looking for and then measure ambient AI’s performance and financial impacts against those goals. Bit of a no-brainer, but sound advice nonetheless. 

The Takeaway

Ambient scribes are seeing the fastest adoption of any recent healthcare technology that wasn’t accompanied by a regulatory mandate, and that’s mostly because of magic that’s hard to capture in a spreadsheet. That said, health systems will eventually need to justify these solutions beyond their impact on the clinical experience, and PHTI’s report brings a solid framework and standardized methodologies for bridging that gap.

AI Misses the Mark on Detecting Critical Conditions

Most health systems have already begun turning to AI to predict if patient health conditions will deteriorate, but a new study in Nature Communications Medicine suggests that current models aren’t cut out for the task. 

Virginia Tech researchers looked at several popular machine learning models cited in medical literature for predicting patient deterioration, then fed them datasets about the health of patients in ICUs or with cancer.

  • They then created synthetic test cases by altering patient metrics from the initial datasets and checking whether the models’ predicted health issues and risk scores responded appropriately – a setup sketched below.
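
Methodologically, that’s a perturbation test: change a clinically meaningful input and verify that the model’s risk output moves in the right direction. Here’s a minimal sketch of the check with a stand-in model – the features, magnitudes, and pass threshold are hypothetical, not the study’s code:

```python
# Illustrative perturbation test for a deterioration-risk model.
# The model, features, and pass threshold are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # columns: z-scored [lactate, creatinine, SpO2]
y = (X @ np.array([1.2, 0.8, -1.0]) + rng.normal(size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

def risk(patient: np.ndarray) -> float:
    return model.predict_proba(patient.reshape(1, -1))[0, 1]

patient = np.array([0.0, 0.0, 0.0])
worsened = patient + np.array([2.0, 0.0, 0.0])  # inject a sharp lactate spike

base, perturbed = risk(patient), risk(worsened)
print(f"baseline risk={base:.2f}, after deterioration={perturbed:.2f}")
# The study's finding, restated in these terms: for ~66% of injected
# conditions, the models barely moved (perturbed risk ~= baseline risk).
assert perturbed > base + 0.05, "model failed to respond to a critical change"
```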

AI missed the mark. For in-hospital mortality prediction, the models tested using the synthesized cases failed to recognize a staggering 66% of relevant patient injuries.

  • In some instances, the models failed to generate adequate mortality risk scores for every single test case.
  • That’s not great news, considering that algorithms that can’t recognize critical patient conditions can’t alert doctors when urgent action is needed.

The study authors point out that it’s extremely important for technology being used in patient care decisions to incorporate medical knowledge, and that “purely data-driven training alone is not sufficient.”

  • Not only did the study unearth “alarming deficiencies” in models being used for in-hospital mortality predictions, but it also turned up similar concerns with models predicting the prognosis of breast and lung cancer over five-year periods.
  • The authors conclude that a significant gap exists between raw data and the complexities of medical reality, so models trained solely on patient data are “grossly insufficient and have many dangerous blind spots.”

The Takeaway

The promise of AI remains just as immense as ever, but studies like this provide constant reminders that we need a diligent approach to adoption – not just for the technology itself but for the lives of the patients it touches. Ensuring that medical knowledge gets incorporated into clinical AI models also seems like a theme that we’re about to start hearing more often.

OpenEvidence Closes $75M in Series A Funding

OpenEvidence might be the new kid on the medical chatbot block, but it’s already “the fastest-growing platform for doctors in history,” and $75M of Series A funding just made it the youngest unicorn in healthcare.

Founder Daniel Nadler describes OpenEvidence as an AI copilot, with an experience that feels similar to ChatGPT yet is actually a “very different organism” due to the data it was trained on.

OpenEvidence functions as a specialized medical search engine that helps clinicians make decisions at the point of care, turning natural language queries into structured answers with detailed citations (a conceptual sketch follows the list below).

  • The model was purpose-built for healthcare by exclusively using training data from strategic partners like the New England Journal of Medicine – no internet forums or Reddit threads in sight.
  • The kicker? It’s available at no cost to verified physicians and generates its revenue through advertising. 
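
OpenEvidence hasn’t published its architecture, but citation-grounded clinical Q&A tools are typically built as retrieval-augmented generation: fetch passages from a licensed corpus, then have the model answer using only those passages. Purely as a concept sketch – the corpus entries, functions, and prompt scaffolding below are hypothetical, not OpenEvidence’s API:

```python
# Conceptual citation-grounded Q&A loop (hypothetical, not OpenEvidence's stack).

CORPUS = {
    "NEJM-2023-0001": "In adults with condition X, first-line therapy is ...",
    "NEJM-2024-0042": "A randomized trial found that drug Y reduced ...",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Toy lexical retrieval over a licensed corpus (real systems use embeddings)."""
    scored = [(sum(w in text.lower() for w in query.lower().split()), doc_id, text)
              for doc_id, text in CORPUS.items()]
    return [(doc_id, text) for _, doc_id, text in sorted(scored, reverse=True)[:k]]

def answer(query: str) -> str:
    sources = retrieve(query)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in sources)
    # A production system would call an LLM here with instructions to cite
    # only the retrieved passages; this sketch just returns the scaffolding.
    return f"Q: {query}\nEvidence:\n{context}\nA: <answer citing [doc ids]>"

print(answer("first-line therapy for condition X"))
```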

Happy users are their own growth strategy, and OpenEvidence claims that 25% of doctors in the U.S. have already used the product since its launch in 2023. It’s also adding 40k new doctors each month through word-of-mouth referrals and glowing reviews of its ability to:

  • Handle complex case-based prompts
  • Address clinical cases holistically
  • Provide high-quality references

The 1,000-pound gorilla in this space is Wolters Kluwer and its UpToDate clinical evidence engine.

  • Although Wolters Kluwer has been inking partnerships with companies like Corti and Abridge to bring new AI capabilities to UpToDate, OpenEvidence is built from the ground up as an AI-first solution.
  • If Wolters Kluwer is an encyclopedia, OpenEvidence is ChatGPT, and it’ll be interesting to watch the plays that both sides make as they battle for market share.

The Takeaway

OpenEvidence isn’t a solution in search of a problem; it’s a sleek new tool addressing an immediate need for plenty of providers. It’s rare to see the type of viral adoption that OpenEvidence managed to generate, which is a good reminder that many areas of healthcare change slowly… then all at once.

AI Enthusiasm Heats Up With Doctors

The unstoppable march of AI only seems to be gaining momentum, with an American Medical Association survey noting greater enthusiasm – and less apprehension – among physicians. 

The AMA’s Augmented Intelligence Research survey of 1,183 physicians found that the share whose enthusiasm for health AI outweighs their concerns rose to 35% in 2024, up from 30% in 2023.

  • The lion’s share of doctors recognize AI’s benefits, with 68% reporting at least some advantage in patient care (up from 63% in 2023).
  • In both years, about 40% of doctors were equally excited and concerned about health AI, with almost no change between surveys.

The positive sentiment could stem from more physicians using the tech in practice, with AI use nearly doubling from 38% in 2023 to 66% in 2024.

  • The most common uses now include medical research, clinical documentation, and drafting care plans or discharge summaries.

The dramatic drop in non-users (62% to 33%) over the course of a year is impressive for any new health tech, but doctors in the latest survey called out several needs that have to be addressed for adoption to continue.

  • 88% wanted a designated feedback channel
  • 87% wanted data privacy assurances
  • 84% wanted EHR integration

While physicians are still concerned about the potential of AI to harm data privacy or offer incorrect recommendations (and liability risks), they’re also optimistic about its ability to put a dent in burnout.

  • The biggest area of opportunity for AI, according to 57% of physicians, was “addressing administrative burden through automation,” reclaiming the top spot it held in 2023.
  • That said, nearly half of physicians (47%) ranked increased AI oversight as the number one regulatory action needed to increase trust in AI enough to drive further adoption.

The Takeaway

It’s encouraging to see the shifting sentiment around health AI, especially as more doctors embrace its potential to cut down on burnout. Although the survey pinpoints better oversight as the key to maximizing trust, AI innovation is moving so quickly that it wouldn’t be surprising if not-too-distant breakthroughs were magical enough to inspire more confidence on their own.

House Task Force AI Policy Recommendations

The House Bipartisan Task Force on Artificial Intelligence closed out the year with a bang, launching 273 pages of AI policy fireworks.

The report includes recommendations to “advance America’s leadership in AI innovation” across multiple industries, and the healthcare section definitely packed a punch.

The task force started by highlighting AI’s potential across a long list of use cases, which could have been the tracklist for healthcare’s greatest hits of 2024:

  • Drug Development – 300+ drug applications contained AI components this year.
  • Ambient AI – Burnout is bad. Patient time is good.
  • Diagnostics – AI can help cut down on $100B in annual costs tied to diagnostic errors.
  • Population Health – Population-level data can feed models to improve various programs.

While many expect the Trump administration’s “AI Czar” David Sacks to take a less-is-more approach to AI regulation, the task force urged Congress to consider guardrails in key areas:

  • Data Availability, Utility, and Quality
  • Privacy and Cybersecurity
  • Interoperability
  • Transparency
  • Liability

Several recommendations were offered to ensure these guardrails are effective, although the task force didn’t go as far as to prescribe specific regulations. 

  • The report suggested that Congress establish clear liability standards, given that they can affect clinical decision-making (the risk of penalties may change whether a provider relies on their own judgment or defers to an algorithm).
  • Another common theme was to maintain robust support for healthcare research related to AI, which included more NIH funding since it’s “critical to maintaining U.S. leadership.” 

The capstone recommendation – which was naturally well-received by the industry – was to support appropriate AI payment mechanisms without stifling innovation.

  • CMS calculates reimbursements by accounting for physician time, acuity of care, and practice expenses, yet that math fails to adequately capture AI’s impact on those inputs (see the simplified worked example after this list).
  • The task force said there won’t be a “one size fits all” policy, so appropriate payment mechanisms should recognize AI’s impact across multiple technologies and settings (e.g., many AI use cases may fit into existing benefit categories or facility fees).
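
For context, the fee schedule math in question is RVU-based: payment is the sum of work, practice expense, and malpractice relative value units (each geographically adjusted) multiplied by a conversion factor. A simplified sketch with illustrative numbers shows the task force’s point – AI that trims documented physician work can lower payment rather than be rewarded:

```python
# Simplified Medicare Physician Fee Schedule math (illustrative RVU values).
def payment(work_rvu, pe_rvu, mp_rvu, gpci=(1.0, 1.0, 1.0), conversion_factor=33.0):
    """Payment = geographically adjusted RVU total x conversion factor."""
    w_gpci, pe_gpci, mp_gpci = gpci
    total_rvu = work_rvu * w_gpci + pe_rvu * pe_gpci + mp_rvu * mp_gpci
    return total_rvu * conversion_factor

# Hypothetical office visit: if AI reduces documented physician work, the
# work RVU input falls, and so does the payment under current math.
print(f"without AI: ${payment(1.3, 1.1, 0.1):.2f}")
print(f"with AI:    ${payment(1.0, 1.1, 0.1):.2f}")
```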

The Takeaway

AI arrived faster than policy makers could keep up, and it’ll be up to the incoming White House to get AI past its Wild West regulatory era without hobbling the pioneers driving the progress. One way or another, that’s a sign that AI is starting a new chapter, and we’re excited to see where the story goes in 2025.

Patients Ready For GenAI, But Not For Everything

Bain & Company’s US Frontline of Consumer Healthcare Survey turned up the surprising result that patients are more comfortable with generative AI “analyzing their radiology scan and making a diagnosis than answering the phone at their doctor’s office.”

That’s quite the headline, but the authors were quick to point out that it’s probably less of a measure of confidence in GenAI’s medical expertise than a sign that patients aren’t yet comfortable interacting with the technology directly.

Bain’s survey broke down patient comfort with a range of different GenAI use cases.

While it does appear that patients are more prepared to have GenAI supporting their doctor than engaging with it themselves, it’s just as notable that less than half reported feeling comfortable with even a single GenAI application in healthcare.

  • No “comfortable” response was above 37%, and after adding in the “neutral” votes, there was still only one application that broke 50%: note taking during appointments.
  • The fact that only 19% felt comfortable with GenAI answering calls for providers or payors could also just be a sign that patients would far rather talk to a human in either situation, regardless of the tech’s capabilities.

The survey also polled GenAI perceptions among healthcare workers.

Physicians and administrators are feeling a similar mix of excitement and apprehension, sharing a generally positive view of GenAI’s potential to alleviate admin burdens and clinician workloads, as well as a concern that it could undermine the patient-provider relationship.

  • Worries over new technology threatening the relationship of patients and providers aren’t new, and we just witnessed them play out at an accelerated pace with telehealth.
  • Despite initial fears, the value of the relationship prevailed, which Bain backed up with the fact that 61% of patients who use telehealth only do so with their own provider.

Whether you’re measuring by patient or provider comfort, GenAI’s progress will be closely tied to trust in the technology on an application-by-application basis. Trust takes time to build and first impressions are key, so this survey underscores the importance of nailing the user experience early on.

The Takeaway

The story of generative AI in healthcare is just getting started, and as we saw with telehealth, the first few pages could take some serious willpower to get through. New technologies mean new workflows, revenue models, and countless other barriers to overcome, but trust will only keep building every step of the way. Plus, the next chapter looks pretty dang good.

Hidden Flaws Behind High Accuracy of Clinical AI

AI is getting pretty darn good at patient diagnosis challenges… but don’t bother asking it to show its work.

A new study in npj Digital Medicine pitted GPT-4V against human physicians on 207 image challenges designed to test the reader’s ability to diagnose a patient based on a series of pictures and some basic clinical background info.

  • Researchers at the NIH and Weill Cornell Medicine then asked GPT-4V to provide step-by-step reasoning for how it chose each answer (the sketch after this list shows what that evaluation loop looks like).
  • Nine physicians then tackled the same questions in both a closed-book (no outside help) and open-book format (could use outside materials and online resources).
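
Mechanically, this kind of evaluation is a simple loop: send each image plus its clinical vignette to the model, ask for a diagnosis with step-by-step reasoning, and grade the answer and the rationale separately. Here’s a minimal sketch using the OpenAI Python client – the model name and prompt are placeholders, not the study’s protocol:

```python
# Sketch of an image-challenge evaluation loop (placeholders, not the study's code).
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def diagnose(image_path: str, vignette: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the study evaluated GPT-4V
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"{vignette}\nGive the most likely diagnosis, then "
                         "step-by-step reasoning covering the image findings."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

# Grading happens in a separate pass: score the diagnosis against the answer
# key, then have physicians rate the rationale -- the second step is where
# the image-comprehension errors surfaced.
```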

How’d they stack up?

  • GPT-4V and the physicians both scored high marks for accurate diagnoses (81.6% vs. 77.8%), a difference that wasn’t statistically significant.
  • GPT-4V bested the physicians on the closed-book test, selecting more correct diagnoses.
  • Physicians bounced back to beat GPT-4V on the open-book test, particularly on the most difficult questions.
  • GPT-4V also performed well in cases where physicians answered incorrectly, maintaining over 78% accuracy.

Good job AI, but there’s a catch. The rationales that GPT-4V provided were riddled with mistakes – even if the final answer was correct – with error rates as high as 27% for image comprehension.

The Takeaway

There could easily come a day when clinical AI surpasses human physicians on the diagnosis front, but that day isn’t here quite yet. Real care delivery also doesn’t bless physicians with a set of multiple choice options, and hallucinating the rationale behind diagnoses doesn’t cut it with actual patients.

GenAI Still Working Toward Prime Time With Patients

When it rains it pours for AI research, and a trio of studies published just last week suggest that many new generative AI tools might not be ready for prime time with patients.

The research that grabbed the most headlines came out of UCSD, finding that GenAI-drafted replies to patient messages led to more compassionate responses, but didn’t cut down on overall messaging time.

  • Although GenAI reduced the time physicians spent writing replies by 6%, that was more than offset by a 22% increase in read time, while average reply lengths also grew by 18% (the quick math below shows how that nets out).
  • Some of the physicians were also put off by the “overly nice” tone of the GenAI message drafts, and recommended that future research look into “how much empathy is too much empathy” from the patient perspective.
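
The asymmetry makes sense once you run the numbers: drafts shave a little off writing but add reading time across the board. A quick back-of-the-envelope – the 6% and 22% figures are the study’s, while the baseline minutes are assumptions for illustration:

```python
# Back-of-the-envelope on the UCSD numbers (baseline minutes are assumptions).
read_base, write_base = 1.0, 3.0  # hypothetical minutes per patient message
read_new = read_base * 1.22       # read time up 22%
write_new = write_base * 0.94     # write time down 6%

delta = (read_new + write_new) - (read_base + write_base)
print(f"net change per message: {delta:+.2f} min")  # savings get washed out
```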

Another study in Lancet Digital Health showed that GPT-4 can effectively generate replies to health questions from cancer patients… as well as replies that might kill them.

  • Mass General Brigham researchers had six radiation oncologists review GPT-4’s responses to simulated questions from cancer patients for 100 scenarios, finding that 58% of its replies were acceptable to send to patients without any editing, 7% could lead to severe harm, and one was potentially lethal.
  • The verdict? Generative AI has the potential to reduce workloads, but it’s still essential to “keep doctors in the loop.”

A team at Mount Sinai took a different path to a similar conclusion, finding that four popular GenAI models have a long way to go until they’re better than humans at matching medical issues to the correct diagnostic codes.

  • After having GPT-3.5, GPT-4, Gemini Pro, and Llama2-70b analyze and code 27,000 unique diagnoses, GPT-4 came out on top in terms of exact matches, achieving an uninspiring accuracy of 49.8% (the toy scorer below shows how that grading works).
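
Exact match is the strictest way to grade code assignment; a common complementary metric is category-level match on the first three ICD-10 characters. A toy scorer illustrating both – the example codes are made up:

```python
# Toy scorer for diagnosis-code predictions (example data is made up).
def score(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    exact = sum(pred == gold for pred, gold in pairs) / len(pairs)
    category = sum(pred[:3] == gold[:3] for pred, gold in pairs) / len(pairs)
    return exact, category

preds_vs_gold = [("E11.9", "E11.9"), ("I10", "I10"), ("J45.40", "J45.20")]
exact, category = score(preds_vs_gold)
print(f"exact match: {exact:.1%}, 3-char category match: {category:.1%}")
```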

The Takeaway

While it isn’t exactly earth-shattering news that GenAI still has room to improve, the underlying theme with each of these studies is more that its impact is far from black and white. GenAI is rarely completely right or completely wrong, and although there’s no doubt we’ll get to the point where it’s working its magic without as many tradeoffs, this research confirms that we’re definitely not there yet.
