OpenEvidence Closes $75M in Series A Funding

OpenEvidence might be the new kid on the medical chatbot block, but it’s already “the fastest-growing platform for doctors in history,” and $75M of Series A funding just made it the youngest unicorn in healthcare.

Founder Daniel Nadler describes OpenEvidence as an AI copilot, with an experience that feels similar to ChatGPT yet is actually a “very different organism” due to the data it was trained on.

OpenEvidence functions as a specialized medical search engine that helps clinicians make decisions at the point of care, turning natural language queries into structured answers with detailed citations.

  • The model was purpose-built for healthcare by exclusively using training data from strategic partners like the New England Journal of Medicine – no internet forums or Reddit threads in sight.
  • The kicker? It’s available at no cost to verified physicians and generates its revenue through advertising. 

Happy users are their own growth strategy, and OpenEvidence claims that 25% of doctors in the U.S. have already used the product since its launch in 2023. It’s also adding 40k new doctors each month through word-of-mouth referrals and glowing reviews of its ability to:

  • Handle complex case-based prompts
  • Address clinical cases holistically
  • Provide high-quality references

The 800-pound gorilla in this space is Wolters Kluwer and its UpToDate clinical evidence engine.

  • Although Wolters Kluwer has been inking partnerships with companies like Corti and Abridge to bring new AI capabilities to UpToDate, OpenEvidence is built from the ground up as an AI-first solution.
  • If Wolters Kluwer is an encyclopedia, OpenEvidence is ChatGPT, and it’ll be interesting to watch the plays that both sides make as they battle for market share.

The Takeaway

OpenEvidence isn’t a solution in search of a problem – it’s a sleek new tool addressing an immediate need for plenty of providers. It’s rare to see the type of viral adoption that OpenEvidence managed to generate, which is a good reminder that many areas of healthcare change slowly… then all at once.

AI Enthusiasm Heats Up With Doctors

The unstoppable march of AI only seems to be gaining momentum, with an American Medical Association survey noting greater enthusiasm – and less apprehension – among physicians. 

The AMA’s Augmented Intelligence Research survey of 1,183 physicians found that the share of physicians whose enthusiasm outweighs their concerns about health AI rose to 35% in 2024, up from 30% in 2023.

  • The lion’s share of doctors recognize AI’s benefits, with 68% reporting at least some advantage in patient care (up from 63% in 2023).
  • In both years, about 40% of doctors were equally excited and concerned about health AI, with almost no change between surveys.

The positive sentiment could be stemming from more physicians using the tech in practice. Physician use of AI nearly doubled from 38% in 2023 to 66% in 2024.

  • The most common uses now include medical research, clinical documentation, and drafting care plans or discharge summaries.

The dramatic drop in non-users (62% to 33%) over the course of a year is impressive for any new health tech, but doctors in the latest survey called out several needs that have to be addressed for adoption to continue.

  • 88% wanted a designated feedback channel
  • 87% wanted data privacy assurances
  • 84% wanted EHR integration

While physicians are still concerned about the potential of AI to harm data privacy or offer incorrect recommendations (and liability risks), they’re also optimistic about its ability to put a dent in burnout.

  • The biggest area of opportunity for AI according to 57% of physicians was “addressing administrative burden through automation,” reclaiming the top spot it reached in 2023.
  • That said, nearly half of physicians (47%) ranked increased AI oversight as the number one regulatory action needed to increase trust in AI enough to drive further adoption.

The Takeaway

It’s encouraging to see the shifting sentiment around health AI, especially as more doctors embrace its potential to cut down on burnout. Although the survey pinpoints better oversight as the key to maximizing trust, AI innovation is moving so quickly that it wouldn’t be surprising if not-too-distant breakthroughs were magical enough to inspire more confidence on their own.

House Task Force AI Policy Recommendations

The House Bipartisan Task Force on Artificial Intelligence closed out the year with a bang, launching 273 pages of AI policy fireworks.

The report includes recommendations to “advance America’s leadership in AI innovation” across multiple industries, and the healthcare section definitely packed a punch.

The task force started by highlighting AI’s potential across a long list of use cases, which could have been the tracklist for healthcare’s greatest hits of 2024:

  • Drug Development – 300+ drug applications contained AI components this year.
  • Ambient AI – Burnout is bad. Patient time is good.
  • Diagnostics – AI can help cut down on $100B in annual costs tied to diagnostic errors.
  • Population Health – Population-level data can feed models to improve various programs.

While many expect the Trump administration’s “AI Czar” David Sacks to take a less-is-more approach to AI regulation, the task force urged Congress to consider guardrails in key areas:

  • Data Availability, Utility, and Quality
  • Privacy and Cybersecurity
  • Interoperability
  • Transparency
  • Liability

Several recommendations were offered to ensure these guardrails are effective, although the task force didn’t go as far as to prescribe specific regulations. 

  • The report suggested that Congress establish clear liability standards given that they can affect clinical decision-making (the risk of penalties may change whether a provider relies on their own judgment or defers to an algorithm).
  • Another common theme was to maintain robust support for healthcare research related to AI, which included more NIH funding since it’s “critical to maintaining U.S. leadership.” 

The capstone recommendation – which was naturally well-received by the industry – was to support appropriate AI payment mechanisms without stifling innovation.

  • CMS calculates reimbursements by accounting for physician time, acuity of care, and practice expenses, yet doesn’t adequately account for AI’s impact on those metrics.
  • The task force said there won’t be a “one size fits all” policy, so appropriate payment mechanisms should recognize AI’s impact across multiple technologies and settings (Ex. many AI use cases may fit into existing benefit categories or facility fees).

The Takeaway

AI arrived faster than policy makers could keep up, and it’ll be up to the incoming White House to get AI past its Wild West regulatory era without hobbling the pioneers driving the progress. One way or another, that’s a sign that AI is starting a new chapter, and we’re excited to see where the story goes in 2025.

Patients Ready For GenAI, But Not For Everything

Bain & Company’s US Frontline of Consumer Healthcare Survey turned up the surprising result that patients are more comfortable with generative AI “analyzing their radiology scan and making a diagnosis than answering the phone at their doctor’s office.”

That’s quite the headline, but the authors were quick to point out that it’s probably less of a measure of confidence in GenAI’s medical expertise than a sign that patients aren’t yet comfortable interacting with the technology directly.

Here’s the breakdown of patient comfort with different GenAI use cases:

While it does appear that patients are more prepared to have GenAI supporting their doctor than engaging with it themselves, it’s just as notable that less than half reported feeling comfortable with even a single GenAI application in healthcare.

  • No “comfortable” response was above 37%, and after adding in the “neutral” votes, there was still only one application that broke 50%: note taking during appointments.
  • The fact that only 19% felt comfortable with GenAI answering calls for providers or payors could also just be a sign that patients would far rather talk to a human in either situation, regardless of the tech’s capabilities.

The next chart looks at GenAI perceptions among healthcare workers: 

Physicians and administrators are feeling a similar mix of excitement and apprehension, sharing a generally positive view of GenAI’s potential to alleviate admin burdens and clinician workloads, as well as a concern that it could undermine the patient-provider relationship.

  • Worries over new technology threatening the relationship of patients and providers aren’t new, and we just witnessed them play out at an accelerated pace with telehealth.
  • Despite initial fears, the value of the relationship prevailed, which Bain backed up with the fact that 61% of patients who use telehealth only do so with their own provider.

Whether you’re measuring by patient or provider comfort, GenAI’s progress will be closely tied to trust in the technology on an application-by-application basis. Trust takes time to build and first impressions are key, so this survey underscores the importance of nailing the user experience early on.

The Takeaway

The story of generative AI in healthcare is just getting started, and as we saw with telehealth, the first few pages could take some serious willpower to get through. New technologies mean new workflows, revenue models, and countless other barriers to overcome, but trust will only keep building every step of the way. Plus, the next chapter looks pretty dang good.

Hidden Flaws Behind High Accuracy of Clinical AI

AI is getting pretty darn good at patient diagnosis challenges… but don’t bother asking it to show its work.

A new study in npj Digital Medicine pitted GPT-4V against human physicians on 207 image challenges designed to test the reader’s ability to diagnose a patient based on a series of pictures and some basic clinical background info.

  • Researchers at the NIH and Weill Cornell Medicine then asked GPT-4V to provide step-by-step reasoning for how it chose the answer.
  • Nine physicians then tackled the same questions in both a closed-book (no outside help) and open-book format (could use outside materials and online resources).

How’d they stack up?

  • GPT-4V and the physicians both scored high marks for accurate diagnoses (81.6% vs. 77.8%), with a statistically insignificant difference in performance. 
  • GPT-4V bested the physicians on the closed-book test, selecting more correct diagnoses.
  • Physicians bounced back to beat GPT-4V on the open-book test, particularly on the most difficult questions.
  • GPT-4V also performed well in cases where physicians answered incorrectly, maintaining over 78% accuracy.

Good job AI, but there’s a catch. The rationales that GPT-4V provided were riddled with mistakes – even if the final answer was correct – with error rates as high as 27% for image comprehension.

The Takeaway

There could easily come a day when clinical AI surpasses human physicians on the diagnosis front, but that day isn’t here quite yet. Real care delivery also doesn’t bless physicians with a set of multiple choice options, and hallucinating the rationale behind diagnoses doesn’t cut it with actual patients.

GenAI Still Working Toward Prime Time With Patients

When it rains it pours for AI research, and a trio of studies published just last week suggest that many new generative AI tools might not be ready for prime time with patients.

The research that grabbed the most headlines came out of UCSD, finding that GenAI-drafted replies to patient messages led to more compassionate responses, but didn’t cut down on overall messaging time.

  • Although GenAI reduced the time physicians spent writing replies by 6%, that was more than offset by a 22% increase in read time, while also increasing average reply lengths by 18%.
  • Some of the physicians were also put off by the “overly nice” tone of the GenAI message drafts, and recommended that future research look into “how much empathy is too much empathy” from the patient perspective.

Another study in Lancet Digital Health showed that GPT-4 can effectively generate replies to health questions from cancer patients… as well as replies that might kill them.

  • Mass General Brigham researchers had six radiation oncologists review GPT-4’s responses to simulated questions from cancer patients for 100 scenarios, finding that 58% of its replies were acceptable to send to patients without any editing, 7% could lead to severe harm, and one was potentially lethal.
  • The verdict? Generative AI has the potential to reduce workloads, but it’s still essential to “keep doctors in the loop.”

A team at Mount Sinai took a different path to a similar conclusion, finding that four popular GenAI models have a long way to go until they’re better than humans at matching medical issues to the correct diagnostic codes.

  • After having GPT-3.5, GPT-4, Gemini Pro, and Llama2-70b analyze and code 27,000 unique diagnoses, GPT-4 came out on top in terms of exact matches, achieving an uninspiring accuracy of 49.8%.

The Takeaway

While it isn’t exactly earth-shattering news that GenAI still has room to improve, the underlying theme with each of these studies is more that its impact is far from black and white. GenAI is rarely completely right or completely wrong, and although there’s no doubt we’ll get to the point where it’s working its magic without as many tradeoffs, this research confirms that we’re definitely not there yet.

Scaling Adoption of Medical AI

Medical AI is on the brink of improving outcomes for countless patients, prompting a trio of all-star researchers to pen an NEJM AI article tackling what might be its biggest obstacle: real-world adoption.

What drives real-world adoption? Those who have been around the block as many times as Dr. Michael Abramoff, Dr. Tinglong Dai, and Dr. James Zou are all-too familiar with the answer… Reimbursement makes the world go ‘round.

To help medical AI developers get their tools in front of the patients who need them, the authors explore the pros and cons of current paths to reimbursement, while offering novel frameworks that could lead to better financial sustainability.

Traditional Fee-for-Service treats medical AI similarly to how new drugs or medical devices are reimbursed, and is a viable path for AI that can clear the hurdle of demonstrating improvements to clinical outcomes, health equity, clinician productivity, and cost-effectiveness (e.g. AI for diabetic eye exams).

  • Meeting these criteria is a prerequisite for adopting AI in healthcare, yet even among the 692 FDA-authorized AI systems, few have been able to pass the test. The approach carries substantial risk in terms of time and resources for AI developers.
  • Despite those limitations, FFS might be appropriate for AI because health systems are adept at assessing the financial impact of new technologies under it, and reimbursement through a CPT code provides hard-to-match financial sustainability.

Value-based care frameworks provide reimbursement on the basis of patient- or population-related metrics (MIPS, HEDIS, full capitation), and obtaining authorization for medical AI to “count” toward closing care gaps for MIPS and HEDIS has been shown to be considerably more straightforward than attaining a CPT code.

  • That said, if a given measure is not met (e.g. 80% of the population must receive an annual diabetic eye examination), the financial benefit of closing even three quarters of that care gap is typically zero, potentially disincentivizing AI adoption.

Given the limitations of existing pathways, the authors offer a potential new approach that’s derived from the Medicare Part B model, which reimburses drugs administered in an outpatient setting based on a “cost plus” markup.

  • Here, providers could acquire the rights to use AI, then get reimbursed based on the average cost of the service plus a specified margin, contingent upon CMS coverage of a particular CPT code.
  • This model essentially splits revenue between AI creators and users, and would alleviate some of the tensions of both FFS and VBC models.
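The “cost plus” mechanics described above can be illustrated with a quick back-of-the-envelope calculation. The figures below are made up for the example (the article doesn’t specify a cost or margin):

```python
# Hypothetical figures for illustration only (not from the article).
avg_service_cost = 40.00   # average per-use cost of the AI service, in dollars
markup = 0.06              # specified "cost plus" margin (6%)

# What CMS would pay per use under a Part B-style model:
reimbursement = avg_service_cost * (1 + markup)

# The margin above cost is the revenue that would be shared
# between the AI developer and the provider using the tool.
margin_dollars = reimbursement - avg_service_cost
```

The appeal is predictability: providers recoup their acquisition cost on every use, while developers capture part of the markup, so both sides have an incentive to adopt.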

The Takeaway

Without sustainable reimbursement, widespread medical AI adoption won’t be possible. Although the quest continues for a silver bullet (even the authors’ revenue-sharing model still carries the risk of overutilization and requires the creation of new CPT codes), exploring novel approaches is essential given the challenges of achieving reimbursement through existing FFS and VBC pathways.

How Health Systems Are Approaching AI

The New England Journal of Medicine’s just-released NEJM AI publication is off to the races, with its February issue including a stellar breakdown of how academic medical centers are managing the influx of predictive models and AI tools.

Researchers identified three governance phenotypes for managing the AI deluge:

  • Well-Defined Governance – health systems have explicit, comprehensive procedures for the evaluation of AI and predictive models.
  • Emerging Governance – systems are in the process of adapting previously established approaches for things like EHRs to govern AI.
  • Interpersonal Governance – a small team or single person is tasked with making decisions about model implementation without consistent evaluation requirements. 

Regardless of the phenotype, interviews with AI leadership at 13 academic medical centers revealed that chaotic implementations are hard to avoid, partly due to external factors like vague regulatory standards.

  • Most AI decision makers were aware of how the FDA regulates software, but believed those rules were “broad and loose,” and many thought they only applied to EHRs and third-party vendors rather than health systems.

AI governance teams report better clinician adherence when new solutions are implemented with an emphasis on limiting clicks for providers. Effective governance of predictive models requires a broader approach, yet streamlining workflows remains the primary consideration for most implementations – a recipe for trouble down the road given predictive models’ impact on patient care, health equity, and care quality.

The Takeaway

Even well-equipped academic medical centers are struggling to effectively identify and mitigate the countless potential pitfalls that come along with predictive AI implementation. Existing AI governance structures within healthcare orgs all seem to be in need of additional guidance, and more guardrails from both the industry and regulators might help turn AI ambitions into AI-improved outcomes.

AI Therapists in VR Help With Provider Shortage

New research in npj Digital Medicine suggests that virtual reality might be part of the answer to the nation’s mental health provider shortage, as long as patients don’t mind if their therapist is an AI avatar.

The small study had 14 participants with moderate anxiety or depression undergo immersive therapy sessions led by a trained digital avatar developed by Cedars-Sinai.

Known as XAIA, or the eXtended-Reality Artificially Intelligent Ally, the program provides a way for patients to self-administer conversational therapy in relaxing virtual reality environments, such as a creek-side meadow or a sunny beach retreat.

  • It’s unclear what the therapist avatar actually looks like, but we’re going to choose to believe it looks like this picture of XAIA in the diagram of the conversation logic.

Throughout the 30-minute therapy sessions, with topics ranging from loneliness and family problems to financial distress, XAIA successfully applied key psychotherapeutic techniques:

  • Observations that reflect an understanding of the user’s issues; Ex. “Your worries about your health and your living situation are clearly causing you a lot of distress” 
  • Normalizing feelings; Ex. “It’s understandable to feel like a failure when life gets rough”
  • Showing empathy; Ex. “It must be a challenging time for you”
  • Validation and praise where appropriate; Ex. “Your practice of not taking these relationships for granted is truly commendable”

Participants frequently responded to XAIA as if it were a human therapist, sharing raw emotions like, “I feel like I’m a failure. The only thing I have to look forward to – I know you’re going to laugh – is to go see Taylor Swift … I’m sorry I’m crying.”

  • XAIA’s response: “Firstly, there’s no need to apologize for crying. These feelings are valid.”

Most participants described XAIA as approachable, empathetic, and intelligent, but it’s worth noting that a few mentioned they would still prefer a human therapist if given the choice.

The Takeaway

Although this wasn’t exactly the largest study we’ve ever covered, the results provide early evidence that a combination of VR and AI therapy could be part of the solution to balancing behavioral health’s supply and demand equation. Over half of people facing mental health disorders aren’t getting the treatment they need, and if XAIA isn’t already a clearly better alternative than no treatment at all, new advances will only make the AI+VR path more promising going forward.

GPT-4 Capable of Diagnosing Complex Cases

The New England Journal of Medicine is adding to its library of top tier publications with the launch of a new journal focused on artificial intelligence – NEJM AI – and it’s gearing up for the January debut with a sneak peek at a few early-release articles.

Use of GPT-4 to Diagnose Complex Clinical Cases was a standout study from the preview, finding that GPT-4 correctly diagnosed over half of complex clinical cases.

Researchers asked GPT-4 to provide a diagnosis for 38 clinical case challenges that each included a medical history along with six multiple choice options. The most common diagnoses included 15 cases related to infectious disease (39.5%), five cases in endocrinology (13.1%), and four cases in rheumatology (10.5%).

  • GPT-4 was given the plain unedited text from each case, and solved each one five times to evaluate reproducibility.
  • Those answers were compared to over 248k answers from online medical-journal readers, which were used to simulate 10k complete sets of human answers.
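The simulation step above can be sketched as a bootstrap-style resampling: for each case, draw one reader’s answer at random from that case’s pool, and repeat to build complete 38-question answer sets. The sketch below uses synthetic stand-in data – the per-case reader pools and the ~36% base rate are assumptions for illustration, not the study’s actual data:

```python
import random

random.seed(0)

# Synthetic stand-in for the ~248k reader answers: for each of the 38
# cases, a list of booleans marking whether each reader answered correctly
# (38 cases x 6,530 readers ~= 248k answers, correct ~36% of the time).
reader_answers = [
    [random.random() < 0.36 for _ in range(6_530)] for _ in range(38)
]

def simulate_answer_sets(pool, n_sets=10_000):
    """Simulate complete human answer sets by sampling one reader's
    response per case, n_sets times; return the score (number of
    correct diagnoses out of 38) for each simulated set."""
    scores = []
    for _ in range(n_sets):
        scores.append(sum(random.choice(case) for case in pool))
    return scores

scores = simulate_answer_sets(reader_answers)
mean_score = sum(scores) / len(scores)  # expect ~13.7 of 38 (~36%)
```

Comparing GPT-4’s score against the full distribution of simulated sets is what yields the “better than 99.98% of readers” figure.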

GPT-4 correctly diagnosed an average of 21.8 cases (57%), while the medical-journal readers correctly diagnosed an average of 13.7 cases (36%). Not too shabby considering the LLM could only leverage the case text and not the included graphics.

  • Based on the simulation, GPT-4 also performed better than 99.98% of all medical-journal readers, with high reproducibility across all five tests (lowest score was 55.3%).

A couple caveats to consider are that medical-journal readers aren’t licensed physicians, and that real-world medicine doesn’t provide convenient multiple choice options. That said, a separate study found that GPT-4 performed well even without answer options (44% accuracy), and these models will only grow more precise as multimodal data gets incorporated.

The Takeaway

The race to bring AI to healthcare is on, and it’s generating a stampede of new research investigating the boundaries of the tech’s potential. As the hype of the first lap starts to give way to more measured progress, NEJM AI will most likely be one of the best places to keep up with the latest advances.
