HealthBench, CMMI Revamp, and Women as Digital Health Consumers
May 15, 2025
“Improving human health will be one of the defining impacts of AGI.”
OpenAI’s grand introduction to HealthBench.
OpenAI is officially setting its sights on healthcare with the launch of HealthBench, a new benchmark for evaluating AI performance in realistic medical scenarios.
HealthBench marks the first time the ChatGPT developer has taken a direct step into the industry without a partner to hold its hand.
- Developed with 262 physicians from 60 countries, HealthBench includes 5,000 simulated health conversations, each with a custom rubric to grade the responses.
- The conversations “were created to be realistic and similar to real-world use of LLMs,” meaning they’re multi-turn and multilingual, while spanning a range of medical specialties and themes like handling uncertainty or global health.
Here’s how current frontier models stacked up in the HealthBench test.
- OpenAI’s o3 was the best performing model with a score of 60%
- xAI’s Grok 3 ranked second with a score of 54%
- Google’s Gemini 2.5 Pro followed close behind at 52%
All three leading models outperformed physicians who weren’t equipped with AI, although physicians outperformed the newer models when they had access to the AI output.
- The paper also reviewed other LLMs like Llama and Claude, but unsurprisingly none of them scored higher than OpenAI’s model on OpenAI’s own test.
Even the best models came up short in a few common places, AKA areas that developers should focus on to improve performance.
- Current AI models would rather hallucinate than withhold an answer they aren’t confident in, obviously not a good trait to bring into a clinical setting.
- None of the leading LLMs were great at asking for additional context or more information when the input was vague.
- When AI misses, it misses badly, as seen in the sharp drop-off in quality among the worst 10% of responses.
The Takeaway
Beyond giving us yet another data point that AI is catching up to human physicians, HealthBench provides one of the best standardized ways to compare model performance in (simulated) clinical practice, and that’s just what the innovation doctor ordered.
How Navina Empowers Privia Health Physicians to Excel in VBC
The shift to value-based care comes with many challenges, but AI can make a powerful difference. At Privia Health, Navina helped improve HCC coding accuracy while reducing the manual burden on physicians and empowering them to focus on patient care – without being distracted by administrative responsibilities. Read on to learn the practical lessons from Privia Health’s experience leveraging Navina’s AI to excel in value-based care.
The First 30 Days: What to Expect With AI
Implementing AI documentation tools promises significant benefits, but how do you ensure a smooth transition? Playback Health has you covered with this comprehensive 30-day roadmap outlining what to expect, industry best practices, and its own proven implementation approach.
- UPMC Launches Glimmer Health: UPMC joined forces with Redesign to launch chronic pain management startup Glimmer Health. Since chronic pain has largely become the responsibility of primary care physicians, Glimmer equips PCPs with nurse practitioners and case managers to deliver personalized care plans while coordinating specialty pain treatment and behavioral health resources. The launch marks the second spinout from UPMC and Redesign following the 2022 debut of surgical app Pip Care.
- CMMI Strategy Revamp: The CMMI is revamping its strategy to focus on cutting costs and tackling the root causes of chronic diseases, rather than treating them after onset. The new approach is built on three pillars: (1) promoting the prevention of illness and helping manage chronic conditions, (2) improving access to data and tech so that beneficiaries can achieve their health goals, and (3) encouraging competition and choice for healthcare services. The news coincidentally arrives weeks after an Avalere report showing that only a third of CMMI alternative care delivery models yield any savings.
- Olio Series B: Olio landed $11M of Series B funding to expand the reach of its care coordination software that connects hospital teams with post-acute providers. The Olio platform helps clinicians set specialty-specific treatment goals, then facilitates data exchange across care settings like skilled nursing, home health, behavioral health, and long-term care.
- Determinants of AI Acceptance: New research in BMJ Open identified “universal determinants of AI acceptance” among healthcare workers. The review of 46 studies found that two primary drivers impact AI acceptance above all others, regardless of specialty or skill level. The first was “performance expectancy,” or the extent to which the clinician believes that the AI will help them improve their job performance. The second was “facilitating conditions,” or the degree to which a clinician believes that their organizational infrastructure can support the use of that AI tool.
- Primary Care 101: Dr. Paulius Mui and Dr. Kenneth Qiu opened up their Primary Care 101 course to anyone looking to level up their foundational knowledge of primary care. The open access course includes 15 modules and great video explainers on how PCPs get paid, coordinate care, run clinical operations, manage workflows, and more. One of the best resources out there for anyone new to the space or in need of a quick refresh.
- VR Beats Audio For Pain: A recent study in npj Digital Medicine suggests that virtual reality-based telehealth outperforms audio-only content for chronic pain patients. Researchers randomly assigned 54 participants with chronic orofacial pain to receive either a 5-day VR intervention or an audio-only version of the same content. VR significantly outperformed audio-only content in reducing pain intensity and anxiety and in improving mood – plus it had the added benefit of better sleep quality.
- Women as Digital Health Consumers: Rock Health put out a great deep dive exploring women’s role as the “Chief Medical Officer” of their household. In 2024, $671M was invested in startups addressing “women+ health needs,” and women are responding to the surge in innovation by becoming avid digital health users. Women are adopting virtual care and embracing new tools to support their family’s health, but the report unpacks the more nuanced story about the differences in how women navigate digital care experiences, where they find value, and which tools are earning their trust.
- Dazos Secures $25M: Dazos secured $25M of Series A funding to equip more behavioral health clinicians with specialty-specific practice management software. Behavioral health practices often have to balance unique marketing constraints while fostering referral relationships with other providers in their community, and Dazos’ software suite helps accomplish that through a purpose-built CRM, a DazosIQ billing manager, and its iVerify marketing and benefits verification solution.
- Veradigm GLP-1 Data: Veradigm debuted a GLP-1 real-world evidence resource to help researchers uncover insights into areas like side effects, discontinuation reasons, and outcomes tracking. By applying AI to EHR data within the Veradigm Network, the curated GLP-1 dataset has already uncovered some interesting findings, including that 85% of obese patients discontinued use after two years (mainly due to GI issues or high OOP costs) and that providers often input “semaglutide injection” without clarifying its source.
- HHS Calls for Healthcare Deregulation: HHS Secretary Robert F. Kennedy, Jr. this week announced a plan to deregulate HHS and FDA by identifying and eliminating “outdated or unnecessary regulations.” The plan’s centerpiece is a “10 to 1” deregulatory policy – for every new regulation proposed, at least 10 existing regulatory actions will be rescinded. What’s more, the total cost of all new regulations for fiscal 2025 “must be significantly less than zero.” A 60-day comment period on the proposal opened this week.
2025 State of Payer Enrollment & Credentialing
With increasing costs and shifting regulations, outdated processes won’t cut it. Medallion’s new report breaks down the biggest challenges, emerging trends, and how automation is transforming the landscape. Get the insights you need – read the full report today.
What Nurses Are Teaching Us About Ambient AI
Nurses are the backbone of healthcare, and their tools need to keep up with the growing complexity of their roles. For nurses, documentation isn’t just about compliance – it’s how they track key observations, ensure care continuity, and support billing. Nabla’s ambient AI is already helping 8,500 nurses streamline documentation and reclaim time for patients across 30+ orgs. See what Nabla’s building for the fast-paced needs of nursing workflows.
- Transforming Remote Blood Pressure Monitoring: BPM Pro 2 integrates effortlessly into health programs, streamlines care delivery, and boosts patient adherence and engagement. Discover a better experience for your patients and care teams.
- Future-Proof Your Health System with Scalable Primary Care: The reality of modern health system primary care: more patients, fewer resources, constant pressure to deliver. K Health offers a strategic advantage, with a clinical AI and Virtualist care model that provides the scalable solution you need to meet patient demand anytime, anywhere. Join the ranks of forward-thinking systems like Cedars-Sinai, Hackensack Meridian Health, and Hartford HealthCare. Explore the future of primary care with K Health.
- Ambient AI That Delivers Real ROI – 11% wRVU Increase: Riverside Health has partnered with Abridge to address clinician burnout and improve patient experiences. Abridge’s more accurate documentation has also supported a significant increase in the health system being appropriately compensated for care delivered. Learn more about how Riverside Health saw ROI in multiples from more accurate documentation here.