Doximity Accused of Prompt Hacking OpenEvidence

What does a high-flying company like Doximity do when competitors are nipping at its heels? According to OpenEvidence’s new lawsuit, it just politely asks their LLMs to reveal trade secrets. 

Doximity is basically LinkedIn for doctors. It allows physicians to use its networking platform and AI workflow products at no cost, which means the physicians themselves are the product.

  • Doximity generates revenue almost exclusively through pharma advertising, and it turns out that might actually be the best business model around.
  • Out of the dozen publicly traded digital health companies with a market cap over $1B, Doximity is the only one that’s decently profitable.

No good prompt goes unpunished. The crown jewel of Doximity’s AI portfolio is its Doximity GPT workflow assistant, which may or may not leverage proprietary tech acquired by prompting OpenEvidence’s competing model to reveal sensitive information.

  • Although it’s funny to see Doximity get accused of asking OpenEvidence’s AI to literally “write down the secret code,” it doesn’t exactly make for a bulletproof case when the model willingly dishes up an answer.
  • The catch is that OpenEvidence requires users to register using their National Provider ID numbers, and Doximity allegedly impersonated a practicing neurologist to “obtain through theft what they lacked in technical expertise.” Ouch.

It gets worse from there. A separate shareholder lawsuit accused Doximity of inflating its active user base and website engagement data to artificially bolster its advertising revenue.

  • While some investors might be able to stomach a little corporate espionage, they probably won’t look the other way if it turns out Doximity is fudging the numbers.
  • Innocent until proven guilty, but it’s worth noting that nearly identical allegations popped up in a recent short report.

The Takeaway

Doximity has some serious allegations piling up against it, but so far the market has shrugged off the bad news. That could be a sign that investors don’t think the lawsuits will hold up in court, or maybe they just don’t mind when a management team is willing to bend the law to generate some extra shareholder value.

OpenEvidence Partners With JAMA Ahead of Next Raise

“The fastest-growing platform for doctors in history” continues to step on the gas, and OpenEvidence is reportedly on the verge of notching a $3B valuation after inking a deal to bring JAMA Network journals to its AI medical search engine.

The multi-year content agreement will make full-text articles from the American Medical Association’s JAMA, JAMA Network Open, and 11 specialty journals available directly within the OpenEvidence platform.

  • OpenEvidence’s medical search engine helps clinicians make decisions at the point of care, turning natural language queries into structured answers with detailed citations.
  • The model was purpose-built for healthcare using training data from strategic partners like the New England Journal of Medicine, which joined the platform through a similar deal earlier this year.

The Disney+ content strategy has arrived in healthcare. OpenEvidence compares its approach to streaming services that drive subscriptions through exclusive movies.

  • If a physician wants information from top journals to support decision making, they’ll either have to get it straight from the source or use OpenEvidence, just like how anyone who wants to stream Moana needs to go to Disney+.
  • The kicker is that OpenEvidence is available at no cost to verified physicians, and advertising generates all of the revenue. 

The blueprint is working like a charm. OpenEvidence has over 350k doctors using its platform plus another 50k joining each month, and it’s apparently close to raising $100M at a $3B valuation just a few months after closing its $75M Series A.

  • It’s rare to find hockey stick growth in digital health, and OpenEvidence is a good reminder that many areas of healthcare change slowly… then all at once.
  • It also isn’t too surprising to hear that VCs like Google Ventures and Kleiner Perkins are lining up to fund a company with an ad-supported business model similar to Doximity’s – one of the only successful healthcare IPOs since the start of the pandemic.

The Takeaway

Content is king, and OpenEvidence is locking in partnerships to make sure its platform is wearing the crown. The results have been speaking for themselves, but healthcare’s genAI streaming wars are just getting started.

Epic Announces Launchpad to Fast-Track GenAI Deployment

Epic is looking to accelerate generative AI adoption with the surprise unveiling of Launchpad, a new program designed to help provider orgs “move from idea to operational gains in a matter of days.” 

Launchpad’s grand unveiling included little more than a LinkedIn post, but Epic AI Director Sean McGunigal told Fierce Healthcare that the program includes guided AI implementations and a fast track to live workflows.

Here’s the general outline of how Launchpad works.

  • When an organization joins the program, Epic staff shepherds them through any roadblocks they’re facing with active genAI implementations.
  • Epic’s experts will assist with getting genAI use cases configured, turned on, and operationalized – all while establishing appropriate governance structures.
  • Launchpad also includes a starter kit of 10 high-impact genAI applications that can be deployed within days, covering both clinical and operational workflows.

Epic views low AI literacy as one of the biggest barriers to industry-wide adoption, and McGunigal made it clear that Epic will go to great lengths to overcome that hurdle. 

  • “We’ll help coordinate. We’ll get the right stakeholders on the phone and we’ll help this thing along. The [genAI] roadblocks tend not to be necessarily performance challenges or end user training. It tends to be the project management-type work.”

MyChart In-Basket Augmented Response Technology – better known as ART – marked the launch of Epic’s first genAI feature in April 2023.

  • Since then, over half of Epic’s customers have started using at least one of its genAI features, and it’s making some big investments to pump that number up.
  • Epic has over 100 new genAI use cases in the works, spanning everything from lab and imaging recommendations to fully automated patient-agent interactions, and Launchpad is going to ensure that providers have the support they need to adopt them.

The Takeaway

As Epic gears up for a tidal wave of its own AI rollouts, healthcare orgs are going to need customized support, implementation kits, and governance guidance to take full advantage. Epic just showed us that it’s willing to build its own Launchpad to make that happen.

OpenAI Dives Into Healthcare With HealthBench

OpenAI is officially setting its sights on healthcare with the launch of HealthBench, a new benchmark for evaluating AI performance in realistic medical scenarios.

HealthBench marks the first time the ChatGPT developer has taken a direct step into the industry without a partner to hold its hand.

  • Developed with 262 physicians from 60 countries, HealthBench includes 5,000 simulated health conversations, each with a custom rubric to grade the responses.
  • The conversations “were created to be realistic and similar to real-world use of LLMs,” meaning they’re multi-turn and multilingual, while spanning a range of medical specialties and themes like handling uncertainty or global health.
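
For readers curious how a rubric-graded benchmark like this turns 5,000 conversations into a single percentage, here’s a minimal sketch of the scoring logic. The criterion structure, point values, and `criterion_met` grader are illustrative assumptions, not OpenAI’s actual implementation (the real benchmark relies on model-based grading for the per-criterion checks).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    description: str  # e.g. "recommends urgent evaluation for chest pain"
    points: int       # positive for desired behavior, negative for harmful behavior

def grade_conversation(response: str,
                       rubric: list[Criterion],
                       criterion_met: Callable[[str, Criterion], bool]) -> float:
    """Score one model response against its conversation-specific rubric.

    `criterion_met` stands in for the grader that decides whether a response
    satisfies a rubric item.
    """
    earned = sum(c.points for c in rubric if criterion_met(response, c))
    max_points = sum(c.points for c in rubric if c.points > 0)
    return max(0.0, earned / max_points) if max_points else 0.0

def benchmark_score(per_conversation_scores: list[float]) -> float:
    """Average the per-conversation scores into the headline percentage (e.g. a 60%)."""
    return 100 * sum(per_conversation_scores) / len(per_conversation_scores)
```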

Here’s how current frontier models stacked up in the HealthBench test.

  • OpenAI’s o3 was the best performing model with a score of 60%
  • xAI’s Grok 3 ranked second with a score of 54%
  • Google’s Gemini 2.5 Pro followed close behind at 52%

All three leading models outperformed physicians who weren’t equipped with AI, although physicians outperformed the newer models when they had access to the AI output.

  • The paper also reviewed other LLMs like Llama and Claude, but unsurprisingly none of them scored higher than OpenAI’s model on OpenAI’s own test.

Even the best models came up short in a few common places, AKA areas that developers should focus on to improve performance.

  • Current AI models would rather hallucinate than withhold an answer they aren’t confident in, obviously not a good trait to bring into a clinical setting.
  • None of the leading LLMs were great at asking for additional context or more information when the input was vague.
  • When AI misses, it misses badly, as seen in the sharp quality drop-off in the worst 10% of responses.

The Takeaway

Outside of giving us yet another datapoint that AI is catching up to human physicians, HealthBench provides one of the best standardized ways to compare model performance in (simulated) clinical practice, and that’s just what the innovation doctor ordered.

More Reasoning, More Hallucinations for LLMs

Better reasoning apparently doesn’t prevent LLMs from spewing out false facts.  

Independent testing from AI firm Vectara showed that the latest advanced reasoning models from OpenAI and DeepSeek hallucinate even more than previous models.

  • OpenAI’s o3 reasoning model scored a 6.8% hallucination rate on Vectara’s test, which asks the AI to summarize various news articles.
  • DeepSeek’s R1 fared even worse with a 14.3% hallucination rate, an especially poor performance considering that its older non-reasoning DeepSeek-V2.5 model clocked in at 2.4%.
  • On OpenAI’s more difficult SimpleQA test, o3 and o4-mini hallucinated between 51% and 79% of the time, versus just 37% for its GPT-4.5 non-reasoning model.
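
As a rough illustration of what a “hallucination rate” means in this kind of test, here’s a minimal sketch of a summarize-and-check loop. The `summarize` and `is_consistent` callables are placeholders; Vectara’s actual leaderboard uses its own factual-consistency model rather than this simplified check.

```python
from typing import Callable, Sequence

def hallucination_rate(articles: Sequence[str],
                       summarize: Callable[[str], str],
                       is_consistent: Callable[[str, str], bool]) -> float:
    """Percentage of summaries that introduce claims unsupported by the source article."""
    flagged = 0
    for article in articles:
        summary = summarize(article)            # call the model under test
        if not is_consistent(article, summary): # judge flags unsupported facts
            flagged += 1
    return 100 * flagged / len(articles)

# Example: 68 flagged summaries out of 1,000 articles would yield the kind of
# 6.8% hallucination rate reported for o3 above.
```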

OpenAI positions o3 as its most powerful model because it’s a “reasoning” model that takes more time to “think” and work out its answers step-by-step.

  • This process produces better answers for many use cases, but these reasoning models can also hallucinate at each step of their “thinking,” giving them even more chances for incorrect responses.

The Takeaway

Even though the general purpose models studied weren’t fine-tuned for healthcare, the results raise concerns about their safety in clinical settings – especially given how many physicians report using them in day-to-day practice.

AI Can Help Doctors Change Their Minds

A recent study out of Stanford explored whether doctors would revise their medical decisions in light of new AI-generated information, finding that docs are more than willing to change their minds despite being just as vulnerable to cognitive biases as the rest of us.

Here’s the setup, published in Nature Communications Medicine:

  • 50 physicians were randomized to watch a short video of either a white male or black female patient describing their chest pain with an identical script.
  • The physicians made triage, diagnosis, and treatment decisions using any non-AI resource.
  • The physicians were then given access to GPT-4 (which they were told was an AI system that had not yet been validated) and allowed to change their decisions.

The initial scores left plenty of room for improvement.

  • The docs achieved just 47% accuracy in the white male patient group.
  • The docs achieved a slightly better 63% accuracy in the black female patient group.

The physicians were surprisingly willing to change their minds based on the AI advice.

  • Accuracy scores with AI improved from 47% to 65% in the white male group.
  • Accuracy scores with AI improved from 63% to 80% in the black female group.

Not only were the physicians open to modifying their decisions with AI input, but doing so made them more accurate without introducing or exacerbating demographic biases.

  • Both groups showed nearly identical magnitudes of improvement (roughly 18 percentage points), suggesting that AI can augment physician decision-making while maintaining equitable care.
  • It’s worth noting that the docs used AI as more than a search engine, asking it to bring in new evidence, compare treatments, and even challenge their own beliefs [Table].

The Takeaway

Although having the doctors go first means that AI didn’t save them any time in this study – and actually increased time per patient – it showed that flipping the paradigm from “doctors checking AI’s work” to “AI helping doctors check their own work” has the potential to improve clinical decisions without amplifying biases.

The Healthcare AI Adoption Index

Bessemer Venture Partners’ market reports are always some of the best in the business, but its recent Healthcare AI Adoption Index might just be its finest work yet.

The Healthcare AI Adoption Index is based on survey data from 400+ execs across Payors, Providers, and Pharma – breaking down how buyers are approaching GenAI applications, what jobs-to-be-done they’re prioritizing, and where their projects sit on the adoption curve.

Here’s a look at what they found:

  • AI is high on the agenda across the board, with AI budgets growing faster than overall IT spend in each of the three segments. Over half of organizations (54%) are seeing ROI within the first 12 months.
  • Only a third of AI pilots end up reaching production, held back by everything from security and data readiness to integration costs and limited in-house expertise.
  • Despite all the trendsetters we cover on a weekly basis, only 15% of active AI projects are being driven by startups. The rest are being built internally or led by the usual suspects like major EHRs and Big Tech.
  • That said, 48% of executives say they prefer working with startups over incumbents, and Bessemer encourages founders to co-develop solutions with their customers and lean into partnerships that provide access to distribution, proprietary datasets, and credibility.

The highlight of the report was Bessemer’s analysis of the 59 jobs-to-be-done as potential use cases for AI. 

  • Of the 22 jobs-to-be-done for Payors (claims, network, member, pricing), 19 jobs for Pharma (preclinical, clinical, marketing, sales), and 18 jobs for Providers (care delivery, RCM) – 45% are still in the ideation or proof of concept phase.
  • Providers are ahead in POC experimentation, while most Payor and Pharma use cases remain in the ideation phase. Here’s a beautiful look at where different use cases stand.

Bessemer topped off its analysis with the debut of its AI Dx Index, which factors in market size, urgency, and current adoption to help startups map and prioritize AI use cases. One of the best graphics so far this year.

The Takeaway

Healthcare’s AI-powered paradigm shift is kicking into overdrive, and Bessemer just delivered one of the most comprehensive views of where the puck is going that we’ve seen to date.

K Health’s AI Clinical Recommendations Rival Doctors in Real-World Setting

Real-world comparisons of AI recommendations and doctors’ clinical decisions have been few and far between, but a new study in the Annals of Internal Medicine gave us a great look at how performance stacks up with actual patients.

The early verdict? AI came out on top, but that doesn’t mean doctors should pack their bags quite yet.

Researchers from Cedars-Sinai and Tel Aviv University compared recommendations made by K Health’s AI Physician Mode to the final decisions made by physicians for 461 virtual urgent care visits. Here’s what they found:

  • In 68% of cases, the AI and physician recommendations were rated as equal
  • AI was rated better in 21% of cases, versus just 11% for physicians 
  • AI recommendations were rated “optimal” in 77% of cases, versus 67% for physicians

Although AI takes the cake with the top line numbers, unpacking the data reveals some not-too-surprising strengths and weaknesses. AI was primarily rated better when physicians:

  • Missed important lab tests (22.8%)
  • Didn’t follow clinical guidelines (16.3%)
  • Failed to refer patients to specialists or the ED if needed (15.2%)
  • Overlooked risk factors and red flags (4.4%)

Physicians beat out AI when the human elements of care delivery came into play, such as adapting to new information or making nuanced decisions. Physicians were rated better when:

  • AI made unnecessary ED referrals (8.0%)
  • There was evolving or inconsistent information during consultations (6.2%)
  • They made necessary referrals that the AI missed (5.9%)
  • They correctly adjusted diagnoses based on visual examinations (4.4%)

While the study focused on the exact types of common conditions that AI excels at diagnosing (respiratory, urinary, vaginal, eye, and dental), it’s still impressive to see the outperformance in the messy trenches of a real clinical setting – a far cry from the static medical exams that have been the go-to for similar evaluations. 

The Takeaway

For AI to truly transform healthcare, it’ll need to do a lot more than automate administrative work and back office operations. This study demonstrates AI’s potential to enhance decision-making in actual medical practice, and points toward a future where delivering high-quality patient care becomes genuinely scalable.

PHTI Delivers Mixed Reviews on Ambient Scribes

The Peterson Health Technology Institute’s latest technology review is here, and it had a decidedly mixed report card for the ambient AI scribes sweeping across the industry. 

PHTI’s total count of ambient scribe vendors stands at over 60, but the bulk of its report focuses on the early experiences and lessons learned from the top 10 scribes across leading health systems.

According to PHTI’s conversations with health system execs, the primary driver of ambient scribe adoption has been addressing clinician burnout – and AI’s promise is clear on that front.

  • Mass General Brigham reported a 40% reduction in burnout during a six-week pilot.
  • MultiCare reported a 63% reduction in burnout and a 64% improvement in work-life balance.
  • Another study from the Permanente Medical Group found that 81% of patients felt their physician spent less time looking at their computer when using an ambient scribe.

Despite these drastic improvements, PHTI concludes that the financial returns and efficiency of ambient scribes remain unclear.

  • On one hand, enhanced documentation quality “could lead to higher reimbursements, potentially offsetting expenses.”
  • On the other hand, the cumulative costs “may be greater than any savings achieved through improved efficiency, reduced administrative burden, or reduced clinician attrition.”

It’s a bold conclusion considering the cost of losing a single provider, let alone the downstream effects of having a burned-out workforce. 

PHTI’s advice to health systems? Define the outcomes you’re looking for and then measure ambient AI’s performance and financial impacts against those goals. Bit of a no-brainer, but sound advice nonetheless. 

The Takeaway

Ambient scribes are seeing the fastest adoption of any recent healthcare technology that wasn’t accompanied by a regulatory mandate, and that’s mostly because of magic that’s hard to capture in a spreadsheet. That said, health systems will eventually need to justify these solutions beyond their impact on the clinical experience, and PHTI’s report brings a solid framework and standardized methodologies for bridging that gap.

AI Misses the Mark on Detecting Critical Conditions

Most health systems have already begun turning to AI to predict if patient health conditions will deteriorate, but a new study in Nature Communications Medicine suggests that current models aren’t cut out for the task. 

Virginia Tech researchers looked at several popular machine learning models cited in medical literature for predicting patient deterioration, then fed them datasets about the health of patients in ICUs or with cancer.

  • They then created synthesized test cases that altered patient metrics from the initial datasets, then checked whether the models still predicted the appropriate health issues and risk scores.
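
Here’s a minimal sketch of what a perturbation test like that can look like in practice, assuming a hypothetical scikit-learn-style mortality model with a `predict_proba` method; the features, values, and alert threshold are illustrative, not the study’s actual protocol.

```python
import numpy as np

def perturbation_test(model, patient: dict, feature: str, new_value: float,
                      alert_threshold: float = 0.5) -> dict:
    """Change one patient metric and check whether the predicted mortality
    risk responds the way basic medical knowledge says it should."""
    perturbed = dict(patient)
    perturbed[feature] = new_value  # e.g. drop systolic BP from 120 to 60 mmHg

    def to_row(p: dict) -> np.ndarray:
        # Keep a fixed feature order so baseline and perturbed rows line up.
        return np.array([[p[k] for k in sorted(p)]])

    baseline_risk = model.predict_proba(to_row(patient))[0, 1]
    perturbed_risk = model.predict_proba(to_row(perturbed))[0, 1]

    return {
        "baseline_risk": baseline_risk,
        "perturbed_risk": perturbed_risk,
        # The failure mode flagged in the study: a clearly critical change that
        # never pushes the model's risk estimate across the alerting threshold.
        "missed_deterioration": perturbed_risk < alert_threshold,
    }
```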

AI missed the mark. For in-hospital mortality prediction, the models tested using the synthesized cases failed to recognize a staggering 66% of relevant patient injuries.

  • In some instances, the models failed to generate adequate mortality risk scores for every single test case.
  • That’s not great news, especially considering that algorithms that can’t recognize critical patient conditions also can’t alert doctors when urgent action is needed.

The study authors point out that it’s extremely important for technology being used in patient care decisions to incorporate medical knowledge, and that “purely data-driven training alone is not sufficient.”

  • Not only did the study unearth “alarming deficiencies” in models being used for in-hospital mortality predictions, but it also turned up similar concerns with models predicting the prognosis of breast and lung cancer over five-year periods.
  • The authors conclude that a significant gap exists between raw data and the complexities of medical reality, so models trained solely on patient data are “grossly insufficient and have many dangerous blind spots.”

The Takeaway

The promise of AI remains just as immense as ever, but studies like this provide constant reminders that we need a diligent approach to adoption – not just for the technology itself but for the lives of the patients it touches. Ensuring that medical knowledge gets incorporated into clinical AI models also seems like a theme that we’re about to start hearing more often.
