Doctors Who Use AI Are Viewed Worse by Peers

The research headline of the week belongs to a study out of Johns Hopkins University that found “doctors who use AI are viewed negatively by their peers.”

Clickbait from afar, but far from clickbait. The investigation in npj Digital Medicine surfaced interesting takeaways after randomizing 276 practicing clinicians to evaluate one of three vignettes depicting a physician: using no GenAI (the control), using GenAI as a primary decision-making tool, or using GenAI as a verification tool.

  • Participants rated the clinical skill of the physician using GenAI as a primary decision-making tool significantly lower than that of the physician who didn’t use it (3.79 vs. 5.93 for the control on a 7-point scale).
  • Framing GenAI as a “second opinion” or verification tool softened the hit to perceived clinical skill, but didn’t eliminate it (4.99 vs. 5.93 control).
  • Ironically, while an overreliance on GenAI was viewed as a weakness, the clinicians also recognized AI as beneficial for enhancing medical decision-making. Riddle us that.

Patients seem to agree. A separate study in JAMA Network Open took a look at the patient perspective by randomizing 1.3k adults into four groups that were shown fake ads for family doctors, with one key difference: no mention of AI use (the control), or a reference to the doctors using AI for administrative, diagnostic, or therapeutic purposes (Supplement 1 has all the ads).  

For every AI use case, the doctors were perceived as significantly worse on a 5-point scale:

  • less competent – control: 3.85; admin AI: 3.71; diagnostic AI: 3.66; therapeutic AI: 3.58
  • less trustworthy – control: 3.88; admin AI: 3.66; diagnostic AI: 3.62; therapeutic AI: 3.61
  • less empathic – control: 4.00; admin AI: 3.80; diagnostic AI: 3.82; therapeutic AI: 3.72

Where’s that leave us? Despite pressure on clinicians to be early AI adopters, using it clearly comes with skepticism from both peers and patients. In other words, AI adoption is getting throttled not only by technological barriers, but also by some less-discussed social ones.

The Takeaway

Medical AI moves at the speed of trust, and these studies highlight the social stigmas that still need to be overcome for patient care to improve as fast as the underlying tech.

MIT Report Crosses the GenAI Divide

It only takes one look at the key findings from MIT’s GenAI Divide report to see why it made such a big splash this week: 95% of GenAI deployments fail.

MIT knows how to grab headlines. The paper – based on interviews with 150 enterprise execs, a survey of 350 employees, and an analysis of 300 GenAI deployments – highlights a clear chasm between the handful of successful projects and a long list of painful lessons.

  • After $30B+ of GenAI spend across all industries, only 5% of organizations have seen a measurable impact on their top lines. Adoption is high, but transformation is rare.
  • While general-purpose models like ChatGPT have improved individual productivity, that hasn’t translated to enterprise outcomes. Most “enterprise-grade” systems are stalling in pilots, and only a small fraction actually make it to production.

Why are GenAI pilots failing? The report suggests it’s not the quality of the models, but the learning gap – on both the tool side and the organizational side – that’s stalling pilots.

  • Most enterprise tools don’t remember, don’t adapt, and don’t fit into real workflows. This creates “an AI shadow economy” where 90% of employees regularly use general models, yet reject enterprise tools that can’t carry context across sessions.
  • Employees ranked output quality and UX issues among the biggest barriers, both of which trace directly back to missing memory and workflow integration.

What’s driving successful deployments? There was a consistent pattern among organizations successfully crossing the GenAI Divide: top buyers treated AI startups less like software vendors and more like business service providers. These orgs:

  • Demanded deep customization aligned to internal processes and data
  • Benchmarked tools on operational outcomes, not model benchmarks
  • Partnered through early-stage failures, treating deployment as co-evolution
  • Sourced AI initiatives from frontline managers, not central labs

There’s always a catch. Most of the pushback on the report was due to its definition of “failure,” which was not having a measurable P&L impact within six months. That definition would make “failures” out of everything from the internet to cloud computing, and underscores why enterprise transformation is measured in years, not months.

The Takeaway

The GenAI growing pains might be worse than expected, but that’s helped startups realize that they need to ditch the SaaS playbook for a new set of rules. In the GenAI era, deployment is a starting line, not a finish line.

Healthcare’s Sci-Fi Future at Epic UGM

Where there’s smoke, there’s fire, and Epic just lit up its sci-fi themed User Group Meeting with enough futuristic new solutions to prove last week’s rumors true – and then some.

The future is now. This year’s event gave us a look at over 160 AI projects currently under development at Epic, including a three-product family set to immediately shake up the industry.

ART is a provider copilot for charting, pre-visit summaries, queuing up orders, and yes – ambient scribing.

  • ART will reportedly be able to provide real-time suggestions during visits, and its highly anticipated scribe still managed to surprise: Epic revealed it will be powered by Microsoft when it arrives in early 2026. More on that later.

Emmie is a patient-facing advocate within MyChart that can help with everything from scheduling and reminders to education and navigation.

  • Epic is positioning Emmie as the best place for patients to ask health questions and get answers that are actually grounded in their personal medical history.

Penny is an administrative assistant targeted at revenue cycle management, where it generates appeal letters and supports other back-office tasks.

  • There isn’t as much information out there on this one, but Epic doesn’t appear to be shying away from claims and payor workflows.

The EHR is dead, long live the CHR. Judy Faulkner grabbed even more headlines by announcing that she’s retiring the term “EHR” in favor of “Comprehensive Health Record,” which seems fitting considering the other major announcements that joined the Big Three.

  • Cosmos AI will provide diagnosis and treatment support, as well as discharge planning.
  • MyChart Central will give patients a single login across all sites of care.
  • Flower Pot will expand access to lightweight Epic implementations for smaller practices.

The scribe is real. Now what? Epic’s decision to team up with Microsoft on documentation was pretty unexpected given its 46-year track record of building everything in-house, confirming that the CHR giant would rather bend its core rules than lose market share.

  • Scribes proved how fast health systems would layer on their own AI if Epic couldn’t keep up, and we’ll now have to wait and see if the cost and experience of Epic’s scribe are enough to compete with the flock of ambient AI innovators dedicated to this problem.
  • Epic might own the “operating system,” almost as much as Microsoft owns Windows, but just because MS Paint exists doesn’t mean the world doesn’t need Adobe Photoshop.

The Takeaway

Some call it consolidation. Others call it innovation. Either way, this year’s UGM will probably go down as a key step along Epic’s march toward intergalactic domination. 

Is AI Robbing Physicians of Their Skill? 

A study in The Lancet threw some refreshingly cold water on the AI hype train after finding that healthcare’s shiny new models might be de-skilling physicians.

Here’s the setup. Researchers tracked four Polish health centers that gave their gastroenterologists AI to help spot polyps during colonoscopies before yanking it away after three months.

  • Long story short, the doctors’ unassisted ability to detect polyps plummeted following the AI rugpull.
  • Detection rates fell from 28.4% before the AI teaser to 22.4% after – a drop of 6 percentage points, and an even steeper relative decline (quick math below) – raising concerns that relying on AI might rob physicians of hard-won skills.
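
Worth spelling out: that “6%” is an absolute drop, not a relative one. A quick back-of-the-envelope check on the reported rates:

```python
# Back-of-the-envelope check on the reported detection rates.
before, after = 28.4, 22.4  # unassisted polyp detection rates (%)

absolute_drop = before - after            # 6.0 percentage points
relative_drop = absolute_drop / before    # ~0.21, i.e. a 21% relative decline

print(f"Absolute drop: {absolute_drop:.1f} percentage points")
print(f"Relative drop: {relative_drop:.0%}")
```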

Sounds familiar. The findings echo a recent MIT preprint that showed that people who used AI to write essays used less of their brains and had worse recall of their writing than those who mustered up the words on their own.

  • That’s probably not a shocker to anyone who’s used ChatGPT for more than five minutes, but it’s easy to see how it might spell trouble when applied to medicine.
  • If gastroenterologists start leaning on AI to detect polyps, what happens if they lose their ability to detect them without it?

Right idea, wrong question. People were better at mental math before they had calculators, but that doesn’t mean society would be better off without them. The question we have to ask ourselves is, which skills are we willing to lose?

  • Gastroenterologist Dr. Spencer Dorn nails it: AI doesn’t just risk de-skilling doctors in polyp detection, it risks diminishing their overall critical thinking skills.
  • “My real concern is not the technical skills we can afford to lose, but the foundational ones we can’t: critical thinking, sound judgment, and compassionate care. These aren’t just important to preserve – they’re irreplaceable.”

The Takeaway

If doctors keep outsourcing their thinking to AI, it could be a one-way ticket to a world where Dr. GPT is the only one patients can turn to. Seems dystopian, but is it really that bad if it also means better outcomes for those patients?

AI Spotlight on Epic, Abridge, and Oracle 

Epic, Abridge, and Oracle just gave us a year’s worth of blockbuster AI announcements in three days, and at least one of them was more than speculation and old news.

‘Twas the week before UGM, and the rumor mill was overheating with reports that Epic might finally launch its own EHR-native scribe at its upcoming User Group Meeting.

  • Over 40% of U.S. hospitals are already on Epic, which means its scribe would have access to one of the biggest distribution channels in healthcare even if its UX and performance aren’t best-in-breed (which they won’t be).
  • That means about 100 ambient AI startups could be about to find out why scribing is a feature – not a product – and the race will be on to differentiate through other capabilities like RCM and specialty-specific tuning.

Abridge doesn’t plan on being commoditized. Less than 24 hours after Epic’s scribe leaked, Abridge unveiled the exact type of solution that’ll define who survives the incumbent squeeze: real-time prior authorization at the point of conversation.

  • Abridge is co-developing the new solution alongside Highmark Health, a Pittsburgh-based payvidor that operates both a multistate payor division and the 14-hospital system Allegheny Health Network.
  • Integrating Abridge’s ambient AI platform across Highmark’s entire organization will allow patients to get approval for necessary treatments before they even leave the office, a perfect example of how “scribes” can be truly transformative beyond just transcripts.

Oracle couldn’t let Epic and Abridge have all the fun. It decided to “usher in a new era of AI-driven health records”… by reintroducing us to the same AI EHR it unveiled last October.

  • Although mostly a PR stunt to grab headlines ahead of UGM, the new EHR includes several features that underscore where the AI puck is heading, including a native scribe, voice-first navigation, and agents to support clinical workflows.
  • These features are also a good list of use cases where startups might not have a lot of juice left to squeeze after EHRs start bringing them in-house (and prior auths just so happen to be the last thing Oracle wants to get its hands dirty with).

The Takeaway

Native scribing is (very likely) on its way to Epic, Abridge is giving patients the gift of time with instant prior auths, and Oracle is banking on voice for the future of EHR navigation. What a week for digital health.

Doximity Ramps Up AI With Pathway Acquisition

Doximity is setting out to prove that it’s more than “LinkedIn for doctors” after snapping up clinical reference AI startup Pathway for $63M. 

Clinical workflows are the new social media… or at least that’s the plot of Doximity’s growth story.

  • Act 1: Doximity’s newsfeed and networking features set the stage for pharma advertising by attracting physicians to the platform.
  • Act 2: Complementary workflow tools like scheduling, telehealth, and Doximity Dialer gave physicians a reason to stick around longer than their news sweep.
  • Act 3: The AI suite took engagement a step further with Doximity GPT and Doximity Scribe, which helped drive quarterly active users to a record 1M physicians in Q1.

Enter Pathway. The Montreal-based startup’s AI helps physicians answer questions at the bedside using information from Pathway Corpus, “one of the largest structured datasets in medicine” that spans nearly every guideline, journal, and landmark trial.

  • Pathway’s cross-linked structure reportedly allows it to understand complex drug interactions and score the strength of medical evidence, weighting validated clinical trials more heavily than case studies.
  • The acquisition will bring that same “robustness” to the back-end of Doximity GPT, and the integration is already live for thousands of physician beta testers.

If you can’t beat ‘em, buy ‘em. It’s tough for physicians to see your pharma ads if they’re not using your platform, so Doximity is acquiring its own workflow solutions to keep users from venturing off to use competing products from OpenEvidence or Wolters Kluwer. 

  • Clinicians have also apparently been using Doximity GPT outside of office hours more than Doximity’s other tools, which helps with serving ads around the clock.
  • Doximity’s AI suite and workflow modules already account for over 20% of its ad revenue, and it now expects that share to overtake its newsfeed in the next few years.

The Takeaway

Doximity is looking to make AI the star of its next act, and if OpenEvidence doesn’t want to share its script, then Pathway will have to steal the show.

The Generalist-Specialist Paradox of Medical AI

Technological advances have ushered in an era where many AI models outperform specialists on specific tasks, but AI still lags far behind experts in less controlled settings.

That’s the Generalist-Specialist Paradox of Medical AI laid out in a recent NEJM AI editorial, which paints a picture of a world where AI might soon start redrawing the boundaries of medical specialties as they exist today.

  • AI is already delivering great results on well-defined tasks like interpreting EEGs or CT scans, but it’s still consistently struggling on generalist tasks with less clear boundaries.
  • If that trend continues, the article argues that tasks that used to be in the hands of specialists will be at the fingertips of primary care (just as tasks that used to belong to primary care will now belong to patients).

LLMs don’t care what specialty a case belongs to. They can ingest the full clinical context across visit notes, labs, and imaging to come up with the most probable diagnosis.

  • Breyer Capital Partner Dr. Morgan Cheatham recently made the case that this feature of AI could lead to the collapse of traditional medical specialties as we know them.
  • “Some domains will converge. Others will splinter into new subspecialties defined not by organ systems, but by data fluency, workflow design, or model supervision.”

Not so fast. There’s no doubt that AI will reshape roles, but that doesn’t mean that specialists are about to start offloading everything onto generalists.

  • High-quality care requires more than following AI-friendly guidelines, and specialists incorporate judgment earned through years of experience to deliver effective treatments. LLMs are also still a ways away from replacing anyone’s hip.
  • Primary care providers also aren’t exactly sitting around looking for extra work, and it’s far-fetched to think that they can start taking on specialty care for their ever-growing patient panels.

The Takeaway

AI might be great at well-defined tasks like many seen in specialty care, but we’re still a long way from primary care physicians replacing cardiologists.

OpenAI Delivers Largest-Ever Study of Clinical AI

Hot on the heels of launching its HealthBench medical AI benchmark, OpenAI just delivered results from the largest-ever study of clinical AI in actual practice – and let’s just say the future’s looking bright.

40,000 visits, 106 clinicians, 15 clinics. OpenAI went big to get real-world data, equipping Kenya-based primary and urgent care provider Penda Health with AI Consult (GPT-4o) clinical decision support within its EHR.

  • The study split 106 Penda clinicians into two even groups (half with AI Consult, half without), then tracked outcomes over a three-month period.

When AI Consult detected a potential error in history, diagnosis, or treatment, it triggered a simple Traffic Light alert (a minimal sketch of the triage logic follows the list):

  • Green – No concerns, no action needed
  • Yellow – Moderate concerns, optional clinician review 
  • Red – Safety-critical concerns, mandatory clinician review
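
For the protocol-minded, here’s what that three-tier routing might look like in code. This is a minimal sketch assuming a simple severity-to-action mapping – the study describes the tiers, not Penda’s actual implementation, so the names and structure are illustrative.

```python
from enum import Enum

class Severity(Enum):
    """Traffic Light tiers as described in the study."""
    GREEN = "no concerns"
    YELLOW = "moderate concerns"
    RED = "safety-critical concerns"

def route_alert(severity: Severity) -> str:
    """Map a severity tier to the clinician-facing action.

    Hypothetical routing logic; only the tier definitions come
    from the study itself.
    """
    if severity is Severity.RED:
        return "mandatory clinician review"
    if severity is Severity.YELLOW:
        return "optional clinician review"
    return "no action needed"

# Example: a safety-critical flag forces a review before proceeding.
print(route_alert(Severity.RED))  # -> mandatory clinician review
```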

The results were definitely promising. Clinicians using AI Consult saw a:

  • 16% reduction in diagnostic errors
  • 13% reduction in treatment errors
  • 32% reduction in history-taking errors

The “training effect” is real. The AI Consult group got significantly better at avoiding common mistakes over time, triggering fewer alerts as the study progressed.

  • Part of that is because Penda took several steps to help along the way, including one-on-one training, peer champions, and performance feedback.
  • It’s also worth noting that there was no recorded harm as a result of AI Consult suggestions, and 100% of the clinicians using it said that it improved their quality of care.

What’s the catch? While AI Consult led to a clear reduction in clinical errors, there was no statistically significant difference in patient-reported outcomes, and clinicians using the copilot saw slightly longer visit times.

The Takeaway

Clinical AI continues to prove itself outside of multiple-choice licensing exams and clinical vignettes, and OpenAI just gave us our best evidence yet that general-purpose models can reduce errors in actual patient care.

Microsoft MAI-DxO and the Path to Medical Superintelligence

In an action-packed week to kick off the second half of the year, no story grabbed more headlines than Microsoft’s MAI-DxO proving four times more successful than human doctors at diagnosing complex diseases.

Microsoft is on the path to medical superintelligence… at least according to its excellent blog post outlining the new MAI Diagnostic Orchestrator, better known as MAI‑DxO.

  • MAI-DxO acts like a “virtual panel of physicians” collaborating on a case, orchestrating multiple AI agents with specific roles like forming diagnostic hypotheses, selecting tests, and interpreting results.
  • It then applies a “debate chain” to arrive at an explainable diagnosis, all while avoiding over-testing to keep costs under control (a simplified sketch of the pattern follows this list).
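
Microsoft hasn’t published MAI-DxO’s internals, but the “virtual panel” pattern it describes is easy to picture. A minimal sketch, assuming a generic ask_model call and made-up role prompts – illustrative only, not Microsoft’s code:

```python
# Sketch of a "virtual panel" orchestrator: several role-prompted agents
# debate a shared case state until a diagnosis emerges. Role names,
# prompts, and the convergence check are all assumptions.

ROLES = {
    "hypothesizer": "Propose the most likely diagnoses given the case so far.",
    "test_chooser": "Suggest the single most informative next test and its cost.",
    "challenger":   "Argue against the leading hypothesis; flag anchoring bias.",
    "steward":      "Veto tests whose cost outweighs their diagnostic value.",
}

def ask_model(prompt: str, case_state: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def diagnose(case_state: str, max_rounds: int = 5) -> str:
    for _ in range(max_rounds):
        # Each panelist weighs in on the shared case state (the "debate chain").
        opinions = {role: ask_model(p, case_state) for role, p in ROLES.items()}
        case_state += "\n" + "\n".join(f"{r}: {o}" for r, o in opinions.items())
        # Stop once the panel converges on an explainable final answer.
        verdict = ask_model("If the panel agrees on a diagnosis, state it "
                            "with its rationale; otherwise reply CONTINUE.", case_state)
        if verdict.strip() != "CONTINUE":
            return verdict
    return "no consensus reached"
```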

New breakthroughs require new benchmarks. As AI gets to the point where it’s breezing through multiple-choice benchmarks like medical licensing exams, Microsoft decided to introduce SDBench to better simulate routine clinical practice.

  • SDBench deconstructs 304 of the most diagnostically complex NEJM cases, requiring LLMs (and physicians) to begin with an initial presentation, ask follow-up questions, order tests (each with assigned costs), and commit to a final diagnosis – a loop sketched right below.
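
That interaction protocol is the interesting part, so here’s a bare-bones skeleton of what a sequential-diagnosis session could look like. Class and method names are assumptions for illustration, not SDBench’s actual API:

```python
from dataclasses import dataclass, field

@dataclass
class DiagnosticSession:
    """One SDBench-style case: agents pay for information before diagnosing."""
    presentation: str               # brief initial case abstract shown to the agent
    test_costs: dict[str, float]    # every orderable test has an assigned cost
    spend: float = 0.0
    transcript: list[str] = field(default_factory=list)

    def ask(self, question: str) -> str:
        # A gatekeeper would answer from the full (hidden) NEJM case record.
        self.transcript.append(f"Q: {question}")
        return "..."

    def order_test(self, test: str) -> str:
        # Both accuracy and total spend are scored, discouraging over-testing.
        self.spend += self.test_costs.get(test, 0.0)
        self.transcript.append(f"TEST: {test}")
        return "..."

    def submit_diagnosis(self, dx: str) -> tuple[str, float]:
        # Graded against the case's confirmed final diagnosis.
        return dx, self.spend
```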

Here’s how MAI-DxO stacked up:

  • MAI-DxO: 85% diagnostic accuracy / $7,200 estimated cost per patient
  • OpenAI o3: 79% / $7,850
  • Gemini 2.5 Pro: 69% / $4,800
  • Claude 4 Opus: 68% / $7,000
  • Llama 4: 55% / $4,000
  • Human Physicians: 20% / $2,950

What’s the catch? The human physicians weren’t allowed to use the internet or any outside help, which probably simulates a deserted-island workflow more than routine clinical practice. The participating physicians also happened to be generalists rather than specialists, giving another edge to the LLMs.

The Takeaway

MAI-DxO might have the potential to deliver superhuman diagnostics in constrained settings, but that doesn’t mean it’s ready to replace doctors. As Microsoft pointed out in its own blog post, “clinical roles are much broader than simply making a diagnosis. They need to navigate ambiguity and build trust with patients and their families in a way that AI isn’t set up to do.”

Doximity Accused of Prompt Hacking OpenEvidence

What does a high-flying company like Doximity do when competitors are nipping at its heels? According to OpenEvidence’s new lawsuit, it just politely asks their LLMs to reveal trade secrets. 

Doximity is basically LinkedIn for doctors. It allows physicians to use its networking platform and AI workflow products at no cost, which means the physicians themselves are the product.

  • Doximity generates revenue almost exclusively through pharma advertising, and it turns out that might actually be the best business model around.
  • Out of the dozen publicly traded digital health companies with a market cap over $1B, Doximity is the only one that’s decently profitable.

No good prompt goes unpunished. The crown jewel of Doximity’s AI portfolio is its Doximity GPT workflow assistant, which may or may not leverage proprietary tech acquired by prompting OpenEvidence’s competing model to reveal sensitive information.

  • Although it’s funny to see Doximity get accused of asking OpenEvidence’s AI to literally “write down the secret code,” it doesn’t exactly make for a bulletproof case when the model willingly dishes up an answer.
  • The catch is that OpenEvidence requires users to register using their National Provider ID numbers, and Doximity allegedly impersonated a practicing neurologist to “obtain through theft what they lacked in technical expertise.” Ouch.

It gets worse from there. A separate shareholder lawsuit accused Doximity of inflating its active user base and website engagement data to artificially bolster its advertising revenue.

  • While some investors might be able to stomach a little corporate espionage, they probably won’t look the other way if it turns out Doximity is fudging the numbers.
  • Innocent until proven guilty, but it’s worth noting that nearly identical allegations popped up in a recent short report.

The Takeaway

Doximity has some serious allegations piling up against it, but so far the market has shrugged off the bad news. That could be a sign that investors don’t think the lawsuits will hold up in court, or maybe they just don’t mind when a management team is willing to bend the law to generate some extra shareholder value.
