Artificial Intelligence

Microsoft MAI-DxO and the Path to Medical Superintelligence

In an action-packed week to kick off the second half of the year, no story grabbed more headlines than Microsoft’s MAI-DxO proving four times more successful than human doctors at diagnosing complex diseases.

Microsoft is on the path to medical superintelligence… at least according to its excellent blog post outlining the new MAI Diagnostic Orchestrator, better known as MAI‑DxO.

  • MAI-DxO acts like a “virtual panel of physicians” collaborating on a case, orchestrating multiple AI agents with specific roles like forming diagnostic hypotheses, selecting tests, and interpreting results. 
  • It then applies a “debate chain” to arrive at an explainable diagnosis, all while avoiding over-testing to keep costs under control.

New breakthroughs require new benchmarks. Now that AI is breezing through multiple-choice benchmarks like medical licensing exams, Microsoft introduced SDBench to better simulate routine clinical practice.

  • SDBench deconstructs 304 of the most diagnostically complex NEJM cases, requiring LLMs (and physicians) to begin with an initial presentation, ask follow-up questions, order tests (each with assigned costs), and agree on a diagnosis.
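To make the sequential format concrete, here is a minimal sketch of one cost-tracked encounter. It is not Microsoft's implementation: the `Encounter` class, its method names, the placeholder case, and the prices are all hypothetical, and charging only for tests (not questions) is an assumption made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Encounter:
    """One simulated case: what has been revealed so far and what it cost."""
    presentation: str
    test_costs: dict   # hypothetical price list: test name -> USD
    findings: dict     # hypothetical hidden case file: query -> result
    spent: int = 0
    revealed: list = field(default_factory=list)

    def ask(self, question):
        # Follow-up questions reveal information; this sketch assumes they are free.
        answer = self.findings.get(question, "not available")
        self.revealed.append((question, answer))
        return answer

    def order_test(self, test):
        # Each ordered test adds its assigned cost to the running total.
        self.spent += self.test_costs[test]
        result = self.findings.get(test, "not available")
        self.revealed.append((test, result))
        return result

    def diagnose(self, diagnosis, truth):
        # Final step: commit to a diagnosis and score it against the case answer.
        return {"correct": diagnosis.lower() == truth.lower(), "cost": self.spent}


# Placeholder case with invented values, purely for illustration.
case = Encounter(
    presentation="placeholder presentation",
    test_costs={"test_a": 100, "test_b": 250},
    findings={"onset?": "two weeks ago", "test_a": "normal"},
)
case.ask("onset?")
case.order_test("test_a")
case.order_test("test_b")
outcome = case.diagnose("Condition X", truth="condition x")
```

The point of the structure is that accuracy and cost are scored together, so ordering every available test is penalized rather than rewarded.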

Here’s how MAI-DxO stacked up:

  • MAI-DxO: 85% diagnostic accuracy / $7,200 estimated cost per patient
  • OpenAI o3: 79% / $7,850
  • Gemini 2.5 Pro: 69% / $4,800
  • Claude 4 Opus: 68% / $7,000
  • Llama 4: 55% / $4,000
  • Human Physicians: 20% / $2,950
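
The “four times” headline follows directly from these figures. As a quick check, the sketch below compares each entry to the physician baseline, using only the numbers listed above (the dictionary and function names are our own):

```python
# (diagnostic accuracy in %, estimated cost per patient in USD), as listed above
results = {
    "MAI-DxO": (85, 7200),
    "OpenAI o3": (79, 7850),
    "Gemini 2.5 Pro": (69, 4800),
    "Claude 4 Opus": (68, 7000),
    "Llama 4": (55, 4000),
    "Human Physicians": (20, 2950),
}

baseline_accuracy, baseline_cost = results["Human Physicians"]


def compare_to_physicians(name):
    """Return (accuracy ratio, cost ratio) relative to the physician baseline."""
    accuracy, cost = results[name]
    return accuracy / baseline_accuracy, cost / baseline_cost


for name in results:
    accuracy_ratio, cost_ratio = compare_to_physicians(name)
    print(f"{name}: {accuracy_ratio:.2f}x accuracy, {cost_ratio:.2f}x cost")
```

For MAI-DxO this works out to 4.25x the physicians' accuracy at roughly 2.4x their estimated cost per patient.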

What’s the catch? The human physicians weren’t allowed to use the internet or any outside help, which probably simulates a deserted-island workflow more than routine clinical practice. The participants were also all generalists rather than specialists, giving another edge to the LLMs.

The Takeaway

MAI-DxO might have the potential to deliver superhuman diagnostics in constrained settings, but that doesn’t mean it’s ready to replace doctors. As Microsoft pointed out in its own blog post, “clinical roles are much broader than simply making a diagnosis. They need to navigate ambiguity and build trust with patients and their families in a way that AI isn’t set up to do.”
