Microsoft has developed a diagnostic system called MAI-DxO (Microsoft AI Diagnostic Orchestrator) that significantly outperforms human doctors in diagnosing complex clinical cases. The tool combines responses from several leading large language models — GPT-4, Gemini, Claude, Llama, Grok, and DeepSeek — to simulate a multi-doctor consultative approach.
The system was evaluated on 304 cases from the New England Journal of Medicine, a set of scenarios typically used for medical education due to their complexity. A panel of 21 experienced physicians from the US and UK was asked to diagnose the same cases, step by step, without access to external tools.
MAI-DxO achieved an 85.5 percent accuracy rate, compared with 20 percent for the physicians. It also reduced diagnostic costs by approximately 20 percent, primarily by selecting fewer and less expensive tests.
Dominic King, vice president at Microsoft, says the system “performs incredibly well, both getting to the diagnosis and getting to that diagnosis very cost effectively.”
Mustafa Suleyman, CEO of Microsoft AI, describes the system’s design as a breakthrough, arguing that “this orchestration mechanism, multiple agents that work together in this chain-of-debate style, that’s what’s going to drive us closer to medical superintelligence.”
The approach involves having multiple AI agents reason together through each case, mimicking the behavior of a team of physicians debating diagnoses in real time.
Optimism tempered by caution
The findings have sparked discussion across the medical and AI sectors. Conor Grennan, Chief AI Architect at NYU Stern School of Business, shares his reaction on LinkedIn, calling the results “bonkers” and writing: “Microsoft just announced a breakthrough in medical AI with its new Microsoft AI Diagnostic Orchestrator—outperforming experienced human doctors by a wide margin in diagnosing complex medical cases.”
Grennan highlights the design of the system, noting: “Their new tool, called MAI-DxO, just let the models talk to each other and work together, like a group of doctors would.” He adds that the AI “ordered fewer and less expensive tests, reducing overall diagnostic costs by about 20% compared to the doctors.”
Clinical availability remains distant
MAI-DxO is still in the research phase and has not been approved for clinical use. Microsoft has not confirmed whether it will commercialize the system, though integration into products like Bing or clinical decision-support tools is under review.
Suleyman says Microsoft plans further trials: “What you’ll see over the next couple of years is us doing more and more work proving these systems out in the real world.”
For now, the system marks a significant development in AI-assisted diagnostics. Whether it can deliver the same results in a hospital setting, where variables like patient communication, ethics, and institutional logistics come into play, remains to be seen.