New Microsoft AI Research Edges Towards 'Medical Superintelligence'

Microsoft AI has unveiled new research demonstrating AI’s abilities in sequential diagnostics, rivaling physicians in making recommendations that are both accurate and cost-effective.

The company’s announcement on Monday morning was twofold, introducing a benchmark to test the performance of AI diagnostic tools against human experts, and a model-agnostic orchestrator to highlight AI’s capabilities when stacked up against that benchmark.

Mustafa Suleyman, CEO of Microsoft AI, believes this research brings us one step closer to “medical superintelligence.”

“The simple way to understand medical superintelligence is that it is a model which is multiple times better than the best humans in the world, that has the breadth of all or most of the expert clinicians worldwide, combined with the depth of any given expert,” the AI pioneer told Newsweek in an exclusive interview ahead of the announcement.

The interactive Sequential Diagnosis Benchmark, or SDBench, took 304 complex cases from the New England Journal of Medicine (NEJM) clinicopathological conference, which are historically challenging to diagnose, and translated them into step-by-step diagnostic encounters that mimic clinical decision-making processes.


SDBench presents physicians or AI models with a short case abstract. Then, the human or model must ask questions and order tests to inform their diagnosis. A “gatekeeper model” reveals information only when explicitly asked for it. The final diagnosis is compared to the NEJM’s gold standard and assessed for both accuracy and cost.
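The encounter described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not Microsoft's implementation: the `Case` and `Gatekeeper` classes, the toy case, the prices and the exact-match scoring are all hypothetical stand-ins for whatever SDBench actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    abstract: str                 # the short summary shown up front
    findings: dict[str, str]      # hidden until explicitly requested
    test_costs: dict[str, float]  # toy prices; items not listed are free
    gold_diagnosis: str           # reference answer for scoring


@dataclass
class Gatekeeper:
    case: Case
    total_cost: float = 0.0
    revealed: list[str] = field(default_factory=list)

    def request(self, item: str) -> str:
        """Reveal one finding, but only when it is explicitly asked for."""
        if item not in self.case.findings:
            return "No information available."
        self.total_cost += self.case.test_costs.get(item, 0.0)
        self.revealed.append(item)
        return self.case.findings[item]

    def score(self, diagnosis: str) -> tuple[bool, float]:
        """Return (correct?, total cost). Exact match is a simplification."""
        correct = diagnosis.strip().lower() == self.case.gold_diagnosis.lower()
        return correct, self.total_cost


# Hypothetical toy case with made-up findings and prices.
toy_case = Case(
    abstract="Adult with fever and cough.",
    findings={
        "travel history": "No recent travel.",
        "chest x-ray": "Right lower lobe consolidation.",
    },
    test_costs={"chest x-ray": 100.0},
    gold_diagnosis="pneumonia",
)

gatekeeper = Gatekeeper(toy_case)
gatekeeper.request("travel history")   # a question, free in this sketch
gatekeeper.request("chest x-ray")      # a test, adds to the running cost
correct, cost = gatekeeper.score("Pneumonia")
```

The point of the sketch is the trade-off the benchmark measures: every extra test can bring the diagnostician closer to the right answer, but it also raises the cost that is reported alongside accuracy.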

Microsoft’s new model-agnostic MAI Diagnostic Orchestrator (MAI-DxO) achieved 85.5 percent diagnostic accuracy—outperforming generalist physicians, who reached the correct diagnosis 20 percent of the time, on average.

MAI-DxO also reduced diagnostic costs by 20 percent compared to physicians by ordering fewer expensive tests and reaching its clinical decisions more quickly.

Since MAI-DxO is model-agnostic, it can be generalized across models from the OpenAI, Gemini, Claude, Grok, DeepSeek and Llama families, according to Microsoft.

Microsoft AI — This graphic from Microsoft’s research paper illustrates SDBench’s assessment process. Three agents orchestrate the “conversation” between SDBench and a human or AI model. Via the Diagnostic Agent (yellow), humans or AI models may ask questions…

Microsoft

The study has its limitations. Microsoft’s panel of 21 U.S. and U.K. doctors had a median of 12 years of experience but were not allowed to use search engines, language models or other sources of medical information when interacting with SDBench. These tools are common in physicians’ practices, with about 1 in 5 using generative AI and about 7 in 10 using search engines on a regular basis, according to recent research—so the human participants may have achieved higher diagnostic accuracy if allowed to access their typical suite of online resources.

Still, Microsoft’s team says the research “highlight[s] how AI systems, when guided to think iteratively and act judiciously, can advance both diagnostic precision and cost-effectiveness in clinical care.”

MAI-DxO has not been deployed into production, but its initial performance offers a glimpse of high potential. The tool was developed by Microsoft AI’s health effort, which launched quietly in late 2024 to create technology and conduct research that advances consumer health.

A team of clinicians, designers, engineers and AI scientists has been collaborating under Suleyman, Microsoft AI CEO and co-founder of DeepMind (the AI company acquired by Google in 2014 for $400 million). Dr. Dominic King, Microsoft AI’s health vice president and a former lead at both Google DeepMind and Google Health, is also core to the work.

“Two things that we’re really proud of: creating a new benchmark for us to test the performance of AI against and showing that the orchestrator system that we created does stunningly well against that benchmark,” King told Newsweek. “This is certainly the most exciting thing I’ve ever been part of.”

Each day, more than 50 million health-related searches are conducted across Microsoft’s AI consumer products, including Copilot, Bing, Edge and MSN. Whether searching for a nearby urgent care center or attempting to make sense of a nagging headache, patients are increasingly turning to AI as a digital front door into the health system. There’s a lot of pressure on tech companies like Microsoft to ensure patients find helpful answers.

“We’ve got an AI called Copilot and people will come and talk about everything from their anxiety, to their child’s headache, to much more serious conditions they’re worried about,” Suleyman said. “These are sustained conversational interactions. Copilot can do a better job for these folks if it has good expertise in diagnostics.”

Microsoft AI’s research could also translate into gains for the health care industry, helping physicians reach an accurate diagnosis more quickly and with fewer expensive tests. Each year in the United States, 7.4 million people are misdiagnosed in emergency rooms, causing death or permanent disability in 1 in 350 patients, according to a 2023 study from the Agency for Healthcare Research and Quality. Plus, billions of dollars are spent on unnecessary tests, contributing to rising national health care costs and exacerbating tense relations between hospitals and insurance companies.

Now, Microsoft is working closely with health systems (it declined to share which) and clinicians to set up more trials and attempt to replicate MAI-DxO’s initial success.

“This is a very promising sign of the potential,” King said, “but we definitely see this as a multi-year journey that requires a lot of engagement across the health care system to get right.”
