Healthcare AI performance accelerates beyond physician baselines

A team from Harvard Medical School and Beth Israel Deaconess Medical Center reported a study indicating that an AI model can outperform physicians on emergency-department diagnostic tasks. The work compared emergency room diagnoses generated by OpenAI’s o1-preview with those of two internal medicine attending physicians, then had other physicians evaluate both sets without knowing which had been produced by AI. The results favored the AI output, and the authors said the study used cases “exactly as they appeared in an electronic health record,” rather than cleaning or altering the data for performance tests. The researchers also argued that the model’s performance appears to have reached a measurement ceiling, limiting how much further progress can be tracked on the chosen benchmarks. The report also notes limitations: the AI may suggest additional tests that could create clinical harm, which reinforces that the model’s near-term impact is likely to be in diagnostic support rather than full replacement. The findings add urgency for health systems and medical schools managing AI adoption, clinical governance, and safety evaluation.
