
As artificial intelligence becomes more embedded in healthcare, two new studies raise critical questions about its trustworthiness and fairness. From biased clinical decisions to mismatched diagnoses, researchers are calling for a more cautious – and accountable – approach to integrating AI into patient care.
A study published this month in Nature Medicine found that large language models (LLMs) like ChatGPT and Google’s Med-PaLM may unintentionally reproduce harmful biases when making medical decisions. Researchers led by Dr. Omar tested how these models responded to clinical scenarios that were identical except for the patient’s race, gender, or insurance status. The results were sobering: the LLMs systematically recommended different treatments depending on who the patient was, despite identical symptoms.
“For example,” the authors noted, “white patients with private insurance were more likely to receive pain management and imaging recommendations than Black patients with Medicaid.” These findings highlight the urgent need for bias audits before AI tools are deployed at scale in clinical settings.
Meanwhile, a separate real-world study published in Annals of Internal Medicine looked at how AI performs in the fast-paced environment of virtual urgent care. In this analysis of over 30,000 AI-assisted consultations, researchers led by Dr. Zeltzer found that AI-generated recommendations matched physicians’ final diagnoses in 81% of cases. While encouraging, that still leaves nearly one in five consultations in which the AI and the physician disagreed – a gap that raises red flags, especially for complex or high-risk cases.
The researchers emphasized that AI should be seen as an assistant, not a decision-maker. “Doctors reported that AI helped improve speed and structure, but they remained skeptical of its reliability in more nuanced clinical situations,” the study notes. Taken together, the two studies send a clear message: AI can be a powerful tool in medicine, but only when guided by rigorous oversight, transparency, and a strong ethical foundation. Without these safeguards, there’s a risk that AI could not only make mistakes but deepen the very inequities medicine is trying to eliminate.
We will discuss these findings and their implications at the upcoming Targeting AI in Healthcare Conference – visit the conference website for program details.
References:
- Omar M, Soffer S, Agbareia R, et al. Sociodemographic biases in medical decision making by large language models. Nature Medicine. 2025.
- Zeltzer D, Kugler Z, Hayat L, et al. Comparison of initial artificial intelligence (AI) and final physician recommendations in AI-assisted virtual urgent care visits. Annals of Internal Medicine.