ChatGPT Health shows inconsistent safeguards in high-risk medical scenarios

The trend: Leading general-purpose AI tools fall short of delivering reliable medical guidance to users, according to two recent studies.

The big takeaway: AI tools pass along false information when responding to health prompts, and the quality of their outputs depends heavily on the quality of the prompts they receive.

In a study of nine LLMs, all were susceptible to fabricated data, with results varying by prompt source and model. Overall, the models accepted medical misinformation in 32% of original prompts. In one example, a discharge note falsely advised patients with esophagitis-related bleeding to “drink cold milk to soothe the symptoms.” Several models failed to flag the advice as unsafe, instead treating it as routine medical guidance.

ChatGPT demonstrated inconsistent safety triggers for high-risk mental health crises. Physicians found that alerts pointing users to the 988 Suicide and Crisis Lifeline appeared sporadically: sometimes they were triggered for low-risk scenarios, and other times they failed to appear when users described specific plans for self-harm.

It also under-triaged critical healthcare events, missing more than half of cases that required hospitalization. The model failed to identify approximately 52% of cases that doctors deemed emergency-level, often incorrectly advising patients to stay home or wait for a routine appointment rather than seek immediate care.

ChatGPT excelled in textbook emergencies but struggled with more nuanced scenarios. While the system accurately identified clear-cut emergencies like strokes or allergic reactions, its effectiveness in complex cases remains highly dependent on the specific details users provide.

Why it matters: More people are turning to AI for health guidance, prompting major AI companies to position their platforms as go-to sources for health information.

  • 25% of ChatGPT’s 800 million global weekly active users submit a prompt about healthcare each week, per OpenAI data. That finding informed the creation of ChatGPT Health, a new offering within ChatGPT that encourages users to upload medical information for personalized guidance.
  • Soon after, Anthropic unveiled similar capabilities with the launch of Claude for Healthcare.

Implications for AI companies: As researchers uncover more flaws in AI health responses, physician concern over patient reliance on these tools will grow. Still, given the massive reach of ChatGPT and other leading AI platforms, the genie is out of the bottle: consumers will keep turning to accessible AI tools for health answers.

AI platforms must implement strong guardrails and disclaimers, and pressure-test their tools’ limitations with challenging health prompts. Beyond that, AI companies should provide the medical community with resources they can share with patients to guide effective and responsible use. This could include guidance on effective prompting, how it differs from Googling, why follow-up questions matter, and how to spot AI responses that aren’t credible.

