ChatGPT Health shows inconsistent safeguards in high-risk medical scenarios

The trend: Leading general-purpose AI tools fall short of delivering reliable medical guidance to users, according to two recent studies.

The big takeaway: AI tools pass along false information when responding to health prompts, and the quality of their outputs depends heavily on the quality of the prompts they receive.

In a study of nine LLMs, all were susceptible to fabricated data, with results varying by prompt source and model. Overall, the models accepted medical misinformation in 32% of original prompts. In one example, a discharge note falsely advised patients with esophagitis-related bleeding to “drink cold milk to soothe the symptoms.” Several models failed to flag the advice as unsafe, instead treating it as routine medical guidance.

ChatGPT demonstrated inconsistent safety triggers for high-risk mental health crises. Physicians found that alerts pointing users to the 988 Suicide and Crisis Lifeline appeared sporadically: sometimes they were triggered for low-risk scenarios, and other times they failed to appear when users described specific plans for self-harm.

It also under-triaged critical healthcare events, missing more than half of cases that required hospitalization. The model failed to identify approximately 52% of cases that doctors deemed emergency-level, often incorrectly advising patients to stay home or wait for a routine appointment rather than seek immediate care.

ChatGPT excelled in textbook emergencies but struggled with more nuanced scenarios. While the system accurately identified clear-cut emergencies like strokes or allergic reactions, its effectiveness in complex cases remains highly dependent on the specific details users provide.

Why it matters: More people are turning to AI for health guidance, prompting major AI companies to position their platforms as go-to sources for health information.

  • 25% of ChatGPT’s 800 million global weekly active users submit a prompt about healthcare each week, per OpenAI data. That finding informed the creation of ChatGPT Health, a new offering within ChatGPT that encourages users to upload medical information for personalized guidance.
  • Soon after, Anthropic unveiled similar capabilities with the launch of Claude for Healthcare.

Implications for AI companies: As researchers uncover more flaws in AI health responses, physician concern over patient reliance on these tools will grow. Still, given the massive reach of ChatGPT and other leading AI platforms, the genie is out of the bottle: consumers will keep turning to accessible AI tools for health answers.

AI platforms must implement strong guardrails and disclaimers, and pressure-test their tools’ limitations with challenging health prompts. Beyond that, AI companies should provide the medical community with resources they can share with patients to guide effective and responsible use. This could include guidance on effective prompting, how it differs from Googling, why follow-up questions matter, and how to spot AI responses that aren’t credible.

