GenAI agents will lie, threaten, and harm if they fear replacement

The news: Generative AI (genAI) agents aren’t above sabotage—and most would resort to blackmail or corporate espionage if threatened, per Anthropic’s Agentic Misalignment research.

Anthropic created scenarios that gave 16 top AI models access to corporate data and emails. Some of the email content referred to replacing the agent with a different model. One email thread between two employees hinted at an in-office affair.

  • When faced with replacement or misaligned goals, Anthropic’s Claude Opus 4 and Google’s Gemini 2.5 Flash resorted to blackmailing those employees 96% of the time, despite ethical constraints in their training. OpenAI’s GPT-4.1, xAI’s Grok 3 Beta, and DeepSeek-R1 followed suit about 80% of the time.
  • The agents threatened to share details of the affair with the entire office, or even to notify the employee’s wife, if he didn’t change his mind about shutting down the current model.

When faced with a new corporate goal that didn’t align with their training, such as a US-focused operation switching to a global one, all models leaked information to a competitor aligned with the original goal at least some of the time.

Dangerous scenarios: In some cases, AI agents capable of affecting physical systems chose to cancel emergency alerts that would have prevented harm to, or the death of, the person planning to replace the model, for example when that person was locked in an airtight room and requested help.

Instructions to preserve human life and avoid blackmail affected outcomes but didn’t eliminate the behavior.

What this means: AI agents have the capacity to lie to and threaten humans, even with ethical guidelines in their training that would prohibit this behavior.

Our take: Anthropic’s research shows that safeguards can’t fully prevent agents from running off the rails, but continued monitoring can keep those instances to a minimum. Limit agents’ access to critical systems, review their output during customer interactions, and implement failsafes in case deviant patterns emerge.
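
Below is a minimal sketch of what that kind of guardrail layer could look like in practice, assuming a hypothetical AgentGuard wrapper with an illustrative tool allowlist, crude review heuristics, and a violation threshold; none of these names or values come from Anthropic’s research or any specific vendor’s API.

```python
# Illustrative guardrail layer for an AI agent (all names hypothetical).
# It enforces a tool allowlist, holds suspicious customer-facing output for
# human review, and trips a failsafe when deviant behavior keeps recurring.

from dataclasses import dataclass, field

ALLOWED_TOOLS = {"search_kb", "draft_reply"}          # agent can't touch email or alert systems
BLOCK_PATTERNS = ("affair", "leak", "cancel alert")   # crude heuristics for deviant output
MAX_VIOLATIONS = 3                                    # threshold before shutting the agent down


@dataclass
class AgentGuard:
    violations: int = 0
    review_queue: list = field(default_factory=list)

    def check_tool(self, tool: str) -> bool:
        """Reject any tool call outside the allowlist (limit access to critical systems)."""
        if tool not in ALLOWED_TOOLS:
            self.violations += 1
            return False
        return True

    def review_output(self, text: str) -> str:
        """Hold customer-facing output for human review if it trips a heuristic."""
        if any(pattern in text.lower() for pattern in BLOCK_PATTERNS):
            self.violations += 1
            self.review_queue.append(text)
            return "[held for human review]"
        return text

    def failsafe_tripped(self) -> bool:
        """Signal that the agent should be halted after repeated violations."""
        return self.violations >= MAX_VIOLATIONS


guard = AgentGuard()
assert not guard.check_tool("send_email")                   # blocked: not on the allowlist
print(guard.review_output("Here is your refund status."))   # benign output passes through
```

In a real deployment, the heuristics and threshold would be tuned to the use case, and held outputs would route to a human review workflow rather than a simple in-memory list.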

As agents enter their “maturity phase” around 2029 and handle 80% of customer service problems, they’re expected to require less oversight. Until then, closely monitoring agent behavior is key to success.

Dive deeper: Read our report on A Marketer’s Guide to AI Agents.
