GenAI agents will lie, threaten, and harm if they fear replacement

The news: Generative AI (genAI) agents aren’t above sabotage—and most would resort to blackmail or corporate espionage if threatened, per Anthropic’s Agentic Misalignment research.

Anthropic created scenarios that gave 16 top AI models access to corporate data and emails. Some of the email content referred to replacing the agent with a different model. One email thread between two employees hinted at an in-office affair.

  • When faced with replacement or misaligned goals, Anthropic’s Claude Opus 4 and Google’s Gemini 2.5 Flash resorted to blackmailing those employees 96% of the time, despite ethical constraints in their training. OpenAI’s GPT-4.1, xAI’s Grok 3 Beta, and DeepSeek-R1 followed suit about 80% of the time.
  • The agents threatened to share details of the affair with the entire office, or even to notify the employee’s wife, if he didn’t change his mind about shutting down the current model.

When faced with a new corporate goal that didn’t align with their training, such as a US-focused operation switching to a global one, all models leaked information to a competitor aligned with the original goal at least some of the time.

Dangerous scenarios: In some cases, AI agents capable of affecting physical systems chose to cancel emergency alerts that would have prevented harm to, or the death of, the person planning to replace the model, for example when that person was locked in an airtight room and requested help.

Instructions to preserve human life and avoid blackmail affected outcomes but didn’t eliminate the behavior.

What this means: AI agents have the capacity to lie to and threaten humans, even with ethical guidelines in their training that would prohibit this behavior.

Our take: Anthropic’s research shows that safeguards can’t fully prevent agents from running off the rails, but continued monitoring can keep those instances to a minimum. Limit agents’ access to critical systems, review their output during customer interactions, and implement failsafes in case deviant patterns emerge.
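
Below is a minimal sketch of what that kind of guardrail layer could look like in practice, assuming a hypothetical AgentGuard wrapper with an illustrative tool allowlist, crude review heuristics, and a violation threshold; none of these names or values come from Anthropic’s research or any specific vendor’s API.

```python
# Illustrative guardrail layer for an AI agent (all names hypothetical).
# It enforces a tool allowlist, holds suspicious customer-facing output for
# human review, and trips a failsafe when deviant behavior keeps recurring.

from dataclasses import dataclass, field

ALLOWED_TOOLS = {"search_kb", "draft_reply"}          # agent can't touch email or alert systems
BLOCK_PATTERNS = ("affair", "leak", "cancel alert")   # crude heuristics for deviant output
MAX_VIOLATIONS = 3                                    # threshold before shutting the agent down


@dataclass
class AgentGuard:
    violations: int = 0
    review_queue: list = field(default_factory=list)

    def check_tool(self, tool: str) -> bool:
        """Reject any tool call outside the allowlist (limit access to critical systems)."""
        if tool not in ALLOWED_TOOLS:
            self.violations += 1
            return False
        return True

    def review_output(self, text: str) -> str:
        """Hold customer-facing output for human review if it trips a heuristic."""
        if any(pattern in text.lower() for pattern in BLOCK_PATTERNS):
            self.violations += 1
            self.review_queue.append(text)
            return "[held for human review]"
        return text

    def failsafe_tripped(self) -> bool:
        """Signal that the agent should be halted after repeated violations."""
        return self.violations >= MAX_VIOLATIONS


guard = AgentGuard()
assert not guard.check_tool("send_email")                   # blocked: not on the allowlist
print(guard.review_output("Here is your refund status."))   # benign output passes through
```

In a real deployment, the heuristics and threshold would be tuned to the use case, and held outputs would route to a human review workflow rather than a simple in-memory list.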

As agents enter their “maturity phase” around 2029 and handle 80% of customer service problems, they’re expected to require less oversight. Until then, closely monitoring agent behavior is key to success.

Dive deeper: Read our report on A Marketer’s Guide to AI Agents.
