The news: A federal judge gave Anthropic the green light this week to train its AI on millions of pirated books, declaring the training "fair use" even though the company's collection methods resembled digital looting, per CNET.
Anthropic downloaded millions of books from illegal online repositories like Library Genesis (LibGen) and Pirate Library Mirror, then bought and destroyed physical copies of many of the same books to build what it calls a "research library." While the judge declared the training and the destructive scanning of purchased books legal, he ordered a separate trial over the pirated copies kept in that library.
The balance is tipping in favor of AI companies: Judge William Alsup's decision was the first fair use ruling to come down on the side of an AI company.
The fair use doctrine, which lets people and companies use protected content without the rights holder's approval in certain circumstances, is what AI and tech companies are leaning on to train their models with impunity.
- Anthropic's precedent appears to have informed a similar ruling in favor of Meta, in which a judge ruled against authors suing for copyright infringement.
- The two successive rulings in favor of Big Tech signal to other AI companies that, when it comes to training data, it's better to beg forgiveness than ask permission.
- Microsoft now faces a lawsuit from authors claiming it used 200,000 pirated books to train its Megatron AI model; the previous rulings could mean it, too, gets off the hook for the practice.
As the AI arms race heats up, companies are burning through legitimate data sources and turning to the same underground repositories—raising red flags for regulators, creators, and clients alike.
What this means for authors and content creators: By classifying Anthropic's training on millions of copyrighted books as fair use, the ruling opens the door for AI firms to mine creative work without consent or compensation.
Our take: 24% of marketers say copyright concerns will be a big challenge for generative AI (genAI) in the next two years, per Econsultancy. To protect themselves, agencies need to pressure-test genAI vendors and ask tough questions about how models were trained—before the lawsuits land on their desks.