The news: Reddit is suing Perplexity and data-scraping companies Oxylabs UAB, AWMProxy, and SerpApi, highlighting the battle over user-generated content (UGC) in the race to build the top genAI models.
- Reddit states in court documents that the three data-scraping companies illegally pulled its content through Google Search results, without permission, in order to sell it, per Bloomberg.
- The lawsuit also states that Perplexity has been buying that data from at least one of the companies.
The pressure to collect quality human content is creating an “industrial-scale ‘data laundering’ economy,” per Reddit chief legal officer Ben Lee. He added that Reddit is a prime target due to its massive, dynamic collection of UGC.
Perplexity said its “approach remains principled and responsible,” per The New York Times, and that the company won’t tolerate threats against openness and the public interest.
Why it matters: Cases like this could redefine how AI firms access and value online content, including original UGC and brand-owned material. A publisher crackdown on unauthorized access to user content could shrink the supply of freely available data for training and refining AI models.
Marketers may need to rethink data-sourcing strategies by leaning more on first-party data and establishing direct content partnerships with publishers and platforms.
Zooming out: Beyond lawsuits, publishers have limited tools to protect their content from data scraping. A robots.txt file can be added to a website to tell bots which pages they may and may not crawl, but it’s not legally binding, and compliance is voluntary.
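As a rough illustration of how robots.txt works in practice, the sketch below uses Python’s standard-library `urllib.robotparser` to check what a well-behaved crawler would conclude from a set of rules. The rules, the bot names (`ExampleAIBot`, `SomeOtherBot`), and the `example.com` URLs are hypothetical and are not taken from Reddit’s actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (an assumption for illustration only):
# one named bot is blocked from the whole site, and every other bot
# is blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer access questions."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before fetching each URL.
for bot, url in [
    ("ExampleAIBot", "https://example.com/r/marketing/"),
    ("SomeOtherBot", "https://example.com/r/marketing/"),
    ("SomeOtherBot", "https://example.com/private/report"),
]:
    print(bot, url, parser.can_fetch(bot, url))
```

The check runs entirely on the crawler’s side. Nothing in the file stops a bot that never performs it, which is why publishers often fall back on licensing deals and litigation.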
What marketers should do: To navigate an increasingly complex landscape for information sourcing:
- Spread usage across multiple genAI tools, such as ChatGPT, Gemini, and Anthropic’s Claude, to prevent operational slowdowns if some models lag in capability due to restricted training data.
- Explore AI partners that offer legal indemnification clauses to limit a brand’s legal exposure if a provider errs by scraping copyrighted information.
This content is part of EMARKETER’s subscription Briefings, where we pair daily updates with data and analysis from forecasts and research reports.