The news: Wikimedia is setting up a data-set-sharing program to discourage web crawlers from scraping Wikipedia. The program gives smaller AI companies, research organizations, and developers an ethical, time-saving way to use Wikipedia's information.

It launched a collection of stripped-down Wikipedia data for AI developers, hosted on Kaggle, a Google-owned data science platform.

Wikipedia's parent organization, the nonprofit Wikimedia Foundation, said the openly licensed data set can be used for model development, benchmarking, and alignment, without developers needing web crawlers to pull information directly from articles.


Wikipedia opens AI training data set to deter web scrapers
