Wikipedia opens AI training data set to deter web scrapers

The news: The Wikimedia Foundation is setting up a data-set-sharing program to discourage web crawlers from scraping Wikipedia. The program gives smaller AI companies, research organizations, and developers an ethical, time-saving way to use Wikipedia's information.

It has launched a collection of stripped-down Wikipedia data for AI developers, hosted on Kaggle, a Google-owned data science platform.

  • Wikipedia's parent organization said the openly licensed data set can be used for model development, benchmarking, and alignment without relying on web crawlers to pull information directly from articles.
