Friday, April 24 • 2:20pm - 3:00pm
A Web Worth of Data: Common Crawl for NLP


The Common Crawl corpus contains petabytes of web crawl data and is a treasure trove of potential experiments. To introduce the possibilities that web crawl data offers for NLP, we will take a detailed look at how the data has been used in various experiments and how to get started with the data yourself.


Stephen Merity

@smerity | Stephen Merity is responsible for crawling billions of pages a month at Common Crawl, a non-profit that provides petabytes of web data free of charge. Prior to joining Common Crawl, Stephen worked with Freelancer.com and Grok Learning in Australia. He holds a Master's in CSE from Harvard University and a Bachelor's (Honours) from the University of Sydney in NLP.

Friday April 24, 2015 2:20pm - 3:00pm
