Loading…
Text By the Bay has ended
Friday, April 24 • 2:20pm - 3:00pm
Unsupervised NLP Tutorial using Apache Spark

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Paraphrasing Tim O'Reilly, the person who has the most data wins. That's a neat slogan, but the more data one has, the more likely it is to be unlabeled. Unfortunately, there aren't that many unsupervised learning algorithms out there, for machine learning in general and for NLP in particular. Recent advances in deep learning provide new tools for text mining of large unsupervised datasets. In particular, I will talk about the math, intuition and implementation of the word2vec algorithm, its variants (skipgram and continuous bag of words), use cases, and extensions (e.g. paragraph2vec, doc2vec). I will wrap up with a simple demonstration at scale using Scala, Apache Spark, MLLib, and the Apache Zeppelin Notebook.

Speakers
avatar for Marek Kolodziej

Marek Kolodziej

Principal Research Engineer, Nitro
Marek Kolodziej is a Principal Research Engineer at Nitro, Inc. He's been working on a diverse set of machine learning, distributed computing and big data problems for the past 6 years, and statistics and econometrics for the past 11. His current passion is deep learning and GPU computing... Read More →


Friday April 24, 2015 2:20pm - 3:00pm PDT
Theater

Attendees (1)