Posted to users@zeppelin.apache.org by Felix Cheung <fe...@hotmail.com> on 2016/01/19 23:28:44 UTC

Spark Talks: Using MLlib to Predict Popular Tweets & Using Zeppelin Notebooks

FYI



*Note: expedite your check-in at Galvanize and register here

Talk 1: Using Spark MLlib To Predict Most Popular Tweets 
Spark's Machine Learning Library (MLlib) enables running Machine Learning algorithms in a scalable way on massive datasets. In this talk we will use Spark and MLlib to analyze tweets and predict the number of stars and retweets that a tweet will get. The talk will include a tutorial on Spark and MLlib. 

Prerequisites: 
Beginner. Familiarity with a programming language will be helpful. 

What You'll Learn:  
After this talk you will be able to: 
1. Use Spark to process large data sets. 
2. Use Spark MLlib to apply Machine Learning algorithms to large data sets. 
3. Understand the pros and cons of using Spark versus other Machine Learning technologies.

What to Bring: 
Asim will share source code for the talk. Attendees can bring a laptop to download and try the demos on their own machines. 

Meet Your Speaker: 
Asim Jalis is a Lead Instructor in Data Engineering at Galvanize. He has worked as a software engineer and instructor at Cloudera, Microsoft, and Hewlett-Packard. He has an MS in computer science from the University of Virginia and an MA in mathematics from the University of Wisconsin—Madison. 


Talk 2: Using Zeppelin Notebooks for Spark Streaming and Live Monitoring 
We will discuss the rapidly evolving open source Zeppelin notebook project and how it can be used for data science applications, including those with streaming data. Zeppelin notebooks can use the scheduler functionality to update data and generate plots. This allows live monitoring applications to be prototyped rapidly in a production environment. Simple end-to-end examples will be discussed.

Prerequisites: 
Intermediate to Advanced. Some experience with Spark and Zeppelin (or notebooks in general) will be helpful. 

Meet Your Speaker:  
Jerome Nilmeier is a Data Scientist and Engineer at IBM in the Spark Enablement Team and the Spark Technology Center, where he works on all things Spark-related. He contributes to open source, participates in community outreach, and works with clients on Spark in production environments.

Prior to his journey into big data, he was a computational scientist at the Lawrence Livermore and Berkeley National Laboratories. He holds a PhD in Computational Biophysics from UC San Francisco, and a BS from UC Berkeley in Chemical Engineering. 
