Posted to dev@mesos.apache.org by Matei Zaharia <ma...@eecs.berkeley.edu> on 2012/03/31 00:13:07 UTC

Third Spark User Meetup, featuring Hadoop, Mesos, and Debugging

If you're in the Bay Area, you might be interested in attending the next installment of the Spark User Meetup, which will be on April 5th at UC Berkeley. To attend, *please register* at http://www.meetup.com/spark-users/events/58579402/ so we can get an accurate headcount.

This meetup will contain two talks:

Running Spark and Hadoop on a Private Cluster with Mesos
(Benjamin Hindman, UC Berkeley and Twitter)

This talk will cover how to deploy Spark to a cluster using the Apache Mesos cluster manager, and dynamically share resources with Hadoop MapReduce by running Hadoop through Mesos as well. It will focus on the upcoming 0.9 release of Mesos, which provides a variety of usability and fault tolerance fixes. We will demo how to set up and configure a cluster with Mesos, Spark, Hadoop MapReduce and HDFS starting from plain Linux machines. In addition, we'll cover practical issues such as how to find log files and debug your jobs.
 
Arthur: The Spark Debugger
(Ankur Dave, UC Berkeley)

Debugging large parallel jobs is hard: the sheer scale of the computation makes it difficult to track what's happening, inevitable weirdnesses in the data trigger errors, and it's hard to tell whether a program is performing efficiently. To tackle this problem, we are designing Arthur, a debugger for Spark programs that provides visibility into the computation and powerful analysis features. One key feature of Arthur is that it can leverage the deterministic nature of Spark programs to efficiently replay part of a parallel job. Using this capability, users can rerun any task in the job in a single-process debugger to step through it line by line, or rebuild any intermediate dataset in the job and query it interactively from the Spark shell. We are also using replay to build tracing capabilities, such as figuring out which input records caused a given output record. This talk will give an overview of the research going on in Arthur and cover several features that are already included in Spark. We also welcome your suggestions for improving debugging!
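The replay idea described above can be illustrated outside of Spark: if each task is a pure, deterministic function of its input partition, then rerunning the task reproduces exactly the same output, which is what makes it possible to rebuild an intermediate dataset or re-execute a single task under a debugger after the fact. A minimal sketch in plain Scala (the names `runTask` and `partition` are illustrative only, not part of Arthur's or Spark's API):

```scala
object ReplaySketch {
  // A "task" here is a deterministic function of its input partition.
  // Because it has no side effects or randomness, running it twice
  // yields identical results -- the property that lets a debugger
  // replay part of a job or rebuild an intermediate dataset on demand.
  def runTask(partition: Seq[Int]): Seq[Int] =
    partition.map(x => x * x).filter(_ % 2 == 0)

  def main(args: Array[String]): Unit = {
    val partition = Seq(1, 2, 3, 4, 5)
    val first  = runTask(partition)  // original job run
    val replay = runTask(partition)  // later replay, e.g. for debugging
    assert(first == replay)          // determinism => identical output
    println(replay.mkString(","))
  }
}
```

In real Spark programs the same property holds for RDD transformations, which is why a lost or interesting intermediate dataset can be recomputed from its inputs rather than stored.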

Pizza will also be provided starting at 7 PM, and the talks themselves start at 7:30.

See you on Thursday!

Matei