Posted to common-dev@hadoop.apache.org by Matei Zaharia <ma...@eecs.berkeley.edu> on 2007/12/01 06:48:05 UTC

A tracing framework for Hadoop

Hi,

We're grad students at UC Berkeley working on a project to instrument
Hadoop using an open-source path-based tracing framework called
X-Trace (www.x-trace.net/wiki). X-Trace captures causal dependencies
between events in addition to timings, letting developers analyze not
just performance but also context and dependencies for various events.
We have created a web-based trace analysis UI that shows performance
of different IPC calls, DFS operations, and phases of a MapReduce job.
The goal is to let users easily spot, from a single centralized
location, the origin of unusual behavior in a running system. We believe that this kind
of tracing can be used for performance tuning and debugging in both
development and production environments.

We'd like to get feedback on our work and suggestions on what trace
analyses would be useful to Hadoop developers and users. Some of the
reports we currently generate include machine utilization over time,
relative performance of different tasks, and performance of DFS
operations. You can see an example set of reports at http://www.cs.berkeley.edu/~matei/xtrace_sample_task.html
(this is a trace of a Nutch indexing job). You can also read our
project journal at http://radlab.cs.berkeley.edu/wiki/Projects/Monitoring_Hadoop_through_Tracing.
We've already spotted some interesting issues, like map tasks and
DFS reads/writes that are an order of magnitude slower than the
average, and we are investigating possible causes for them. Most
importantly, the UI lets a user easily see where the system is
spending time and reason about how to tune it, and it provides much
more information than the progress data in the JobTracker UI. As
Hadoop developers, what questions about running jobs would you like
answered that are hard to answer from process logs alone?

Once we've had a discussion on features for a trace analysis UI, we
would like to contribute our work to the Hadoop codebase. We will
create a JIRA issue and a patch adding this functionality. We're also
interested in seeing if we can integrate X-Trace logging more tightly
with the current Apache logging in Hadoop.

Finally, we are currently experimenting on relatively small clusters
(fewer than 50 nodes) here at Berkeley, but we would really like to
try tracing some large clusters (over 1000 nodes). If anyone is
interested in evaluating performance on such a cluster, we would be
very happy to talk about how to set up X-Trace and provide you with a
patch.

Thanks,

Andy Konwinski and Matei Zaharia