Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/09/28 08:43:47 UTC

[Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by Amareshwari

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by Amareshwari:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

------------------------------------------------------------------------------
  
  This can be extremely useful for displaying debug information about the current record being handled, or for setting debug flags that reflect the status of the mapper. While running locally on a small data set can uncover many bugs, large data sets may contain pathological cases that are otherwise unexpected. This method of debugging can help catch those cases.
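  
  For instance, a mapper can surface the record it is currently handling through the Reporter's status string. The sketch below is against the old mapred API of this era; the class name and key/value types are placeholders:
  
  {{{
  import java.io.IOException;
  
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;
  
  public class DebugMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
  
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      // Surface the record currently being handled in the task status;
      // the status is visible in the web UI while the task runs.
      reporter.setStatus("processing record at byte offset " + key.get());
      // ... real map logic goes here ...
    }
  }
  }}}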
  
+ == Run a debug script when Task fails ==
+ 
+ A facility is provided, via user-provided scripts, for post-processing a task's logs, stdout, stderr and core file. There is a default script which processes core dumps under gdb and prints the stack trace. The last five lines of the debug script's stdout and stderr are printed as part of the diagnostics. These outputs are displayed on the job UI on demand. 
+ 
+ == How to submit debug command ==
+ 
+ A quick and easy way to set a debug command is to set the properties mapred.map.task.debug.command and mapred.reduce.task.debug.command, for debugging the map task and the reduce task respectively.
+ These properties can also be set through the APIs conf.setMapDebugCommand(String cmd) and conf.setReduceDebugCommand(String cmd).
+ The command can contain @stdout@, @stderr@ and @core@ to access the task's stdout, stderr and core files respectively.
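+ 
+ For example, the following sketch sets debug commands through the JobConf, using the properties and setters described above. The job class and the commands themselves are placeholders:
+ 
+ {{{
+ JobConf conf = new JobConf(MyJob.class);  // MyJob is a hypothetical job class
+ 
+ // Print the tail of a failed map task's stderr; @stderr@ stands for
+ // the task's stderr file.
+ conf.setMapDebugCommand("tail -30 @stderr@");
+ 
+ // The same thing for reduce tasks, via the property instead of the setter:
+ conf.set("mapred.reduce.task.debug.command", "tail -30 @stderr@");
+ }}}
+ 
+ The last five lines that the command writes to its own stdout and stderr then show up in the diagnostics, as described above.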
+ 
+ == How to submit debug script ==
+ 
+ 
  = How to debug Hadoop Pipes programs =
  
  In order to debug Pipes programs, you need to keep the downloaded commands.
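  
  For instance, one way to keep the files of a failed task, including the downloaded Pipes executable, is to disable cleanup for failed tasks. The sketch below assumes the keep-failed-task-files facility of JobConf; check your version for the exact API:
  
  {{{
  JobConf conf = new JobConf();
  // Do not clean up the working directories of failed tasks; this keeps
  // the commands downloaded for the task, including the Pipes executable,
  // so they can be re-run by hand under a debugger.
  conf.setKeepFailedTaskFiles(true);
  }}}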