Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/08/28 07:58:43 UTC
[Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by TedDunning
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by TedDunning:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms
The comment on the change is:
added help in setting config parameters.
------------------------------------------------------------------------------
1. Start by getting everything running (likely on a small input) in the local runner.
You do this by setting your job tracker to "local" in your config. The local runner can run
- under the debugger and runs on your development machine.
+ under the debugger and runs on your development machine. A quick and easy way to set this
+ configuration parameter is to include the following line just before you run the job:
+
+ {{{conf.set("mapred.job.tracker", "local");}}}
+
+ You may also want to set the default file system to "local" so that your input and output files are read
+ from and written to the local file system rather than the Hadoop distributed file system (HDFS):
+
+ {{{conf.set("fs.default.name", "local");}}}
+
+ You can also set these configuration parameters in {{{hadoop-site.xml}}}. The configuration files
+ {{{hadoop-default.xml}}}, {{{mapred-default.xml}}} and {{{hadoop-site.xml}}} should appear somewhere in your program's
+ class path when the program runs.
+
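+ As a sketch, the equivalent {{{hadoop-site.xml}}} entries would look like the following (the property
+ names match the {{{conf.set}}} calls above; adjust the values for your own setup):
+
+ {{{
+ <configuration>
+   <property>
+     <name>mapred.job.tracker</name>
+     <value>local</value>
+   </property>
+   <property>
+     <name>fs.default.name</name>
+     <value>local</value>
+   </property>
+ </configuration>
+ }}}
+
+ Settings in {{{hadoop-site.xml}}} override the defaults in {{{hadoop-default.xml}}}, so this keeps
+ debugging configuration out of your source code.
+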
2. Run the small input on a 1 node cluster. This will smoke out all of the issues that happen with
distribution and the "real" task runner, but you only have a single place to look at logs. Most