Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/08/28 07:58:43 UTC
[Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by TedDunning
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by TedDunning:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms
The comment on the change is:
added help in setting config parameters.
------------------------------------------------------------------------------
1. Start by getting everything running (likely on a small input) in the local runner.
You do this by setting your job tracker to "local" in your config. The local runner can run
- under the debugger and runs on your development machine.
+ under the debugger and runs on your development machine. A quick and easy way to set this
+ configuration parameter is to include the following line just before you run the job:
+
+ {{{conf.set("mapred.job.tracker", "local");}}}
+
+ You may also want to set the default file system to "local" so that your input and output files are read
+ from and written to the local file system rather than the Hadoop distributed file system (HDFS):
+
+ {{{conf.set("fs.default.name", "local");}}}
+
+ You can also set these configuration parameters in {{{hadoop-site.xml}}}. The configuration files
+ {{{hadoop-default.xml}}}, {{{mapred-default.xml}}} and {{{hadoop-site.xml}}} should appear somewhere in your program's
+ class path when the program runs.
+
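+ As a sketch, the equivalent {{{hadoop-site.xml}}} entries would look like the following (the property
+ names match the {{{conf.set}}} calls above; adjust the values for your own setup):
+
+ {{{
+ <configuration>
+   <property>
+     <name>mapred.job.tracker</name>
+     <value>local</value>
+   </property>
+   <property>
+     <name>fs.default.name</name>
+     <value>local</value>
+   </property>
+ </configuration>
+ }}}
+
+ Settings in {{{hadoop-site.xml}}} override the defaults in {{{hadoop-default.xml}}}, so this keeps
+ debugging configuration out of your source code.
+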
2. Run the small input on a 1 node cluster. This will smoke out all of the issues that happen with
distribution and the "real" task runner, but you only have a single place to look at logs. Most