Posted to mapreduce-commits@hadoop.apache.org by vi...@apache.org on 2010/05/10 17:28:58 UTC
svn commit: r942787 - in /hadoop/mapreduce/branches/branch-0.21: ./
CHANGES.txt src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
Author: vinodkv
Date: Mon May 10 15:28:58 2010
New Revision: 942787
URL: http://svn.apache.org/viewvc?rev=942787&view=rev
Log:
MAPREDUCE-1610. Merge revision 942764 from trunk.
Modified:
hadoop/mapreduce/branches/branch-0.21/ (props changed)
hadoop/mapreduce/branches/branch-0.21/CHANGES.txt (contents, props changed)
hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
Propchange: hadoop/mapreduce/branches/branch-0.21/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon May 10 15:28:58 2010
@@ -1,2 +1,2 @@
/hadoop/core/branches/branch-0.19/mapred:713112
-/hadoop/mapreduce/trunk:940364
+/hadoop/mapreduce/trunk:940364,942764
Modified: hadoop/mapreduce/branches/branch-0.21/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/CHANGES.txt?rev=942787&r1=942786&r2=942787&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/CHANGES.txt (original)
+++ hadoop/mapreduce/branches/branch-0.21/CHANGES.txt Mon May 10 15:28:58 2010
@@ -750,6 +750,9 @@ Release 0.21.0 - Unreleased
MAPREDUCE-1613. Install/deploy source jars to Maven repo
(Patrick Angeles via ddas)
+ MAPREDUCE-1610. Forrest documentation should be updated to reflect
+ the changes in MAPREDUCE-856. (Ravi Gummadi via vinodkv)
+
BUG FIXES
MAPREDUCE-878. Rename fair scheduler design doc to
Propchange: hadoop/mapreduce/branches/branch-0.21/CHANGES.txt
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon May 10 15:28:58 2010
@@ -1,3 +1,3 @@
/hadoop/core/branches/branch-0.19/mapred/CHANGES.txt:713112
/hadoop/mapreduce/branches/HDFS-641/CHANGES.txt:817878-835964
-/hadoop/mapreduce/trunk/CHANGES.txt:940364
+/hadoop/mapreduce/trunk/CHANGES.txt:940364,942764
Modified: hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=942787&r1=942786&r2=942787&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Mon May 10 15:28:58 2010
@@ -1375,61 +1375,87 @@
<section>
<title> Directory Structure </title>
<p>The task tracker has local directory,
- <code> ${mapreduce.cluster.local.dir}/taskTracker/</code> to create localized
- cache and localized job. It can define multiple local directories
- (spanning multiple disks) and then each filename is assigned to a
- semi-random local directory. When the job starts, task tracker
+ <code> ${mapreduce.cluster.local.dir}/taskTracker/</code> to create
+ localized cache and localized job. It can define multiple local
+ directories (spanning multiple disks) and then each filename is assigned
+ to a semi-random local directory. When the job starts, task tracker
creates a localized job directory relative to the local directory
specified in the configuration. Thus the task tracker directory
- structure looks the following: </p>
+ structure looks as following: </p>
<ul>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/archive/</code> :
- The distributed cache. This directory holds the localized distributed
- cache. Thus localized distributed cache is shared among all
- the tasks and jobs </li>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/</code> :
- The localized job directory
+ <li><code>${mapreduce.cluster.local.dir}/taskTracker/distcache/</code> :
+ The public distributed cache for the jobs of all users. This directory
+ holds the localized public distributed cache. Thus localized public
+ distributed cache is shared among all the tasks and jobs of all users.
+ </li>
+ <li><code>${mapreduce.cluster.local.dir}/taskTracker/$user/distcache/
+ </code> :
+ The private distributed cache for the jobs of the specific user. This
+ directory holds the localized private distributed cache. Thus localized
+ private distributed cache is shared among all the tasks and jobs of the
+ specific user only. It is not accessible to jobs of other users.
+ </li>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/
+ </code> : The localized job directory
<ul>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/work/</code>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/work/
+ </code>
: The job-specific shared directory. The tasks can use this space as
scratch space and share files among them. This directory is exposed
to the users through the configuration property
- <code>mapreduce.job.local.dir</code>. It is available as System property also.
- So, users (streaming etc.) can call
+ <code>mapreduce.job.local.dir</code>. It is available as System property
+ also. So, users (streaming etc.) can call
<code>System.getProperty("mapreduce.job.local.dir")</code> to access the
directory.</li>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/jars/</code>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/jars/
+ </code>
: The jars directory, which has the job jar file and expanded jar.
The <code>job.jar</code> is the application's jar file that is
- automatically distributed to each machine. Any library jars that are dependencies
- of the application code may be packaged inside this jar in a <code>lib/</code> directory.
- This directory is extracted from <code>job.jar</code> and its contents are
- automatically added to the classpath for each task.
- The job.jar location is accessible to the application through the api
+ automatically distributed to each machine. Any library jars that are
+ dependencies of the application code may be packaged inside this jar in
+ a <code>lib/</code> directory.
+ This directory is extracted from <code>job.jar</code> and its contents
+ are automatically added to the classpath for each task.
+ The job.jar location is accessible to the application through the API
<a href="ext:api/org/apache/hadoop/mapreduce/task/jobcontextimpl/getjar">
Job.getJar() </a>. To access the unjarred directory,
Job.getJar().getParent() can be called.</li>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/job.xml</code>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/job.xml
+ </code>
: The job.xml file, the generic job configuration, localized for
the job. </li>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid</code>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid
+ </code>
: The task directory for each task attempt. Each task directory
again has the following structure :
<ul>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid/job.xml</code>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/job.xml
+ </code>
: A job.xml file, task localized job configuration, Task localization
means that properties have been set that are specific to
this particular task within the job. The properties localized for
each task are described below.</li>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid/output</code>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/output
+ </code>
: A directory for intermediate output files. This contains the
temporary map reduce data generated by the framework
such as map output files etc. </li>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid/work</code>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/work
+ </code>
: The curernt working directory of the task.
With <a href="#Task+JVM+Reuse">jvm reuse</a> enabled for tasks, this
directory will be the directory on which the jvm has started</li>
- <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid/work/tmp</code>
+ <li><code>
+ ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/work/tmp
+ </code>
: The temporary directory for the task.
(User can specify the property <code>mapreduce.task.tmp.dir</code> to set
the value of temporary directory for map and reduce tasks. This
@@ -1438,7 +1464,7 @@
directly assigned. The directory will be created if it doesn't exist.
Then, the child java tasks are executed with option
<code>-Djava.io.tmpdir='the absolute path of the tmp dir'</code>.
- Anp pipes and streaming are set with environment variable,
+ Pipes and streaming are set with environment variable,
<code>TMPDIR='the absolute path of the tmp dir'</code>). This
directory is created, if <code>mapreduce.task.tmp.dir</code> has the value
<code>./tmp</code> </li>
@@ -2097,7 +2123,8 @@
Next, go to the node on which the failed task ran and go to the
<code>TaskTracker</code>'s local directory and run the
<code>IsolationRunner</code>:<br/>
- <code>$ cd <local path>/taskTracker/${taskid}/work</code><br/>
+ <code>$ cd <local path>
+ /taskTracker/$user/jobcache/$jobid/${taskid}/work</code><br/>
<code>
$ bin/hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
</code>
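
Note on the layout documented above: the per-user directory structure introduced by this change can be sketched as path construction. This is only an illustrative sketch, assuming the layout exactly as described in the updated tutorial text. The class TaskTrackerLayout and its methods are hypothetical helpers, not part of the Hadoop API, and the sample values passed in main() are made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Hadoop class): mirrors the directory layout
// described in the updated mapred_tutorial.xml text.
public class TaskTrackerLayout {

    // ${mapreduce.cluster.local.dir}/taskTracker/distcache/
    // Public distributed cache, shared by the jobs of all users.
    static Path publicDistCache(String localDir) {
        return Paths.get(localDir, "taskTracker", "distcache");
    }

    // ${mapreduce.cluster.local.dir}/taskTracker/$user/distcache/
    // Private distributed cache, shared only by the jobs of one user.
    static Path privateDistCache(String localDir, String user) {
        return Paths.get(localDir, "taskTracker", user, "distcache");
    }

    // ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/
    static Path jobDir(String localDir, String user, String jobId) {
        return Paths.get(localDir, "taskTracker", user, "jobcache", jobId);
    }

    // .../$user/jobcache/$jobid/$taskid/work
    // The directory the IsolationRunner instructions tell you to cd into.
    static Path taskWorkDir(String localDir, String user,
                            String jobId, String taskId) {
        return jobDir(localDir, user, jobId).resolve(taskId).resolve("work");
    }

    public static void main(String[] args) {
        // Sample values are assumptions for illustration only.
        String localDir = "/tmp/mapred/local";
        System.out.println(publicDistCache(localDir));
        System.out.println(privateDistCache(localDir, "alice"));
        System.out.println(taskWorkDir(localDir, "alice",
                "job_0001", "attempt_0001_m_000000_0"));
    }
}
```

The point of the sketch is that every job-level and task-level path now carries a $user component between taskTracker and jobcache, while only the public distributed cache stays directly under taskTracker.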