Posted to common-commits@hadoop.apache.org by dd...@apache.org on 2008/06/11 13:39:56 UTC
svn commit: r666624 [3/3] - in /hadoop/core/branches/branch-0.18:
CHANGES.txt docs/mapred_tutorial.html docs/mapred_tutorial.pdf
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
src/docs/src/documentation/content/xdocs/site.xml
Modified: hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=666624&r1=666623&r2=666624&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Wed Jun 11 04:39:55 2008
@@ -1068,33 +1068,109 @@
<p>Users/admins can also specify the maximum virtual memory
of the launched child-task using <code>mapred.child.ulimit</code>.</p>
- <p>When the job starts, the localized job directory
- <code> ${mapred.local.dir}/taskTracker/jobcache/$jobid/</code>
- has the following directories: </p>
+ <p>The task tracker has a local directory,
+ <code> ${mapred.local.dir}/taskTracker/</code>, in which it creates
+ the localized cache and the localized job. Multiple local directories
+ (spanning multiple disks) can be configured, in which case each
+ filename is assigned to a semi-random local directory. When the job
+ starts, the task tracker creates a localized job directory relative
+ to the local directory specified in the configuration. The task
+ tracker directory structure thus looks like the following: </p>
<ul>
- <li> A job-specific shared directory, created at location
- <code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/ </code>.
- This directory is exposed to the users through
- <code>job.local.dir </code>. The tasks can use this space as scratch
- space and share files among them. The directory can accessed through
+ <li><code>${mapred.local.dir}/taskTracker/archive/</code> :
+ The distributed cache. This directory holds the localized distributed
+ cache, which is shared among all the tasks and jobs. </li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/</code> :
+ The localized job directory
+ <ul>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/</code>
+ : The job-specific shared directory. The tasks can use this space as
+ scratch space and share files among them. This directory is exposed
+ to the users through the configuration property
+ <code>job.local.dir</code>. The directory can be accessed through the
api <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjoblocaldir">
JobConf.getJobLocalDir()</a>. It is also available as a System property.
- So,users can call <code>System.getProperty("job.local.dir")</code>;
- </li>
- <li>A jars directory, which has the job jar file and expanded jar </li>
- <li>A job.xml file, the generic job configuration </li>
- <li>Each task has directory <code>task-id</code> which again has the
- following structure
+ So, users (e.g. streaming jobs) can call
+ <code>System.getProperty("job.local.dir")</code> to access the
+ directory.</li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/</code>
+ : The jars directory, which has the job jar file and the expanded jar.
+ The <code>job.jar</code> is the application's jar file that is
+ automatically distributed to each machine. It is expanded into the
+ jars directory before the tasks for the job start. The job.jar
+ location is accessible to the application through the api
+ <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjar">
+ JobConf.getJar() </a>. To access the unjarred directory,
+ <code>new File(JobConf.getJar()).getParent()</code> can be called.</li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/job.xml</code>
+ : The job.xml file, the generic job configuration, localized for
+ the job. </li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid</code>
+ : The task directory for each task attempt. Each task directory
+ in turn has the following structure:
<ul>
- <li>A job.xml file, task localized job configuration </li>
- <li>A directory for intermediate output files</li>
- <li>The working directory of the task.
- And work directory has a temporary directory
- to create temporary files</li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/job.xml</code>
+ : A job.xml file, the task-localized job configuration. Task
+ localization means that properties have been set that are specific to
+ this particular task within the job. The properties localized for
+ each task are described below.</li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/output</code>
+ : A directory for intermediate output files. This contains the
+ temporary map/reduce data generated by the framework,
+ such as map output files. </li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work</code>
+ : The current working directory of the task. </li>
+ <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work/tmp</code>
+ : The temporary directory for the task.
+ (Users can specify the property <code>mapred.child.tmp</code> to set
+ the value of the temporary directory for map and reduce tasks. This
+ defaults to <code>./tmp</code>. If the value is not an absolute path,
+ it is prepended with the task's working directory. Otherwise, it is
+ used directly. The directory is created if it doesn't exist.
+ The child java tasks are then executed with the option
+ <code>-Djava.io.tmpdir='the absolute path of the tmp dir'</code>,
+ and pipes and streaming are set up with the environment variable
+ <code>TMPDIR='the absolute path of the tmp dir'</code>.) This
+ directory is created only if <code>mapred.child.tmp</code> has the
+ value <code>./tmp</code>. </li>
</ul>
</li>
</ul>
-
+ </li>
+ </ul>
+
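The mapred.child.tmp resolution described above (a relative value is prepended with the task's working directory, an absolute value is used directly) can be sketched in plain Java. This is an illustrative sketch only, not the TaskTracker's actual code; the method name and example paths are hypothetical:

```java
import java.io.File;

public class ChildTmpResolver {
    // Sketch of the resolution rule from the tutorial text:
    // a relative mapred.child.tmp is prepended with the task's
    // working directory; an absolute one is used as-is.
    static String resolveChildTmp(String childTmp, String taskWorkDir) {
        if (new File(childTmp).isAbsolute()) {
            return childTmp;
        }
        return new File(taskWorkDir, childTmp).getPath();
    }

    public static void main(String[] args) {
        // hypothetical work directory of one task attempt
        String workDir =
            "/disk1/mapred/local/taskTracker/jobcache/job_1/attempt_1/work";
        // the default "./tmp" resolves under the work directory
        System.out.println(resolveChildTmp("./tmp", workDir));
        // an absolute path is used directly
        System.out.println(resolveChildTmp("/tmp/scratch", workDir));
    }
}
```

The resolved path is what the framework passes to the child JVM as -Djava.io.tmpdir (or as TMPDIR for pipes and streaming).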
+ <p>The following properties are localized in the job configuration
+ for each task's execution: </p>
+ <table>
+ <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+ <tr><td>mapred.job.id</td><td>String</td><td>The job id</td></tr>
+ <tr><td>mapred.jar</td><td>String</td>
+ <td>job.jar location in job directory</td></tr>
+ <tr><td>job.local.dir</td><td> String</td>
+ <td> The job specific shared scratch space</td></tr>
+ <tr><td>mapred.tip.id</td><td> String</td>
+ <td> The task id</td></tr>
+ <tr><td>mapred.task.id</td><td> String</td>
+ <td> The task attempt id</td></tr>
+ <tr><td>mapred.task.is.map</td><td> boolean </td>
+ <td>Is this a map task</td></tr>
+ <tr><td>mapred.task.partition</td><td> int </td>
+ <td>The id of the task within the job</td></tr>
+ <tr><td>map.input.file</td><td> String</td>
+ <td> The filename that the map is reading from</td></tr>
+ <tr><td>map.input.start</td><td> long</td>
+ <td> The offset of the start of the map input split</td></tr>
+ <tr><td>map.input.length </td><td>long </td>
+ <td>The number of bytes in the map input split</td></tr>
+ <tr><td>mapred.work.output.dir</td><td> String </td>
+ <td>The task's temporary output directory</td></tr>
+ </table>
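Inside a task, these localized values are read from the JobConf passed to the task (e.g. in a Mapper's configure() method, via job.get("mapred.task.id")). A minimal sketch of that lookup, using a plain map to stand in for the localized configuration so it runs without Hadoop on the classpath; the method name and example ids are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalizedProps {
    // Sketch: summarize a task from its localized configuration.
    // A real task would read the same keys from its JobConf.
    static String describeTask(Map<String, String> conf) {
        boolean isMap = Boolean.parseBoolean(conf.get("mapred.task.is.map"));
        return (isMap ? "map" : "reduce") + " attempt "
            + conf.get("mapred.task.id")
            + " of job " + conf.get("mapred.job.id");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<String, String>();
        conf.put("mapred.job.id", "job_200806110439_0001");
        conf.put("mapred.task.id", "attempt_200806110439_0001_m_000000_0");
        conf.put("mapred.task.is.map", "true");
        System.out.println(describeTask(conf));
    }
}
```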
+
+ <p>The standard output (stdout) and error (stderr) streams of the task
+ are read by the TaskTracker and logged to
+ <code>${HADOOP_LOG_DIR}/userlogs</code>.</p>
+
<p>The <a href="#DistributedCache">DistributedCache</a> can also be used
as a rudimentary software distribution mechanism for use in the map
and/or reduce tasks. It can be used to distribute both jars and
Modified: hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/site.xml?rev=666624&r1=666623&r2=666624&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/site.xml Wed Jun 11 04:39:55 2008
@@ -167,6 +167,7 @@
<setmapoutputcompressiontype href="#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)" />
<setmapoutputcompressorclass href="#setMapOutputCompressorClass(java.lang.Class)" />
<getjoblocaldir href="#getJobLocalDir()" />
+ <getjar href="#getJar()" />
</jobconf>
<jobconfigurable href="JobConfigurable.html">
<configure href="#configure(org.apache.hadoop.mapred.JobConf)" />