Posted to common-commits@hadoop.apache.org by dd...@apache.org on 2008/06/11 13:26:55 UTC

svn commit: r666620 [3/3] - in /hadoop/core/trunk: CHANGES.txt docs/changes.html docs/mapred_tutorial.html docs/mapred_tutorial.pdf src/docs/src/documentation/content/xdocs/mapred_tutorial.xml src/docs/src/documentation/content/xdocs/site.xml

Modified: hadoop/core/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=666620&r1=666619&r2=666620&view=diff
==============================================================================
--- hadoop/core/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/core/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Wed Jun 11 04:26:54 2008
@@ -1068,33 +1068,109 @@
         <p>Users/admins can also specify the maximum virtual memory 
         of the launched child-task using <code>mapred.child.ulimit</code>.</p>
         
-        <p>When the job starts, the localized job directory
-        <code> ${mapred.local.dir}/taskTracker/jobcache/$jobid/</code>
-        has the following directories: </p>
+        <p>The task tracker has a local directory,
+        <code> ${mapred.local.dir}/taskTracker/</code>, in which it creates
+        the localized cache and localized jobs. It can define multiple local
+        directories (spanning multiple disks), and then each filename is
+        assigned to a semi-random local directory. When the job starts, the
+        task tracker creates a localized job directory relative to the local
+        directory specified in the configuration. Thus the task tracker
+        directory structure looks as follows: </p>
         <ul>
-        <li> A job-specific shared directory, created at location
-        <code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/ </code>.
-        This directory is exposed to the users through 
-        <code>job.local.dir </code>. The tasks can use this space as scratch
-        space and share files among them. The directory can accessed through 
+        <li><code>${mapred.local.dir}/taskTracker/archive/</code> :
+        The distributed cache. This directory holds the localized distributed
+        cache. Thus the localized distributed cache is shared among all
+        the tasks and jobs. </li>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/</code> :
+        The localized job directory 
+        <ul>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/</code> 
+        : The job-specific shared directory. The tasks can use this space as 
+        scratch space and share files among them. This directory is exposed
+        to the users through the configuration property  
+        <code>job.local.dir</code>. The directory can be accessed through the
         api <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjoblocaldir">
         JobConf.getJobLocalDir()</a>. It is available as System property also.
-        So,users can call <code>System.getProperty("job.local.dir")</code>;
-        </li>
-        <li>A jars directory, which has the job jar file and expanded jar </li>
-        <li>A job.xml file, the generic job configuration </li>
-        <li>Each task has directory <code>task-id</code> which again has the 
-        following structure
+        So, users (streaming etc.) can call 
+        <code>System.getProperty("job.local.dir")</code> to access the 
+        directory.</li>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/</code>
+        : The jars directory, which has the job jar file and expanded jar.
+        The <code>job.jar</code> is the application's jar file that is
+        automatically distributed to each machine. It is expanded in the jars
+        directory before the tasks for the job start. The job.jar location
+        is accessible to the application through the api
+        <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjar"> 
+        JobConf.getJar() </a>. To access the unjarred directory,
+        <code>new File(JobConf.getJar()).getParent()</code> can be called.</li>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/job.xml</code>
+        : The job.xml file, the generic job configuration, localized for 
+        the job. </li>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid</code>
+        : The task directory for each task attempt. Each task directory
+        again has the following structure:
         <ul>
-        <li>A job.xml file, task localized job configuration </li>
-        <li>A directory for intermediate output files</li>
-        <li>The working directory of the task. 
-        And work directory has a temporary directory 
-        to create temporary files</li>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/job.xml</code>
+        : A job.xml file, the task-localized job configuration. Task
+        localization means that properties have been set that are specific to
+        this particular task within the job. The properties localized for 
+        each task are described below.</li>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/output</code>
+        : A directory for intermediate output files. This contains the
+        temporary map/reduce data generated by the framework,
+        such as map output files. </li>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work</code>
+        : The current working directory of the task. </li>
+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work/tmp</code>
+        : The temporary directory for the task.
+        (Users can specify the property <code>mapred.child.tmp</code> to set
+        the value of the temporary directory for map and reduce tasks. This
+        defaults to <code>./tmp</code>. If the value is not an absolute path,
+        it is prepended with the task's working directory. Otherwise, it is
+        used directly. The directory will be created if it doesn't exist.
+        Then, the child java tasks are executed with the option
+        <code>-Djava.io.tmpdir='the absolute path of the tmp dir'</code>,
+        and pipes and streaming are set with the environment variable
+        <code>TMPDIR='the absolute path of the tmp dir'</code>.) This
+        directory is only created if <code>mapred.child.tmp</code> has the
+        value <code>./tmp</code>. </li>
         </ul>
         </li>
         </ul>
- 
+        </li>
+        </ul>
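
The mapred.child.tmp handling described above can be sketched in plain Java. This is a hedged illustration of the documented behavior, not the actual TaskTracker source; the class and method names (`ChildTmpResolver`, `resolveChildTmp`) and the example paths are invented for this sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChildTmpResolver {
    // Sketch of the documented mapred.child.tmp resolution: a relative
    // value is prepended with the task's working directory, an absolute
    // value is used directly. (Illustration only, not Hadoop source.)
    public static String resolveChildTmp(String childTmp, String taskWorkDir) {
        if (childTmp == null) {
            childTmp = "./tmp";  // documented default
        }
        Path tmp = Paths.get(childTmp);
        if (!tmp.isAbsolute()) {
            tmp = Paths.get(taskWorkDir).resolve(tmp);
        }
        return tmp.normalize().toString();
    }

    public static void main(String[] args) {
        String workDir =
            "/mapred/local/taskTracker/jobcache/job_0001/task_0001/work";
        // Default relative value: lands under the task's working directory.
        System.out.println(resolveChildTmp("./tmp", workDir));
        // Absolute value: used as-is.
        System.out.println(resolveChildTmp("/scratch/tmp", workDir));
    }
}
```

The resolved absolute path is what would then be passed to the child JVM as `-Djava.io.tmpdir=...` (or `TMPDIR` for pipes and streaming).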
+
+        <p>The following properties are localized in the job configuration 
+         for each task's execution: </p>
+        <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td>mapred.job.id</td><td>String</td><td>The job id</td></tr>
+          <tr><td>mapred.jar</td><td>String</td>
+              <td>job.jar location in job directory</td></tr>
+          <tr><td>job.local.dir</td><td> String</td>
+              <td> The job specific shared scratch space</td></tr>
+          <tr><td>mapred.tip.id</td><td> String</td>
+              <td> The task id</td></tr>
+          <tr><td>mapred.task.id</td><td> String</td>
+              <td> The task attempt id</td></tr>
+          <tr><td>mapred.task.is.map</td><td> boolean </td>
+              <td>Is this a map task</td></tr>
+          <tr><td>mapred.task.partition</td><td> int </td>
+              <td>The id of the task within the job</td></tr>
+          <tr><td>map.input.file</td><td> String</td>
+              <td> The filename that the map is reading from</td></tr>
+          <tr><td>map.input.start</td><td> long</td>
+              <td> The offset of the start of the map input split</td></tr>
+          <tr><td>map.input.length </td><td>long </td>
+              <td>The number of bytes in the map input split</td></tr>
+          <tr><td>mapred.work.output.dir</td><td> String </td>
+              <td>The task's temporary output directory</td></tr>
+        </table>
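
As a rough illustration of how a task might consume the localized values above, the sketch below uses java.util.Properties as a stand-in for the JobConf a task receives in configure(); the keys are the documented property names, but every value shown is hypothetical.

```java
import java.util.Properties;

public class LocalizedConfDemo {
    public static void main(String[] args) {
        // Stand-in for the task-localized JobConf; keys are the documented
        // localized properties, values are made up for illustration.
        Properties conf = new Properties();
        conf.setProperty("mapred.job.id", "job_200806110001_0001");
        conf.setProperty("mapred.task.id", "attempt_200806110001_0001_m_000003_0");
        conf.setProperty("mapred.task.is.map", "true");
        conf.setProperty("mapred.task.partition", "3");
        conf.setProperty("map.input.file", "/user/alice/input/part-00000");
        conf.setProperty("map.input.start", "0");
        conf.setProperty("map.input.length", "67108864");

        boolean isMap = Boolean.parseBoolean(conf.getProperty("mapred.task.is.map"));
        int partition = Integer.parseInt(conf.getProperty("mapred.task.partition"));
        long splitLen = Long.parseLong(conf.getProperty("map.input.length"));

        // A map task could, for example, report which split it is processing.
        if (isMap) {
            System.out.println("map #" + partition + " reading "
                + conf.getProperty("map.input.file")
                + " (" + splitLen + " bytes)");
        }
    }
}
```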
+        
+        <p>The standard output (stdout) and error (stderr) streams of the task 
+        are read by the TaskTracker and logged to 
+        <code>${HADOOP_LOG_DIR}/userlogs</code>.</p>
+        
         <p>The <a href="#DistributedCache">DistributedCache</a> can also be used
         as a rudimentary software distribution mechanism for use in the map 
         and/or reduce tasks. It can be used to distribute both jars and 

Modified: hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=666620&r1=666619&r2=666620&view=diff
==============================================================================
--- hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml Wed Jun 11 04:26:54 2008
@@ -167,6 +167,7 @@
                 <setmapoutputcompressiontype href="#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)" />
                 <setmapoutputcompressorclass href="#setMapOutputCompressorClass(java.lang.Class)" />
                 <getjoblocaldir href="#getJobLocalDir()" />
+                <getjar href="#getJar()" />
               </jobconf>
               <jobconfigurable href="JobConfigurable.html">
                 <configure href="#configure(org.apache.hadoop.mapred.JobConf)" />