Posted to common-commits@hadoop.apache.org by ni...@apache.org on 2008/05/15 09:03:49 UTC
svn commit: r656523 [2/4] - in /hadoop/core/branches/branch-0.17: ./ docs/
docs/skin/images/ src/docs/src/documentation/conf/
src/docs/src/documentation/content/xdocs/
Modified: hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html?rev=656523&r1=656522&r2=656523&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html (original)
+++ hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html Thu May 15 00:03:48 2008
@@ -150,7 +150,10 @@
<a href="http://hadoop.apache.org/core/mailing_lists.html">Mailing Lists</a>
</div>
<div class="menuitem">
-<a href="changes.html">Release Notes</a>
+<a href="releasenotes.html">Release Notes</a>
+</div>
+<div class="menuitem">
+<a href="changes.html">All Changes</a>
</div>
</div>
<div id="credit"></div>
@@ -292,7 +295,7 @@
<a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
<ul class="minitoc">
<li>
-<a href="#Source+Code-N10C7E">Source Code</a>
+<a href="#Source+Code-N10C84">Source Code</a>
</li>
<li>
<a href="#Sample+Runs">Sample Runs</a>
@@ -1531,6 +1534,8 @@
<span class="codefrag"></property></span>
</p>
+<p>Users/admins can also specify the maximum virtual memory
+ of the launched child-task using <span class="codefrag">mapred.child.ulimit</span>.</p>
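A hedged illustration of such a limit (the property name comes from the sentence above; the value is an example only, and the kilobyte unit is our assumption, not stated here):

```xml
<!-- Hypothetical configuration fragment; the value is illustrative.
     mapred.child.ulimit is assumed to be expressed in kilobytes and
     should be set higher than the heap given via mapred.child.java.opts. -->
<property>
  <name>mapred.child.ulimit</name>
  <value>1048576</value>
</property>
```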
<p>When the job starts, the localized job directory
<span class="codefrag"> ${mapred.local.dir}/taskTracker/jobcache/$jobid/</span>
has the following directories: </p>
@@ -1585,7 +1590,7 @@
loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
System.load</a>.</p>
-<a name="N108F2"></a><a name="Job+Submission+and+Monitoring"></a>
+<a name="N108F8"></a><a name="Job+Submission+and+Monitoring"></a>
<h3 class="h4">Job Submission and Monitoring</h3>
<p>
<a href="api/org/apache/hadoop/mapred/JobClient.html">
@@ -1646,7 +1651,7 @@
<p>Normally the user creates the application, describes various facets
of the job via <span class="codefrag">JobConf</span>, and then uses the
<span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
-<a name="N10952"></a><a name="Job+Control"></a>
+<a name="N10958"></a><a name="Job+Control"></a>
<h4>Job Control</h4>
<p>Users may need to chain map-reduce jobs to accomplish complex
tasks which cannot be done via a single map-reduce job. This is fairly
@@ -1682,7 +1687,7 @@
</li>
</ul>
-<a name="N1097C"></a><a name="Job+Input"></a>
+<a name="N10982"></a><a name="Job+Input"></a>
<h3 class="h4">Job Input</h3>
<p>
<a href="api/org/apache/hadoop/mapred/InputFormat.html">
@@ -1730,7 +1735,7 @@
appropriate <span class="codefrag">CompressionCodec</span>. However, it must be noted that
compressed files with the above extensions cannot be <em>split</em> and
each compressed file is processed in its entirety by a single mapper.</p>
-<a name="N109E6"></a><a name="InputSplit"></a>
+<a name="N109EC"></a><a name="InputSplit"></a>
<h4>InputSplit</h4>
<p>
<a href="api/org/apache/hadoop/mapred/InputSplit.html">
@@ -1744,7 +1749,7 @@
FileSplit</a> is the default <span class="codefrag">InputSplit</span>. It sets
<span class="codefrag">map.input.file</span> to the path of the input file for the
logical split.</p>
-<a name="N10A0B"></a><a name="RecordReader"></a>
+<a name="N10A11"></a><a name="RecordReader"></a>
<h4>RecordReader</h4>
<p>
<a href="api/org/apache/hadoop/mapred/RecordReader.html">
@@ -1756,7 +1761,7 @@
for processing. <span class="codefrag">RecordReader</span> thus assumes the
responsibility of processing record boundaries and presents the tasks
with keys and values.</p>
-<a name="N10A2E"></a><a name="Job+Output"></a>
+<a name="N10A34"></a><a name="Job+Output"></a>
<h3 class="h4">Job Output</h3>
<p>
<a href="api/org/apache/hadoop/mapred/OutputFormat.html">
@@ -1781,7 +1786,7 @@
<p>
<span class="codefrag">TextOutputFormat</span> is the default
<span class="codefrag">OutputFormat</span>.</p>
-<a name="N10A57"></a><a name="Task+Side-Effect+Files"></a>
+<a name="N10A5D"></a><a name="Task+Side-Effect+Files"></a>
<h4>Task Side-Effect Files</h4>
<p>In some applications, component tasks need to create and/or write to
side-files, which differ from the actual job-output files.</p>
@@ -1820,7 +1825,7 @@
<p>The entire discussion holds true for maps of jobs with
reducer=NONE (i.e. 0 reduces) since output of the map, in that case,
goes directly to HDFS.</p>
-<a name="N10A9F"></a><a name="RecordWriter"></a>
+<a name="N10AA5"></a><a name="RecordWriter"></a>
<h4>RecordWriter</h4>
<p>
<a href="api/org/apache/hadoop/mapred/RecordWriter.html">
@@ -1828,9 +1833,9 @@
pairs to an output file.</p>
<p>RecordWriter implementations write the job outputs to the
<span class="codefrag">FileSystem</span>.</p>
-<a name="N10AB6"></a><a name="Other+Useful+Features"></a>
+<a name="N10ABC"></a><a name="Other+Useful+Features"></a>
<h3 class="h4">Other Useful Features</h3>
-<a name="N10ABC"></a><a name="Counters"></a>
+<a name="N10AC2"></a><a name="Counters"></a>
<h4>Counters</h4>
<p>
<span class="codefrag">Counters</span> represent global counters, defined either by
@@ -1844,7 +1849,7 @@
Reporter.incrCounter(Enum, long)</a> in the <span class="codefrag">map</span> and/or
<span class="codefrag">reduce</span> methods. These counters are then globally
aggregated by the framework.</p>
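As a conceptual sketch only (plain JDK, not the Hadoop API; the enum name and numbers are hypothetical), the per-task increments and global aggregation described above can be pictured as:

```java
import java.util.EnumMap;
import java.util.Map;

// JDK-only sketch: simulates how per-task counter increments are summed
// into job-wide totals by the framework. Hadoop's real Counters class
// does this across tasks; the WordCounter enum here is hypothetical.
public class CounterSketch {
    enum WordCounter { TOTAL_WORDS, SKIPPED_WORDS }

    // Each task keeps its own tally, analogous to Reporter.incrCounter(Enum, long).
    static Map<WordCounter, Long> newTaskCounters() {
        Map<WordCounter, Long> m = new EnumMap<>(WordCounter.class);
        for (WordCounter c : WordCounter.values()) m.put(c, 0L);
        return m;
    }

    static void incr(Map<WordCounter, Long> counters, WordCounter c, long amount) {
        counters.put(c, counters.get(c) + amount);
    }

    // The framework's role: sum every task's counters into one global view.
    static Map<WordCounter, Long> aggregate(Iterable<Map<WordCounter, Long>> perTask) {
        Map<WordCounter, Long> total = newTaskCounters();
        for (Map<WordCounter, Long> t : perTask)
            t.forEach((c, v) -> total.merge(c, v, Long::sum));
        return total;
    }

    public static void main(String[] args) {
        Map<WordCounter, Long> task1 = newTaskCounters();
        incr(task1, WordCounter.TOTAL_WORDS, 120);
        Map<WordCounter, Long> task2 = newTaskCounters();
        incr(task2, WordCounter.TOTAL_WORDS, 80);
        incr(task2, WordCounter.SKIPPED_WORDS, 3);
        Map<WordCounter, Long> total = aggregate(java.util.List.of(task1, task2));
        System.out.println(total.get(WordCounter.TOTAL_WORDS)
            + " " + total.get(WordCounter.SKIPPED_WORDS)); // prints: 200 3
    }
}
```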
-<a name="N10AE7"></a><a name="DistributedCache"></a>
+<a name="N10AED"></a><a name="DistributedCache"></a>
<h4>DistributedCache</h4>
<p>
<a href="api/org/apache/hadoop/filecache/DistributedCache.html">
@@ -1877,7 +1882,7 @@
<a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
DistributedCache.createSymlink(Configuration)</a> api. Files
have <em>execution permissions</em> set.</p>
-<a name="N10B25"></a><a name="Tool"></a>
+<a name="N10B2B"></a><a name="Tool"></a>
<h4>Tool</h4>
<p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a>
interface supports the handling of generic Hadoop command-line options.
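The idea behind Tool is that framework-generic options are peeled off before the application sees its own arguments. A JDK-only sketch of that pattern (names hypothetical; this is not the Hadoop Tool/ToolRunner API, which does the real parsing):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the Tool/ToolRunner idea: generic "-D key=value" options are
// consumed into a configuration map, and the remaining arguments are
// handed to the application, as ToolRunner does before calling run().
public class ToolSketch {
    static Map<String, String> conf = new HashMap<>();

    // Consume generic -D options; return the application's own arguments.
    static String[] parseGenericOptions(String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                rest.add(args[i]);
            }
        }
        return rest.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] appArgs = parseGenericOptions(
            new String[] {"-D", "mapred.reduce.tasks=2", "in", "out"});
        System.out.println(conf.get("mapred.reduce.tasks")
            + " " + String.join(",", appArgs)); // prints: 2 in,out
    }
}
```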
@@ -1917,7 +1922,7 @@
</span>
</p>
-<a name="N10B57"></a><a name="IsolationRunner"></a>
+<a name="N10B5D"></a><a name="IsolationRunner"></a>
<h4>IsolationRunner</h4>
<p>
<a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
@@ -1941,7 +1946,7 @@
<p>
<span class="codefrag">IsolationRunner</span> will run the failed task in a single
jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10B8A"></a><a name="Debugging"></a>
+<a name="N10B90"></a><a name="Debugging"></a>
<h4>Debugging</h4>
<p>The Map/Reduce framework provides a facility to run user-provided
scripts for debugging. When a map/reduce task fails, the user can run
@@ -1952,7 +1957,7 @@
<p> In the following sections we discuss how to submit a debug script
along with the job. To submit a debug script, it first has to be
distributed; then the script has to be supplied via the Configuration. </p>
-<a name="N10B96"></a><a name="How+to+distribute+script+file%3A"></a>
+<a name="N10B9C"></a><a name="How+to+distribute+script+file%3A"></a>
<h5> How to distribute script file: </h5>
<p>
To distribute the debug script file, first copy the file to the dfs.
@@ -1975,7 +1980,7 @@
<a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
DistributedCache.createSymlink(Configuration) </a> api.
</p>
-<a name="N10BAF"></a><a name="How+to+submit+script%3A"></a>
+<a name="N10BB5"></a><a name="How+to+submit+script%3A"></a>
<h5> How to submit script: </h5>
<p> A quick way to submit a debug script is to set values for the
properties "mapred.map.task.debug.script" and
@@ -1999,17 +2004,17 @@
<span class="codefrag">$script $stdout $stderr $syslog $jobconf $program </span>
</p>
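The debug-script properties can also be sketched as a configuration fragment. The map-side property name appears in the text above; the reduce-side name is assumed by analogy, and the script paths are examples only:

```xml
<!-- Hypothetical fragment: paths are illustrative; the reduce-side
     property name is our assumption, mirroring the map-side one. -->
<property>
  <name>mapred.map.task.debug.script</name>
  <value>./debug-map.sh</value>
</property>
<property>
  <name>mapred.reduce.task.debug.script</name>
  <value>./debug-reduce.sh</value>
</property>
```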
-<a name="N10BD1"></a><a name="Default+Behavior%3A"></a>
+<a name="N10BD7"></a><a name="Default+Behavior%3A"></a>
<h5> Default Behavior: </h5>
<p> For pipes, a default script is run which processes core dumps under
gdb, prints the stack trace and gives information about the running threads. </p>
-<a name="N10BDC"></a><a name="JobControl"></a>
+<a name="N10BE2"></a><a name="JobControl"></a>
<h4>JobControl</h4>
<p>
<a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
and their dependencies.</p>
-<a name="N10BE9"></a><a name="Data+Compression"></a>
+<a name="N10BEF"></a><a name="Data+Compression"></a>
<h4>Data Compression</h4>
<p>Hadoop Map-Reduce provides facilities for the application-writer to
specify compression for both intermediate map-outputs and the
@@ -2023,7 +2028,7 @@
codecs for reasons of both performance (zlib) and non-availability of
Java libraries (lzo). More details on their usage and availability are
available <a href="native_libraries.html">here</a>.</p>
-<a name="N10C09"></a><a name="Intermediate+Outputs"></a>
+<a name="N10C0F"></a><a name="Intermediate+Outputs"></a>
<h5>Intermediate Outputs</h5>
<p>Applications can control compression of intermediate map-outputs
via the
@@ -2044,7 +2049,7 @@
<a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)">
JobConf.setMapOutputCompressionType(SequenceFile.CompressionType)</a>
api.</p>
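The JobConf calls above also have configuration-file counterparts; a hedged sketch (property names as we recall them for this era of Hadoop, values illustrative, so verify against the release's defaults):

```xml
<!-- Assumed equivalents of setCompressMapOutput(true) and
     setMapOutputCompressionType(SequenceFile.CompressionType.BLOCK). -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.type</name>
  <value>BLOCK</value>
</property>
```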
-<a name="N10C35"></a><a name="Job+Outputs"></a>
+<a name="N10C3B"></a><a name="Job+Outputs"></a>
<h5>Job Outputs</h5>
<p>Applications can control compression of job-outputs via the
<a href="api/org/apache/hadoop/mapred/OutputFormatBase.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -2064,7 +2069,7 @@
</div>
-<a name="N10C64"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10C6A"></a><a name="Example%3A+WordCount+v2.0"></a>
<h2 class="h3">Example: WordCount v2.0</h2>
<div class="section">
<p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
@@ -2074,7 +2079,7 @@
<a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
<a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a>
Hadoop installation.</p>
-<a name="N10C7E"></a><a name="Source+Code-N10C7E"></a>
+<a name="N10C84"></a><a name="Source+Code-N10C84"></a>
<h3 class="h4">Source Code</h3>
<table class="ForrestTable" cellspacing="1" cellpadding="4">
@@ -3284,7 +3289,7 @@
</tr>
</table>
-<a name="N113E0"></a><a name="Sample+Runs"></a>
+<a name="N113E6"></a><a name="Sample+Runs"></a>
<h3 class="h4">Sample Runs</h3>
<p>Sample text-files as input:</p>
<p>
@@ -3452,7 +3457,7 @@
<br>
</p>
-<a name="N114B4"></a><a name="Highlights"></a>
+<a name="N114BA"></a><a name="Highlights"></a>
<h3 class="h4">Highlights</h3>
<p>The second version of <span class="codefrag">WordCount</span> improves upon the
previous one by using some features offered by the Map-Reduce framework: