Posted to common-commits@hadoop.apache.org by yh...@apache.org on 2009/05/05 10:46:10 UTC

svn commit: r771623 - in /hadoop/core/branches/branch-0.20: ./ conf/ src/docs/src/documentation/content/xdocs/

Author: yhemanth
Date: Tue May  5 08:46:09 2009
New Revision: 771623

URL: http://svn.apache.org/viewvc?rev=771623&view=rev
Log:
HADOOP-5736. Update the capacity scheduler documentation for features like memory based scheduling, job initialization and removal of pre-emption. Contributed by Sreekanth Ramakrishnan.

Modified:
    hadoop/core/branches/branch-0.20/CHANGES.txt
    hadoop/core/branches/branch-0.20/conf/capacity-scheduler.xml.template
    hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
    hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/cluster_setup.xml
    hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

Modified: hadoop/core/branches/branch-0.20/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.20/CHANGES.txt?rev=771623&r1=771622&r2=771623&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.20/CHANGES.txt (original)
+++ hadoop/core/branches/branch-0.20/CHANGES.txt Tue May  5 08:46:09 2009
@@ -13,6 +13,10 @@
 
     HADOOP-5711. Change Namenode file close log to info. (szetszwo)
 
+    HADOOP-5736. Update the capacity scheduler documentation for features
+    like memory based scheduling, job initialization and removal of pre-emption.
+    (Sreekanth Ramakrishnan via yhemanth)
+
   OPTIMIZATIONS
 
   BUG FIXES

Modified: hadoop/core/branches/branch-0.20/conf/capacity-scheduler.xml.template
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.20/conf/capacity-scheduler.xml.template?rev=771623&r1=771622&r2=771623&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.20/conf/capacity-scheduler.xml.template (original)
+++ hadoop/core/branches/branch-0.20/conf/capacity-scheduler.xml.template Tue May  5 08:46:09 2009
@@ -60,16 +60,13 @@
   <property>
     <name>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</name>
     <value>-1</value>
-    <description>If mapred.task.maxpmem is set to -1, this configuration will
-      be used to calculate job's physical memory requirements as a percentage of
-      the job's virtual memory requirements set via mapred.task.maxvmem. This
-      property thus provides default value of physical memory for job's that
-      don't explicitly specify physical memory requirements.
-
-      If not explicitly set to a valid value, scheduler will not consider
-      physical memory for scheduling even if virtual memory based scheduling is
-      enabled(by setting valid values for both mapred.task.default.maxvmem and
-      mapred.task.limit.maxvmem).
+    <description>A percentage (float) of the default VMEM limit for jobs
+      (mapred.task.default.maxvmem). This is the default RAM task-limit
+      associated with a task. Unless overridden by a job's setting, this
+      number defines the RAM task-limit.
+
+      If this property is missing, or set to an invalid value, scheduling
+      based on physical memory (RAM) is disabled.
     </description>
   </property>
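
The revised description can be made concrete with a sketch of how the property might be set in capacity-scheduler.xml; the value 50 is purely illustrative, not part of this commit:

```xml
<!-- Illustrative only: makes the default RAM task-limit 50% of the
     default VMEM limit (mapred.task.default.maxvmem). -->
<property>
  <name>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</name>
  <value>50</value>
</property>
```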
 

Modified: hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml?rev=771623&r1=771622&r2=771623&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml (original)
+++ hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml Tue May  5 08:46:09 2009
@@ -28,7 +28,9 @@
     <section>
       <title>Purpose</title>
       
-      <p>This document describes the Capacity Scheduler, a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.</p>
+      <p>This document describes the Capacity Scheduler, a pluggable 
+      Map/Reduce scheduler for Hadoop which provides a way to share 
+      large clusters.</p>
     </section>
     
     <section>
@@ -40,19 +42,17 @@
           Support for multiple queues, where a job is submitted to a queue.
         </li>
         <li>
-          Queues are guaranteed a fraction of the capacity of the grid (their 
- 	      'guaranteed capacity') in the sense that a certain capacity of 
- 	      resources will be at their disposal. All jobs submitted to a 
- 	      queue will have access to the capacity guaranteed to the queue.
+          Queues are allocated a fraction of the capacity of the grid in the 
+          sense that a certain capacity of resources will be at their 
+          disposal. All jobs submitted to a queue will have access to the 
+          capacity allocated to the queue.
         </li>
         <li>
-          Free resources can be allocated to any queue beyond its guaranteed 
-          capacity. These excess allocated resources can be reclaimed and made 
-          available to another queue in order to meet its capacity guarantee.
-        </li>
-        <li>
-          The scheduler guarantees that excess resources taken from a queue 
-          will be restored to it within N minutes of its need for them.
+          Free resources can be allocated to any queue beyond its capacity. 
+          When queues running below capacity demand these resources at a 
+          future point in time, then, as tasks scheduled on these 
+          resources complete, the freed resources will be assigned to 
+          jobs in queues running below capacity.
         </li>
         <li>
           Queues optionally support job priorities (disabled by default).
@@ -60,7 +60,9 @@
         <li>
           Within a queue, jobs with higher priority will have access to the 
           queue's resources before jobs with lower priority. However, once a 
-          job is running, it will not be preempted for a higher priority job.
+          job is running, it will not be preempted for a higher priority job,
+          though new tasks from the higher priority job will be 
+          preferentially scheduled.
         </li>
         <li>
           In order to prevent one or more users from monopolizing its 
@@ -83,59 +85,34 @@
       <p>Note that many of these steps can be, and will be, enhanced over time
       to provide better algorithms.</p>
       
-      <p>Whenever a TaskTracker is free, the Capacity Scheduler first picks a 
-      queue that needs to reclaim any resources the earliest (this is a queue
-      whose resources were temporarily being used by some other queue and now
-      needs access to those resources). If no such queue is found, it then picks
+      <p>Whenever a TaskTracker is free, the Capacity Scheduler picks 
       a queue which has most free space (whose ratio of # of running slots to 
-      guaranteed capacity is the lowest).</p>
+      capacity is the lowest).</p>
       
-      <p>Once a queue is selected, the scheduler picks a job in the queue. Jobs
+      <p>Once a queue is selected, the Scheduler picks a job in the queue. Jobs
       are sorted based on when they're submitted and their priorities (if the 
       queue supports priorities). Jobs are considered in order, and a job is 
       selected if its user is within the user-quota for the queue, i.e., the 
       user is not already using queue resources above his/her limit. The 
-      scheduler also makes sure that there is enough free memory in the 
+      Scheduler also makes sure that there is enough free memory in the 
       TaskTracker to run the job's task, in case the job has special memory
       requirements.</p>
       
-      <p>Once a job is selected, the scheduler picks a task to run. This logic 
+      <p>Once a job is selected, the Scheduler picks a task to run. This logic 
       to pick a task remains unchanged from earlier versions.</p> 
       
     </section>
     
     <section>
-      <title>Reclaiming capacity</title>
-
-	  <p>Periodically, the scheduler determines:</p>
-	  <ul>
-	    <li>
-	      if a queue needs to reclaim capacity. This happens when a queue has
-	      at least one task pending and part of its guaranteed capacity is 
-	      being used by some other queue. If this happens, the scheduler notes
-	      the amount of resources it needs to reclaim for this queue within a 
-	      specified period of time (the reclaim time). 
-	    </li>
-	    <li>
-	      if a queue has not received all the resources it needed to reclaim,
-	      and its reclaim time is about to expire. In this case, the scheduler
-	      needs to kill tasks from queues running over capacity. This it does
-	      by killing the tasks that started the latest.
-	    </li>
-	  </ul>   
-
-    </section>
-
-    <section>
       <title>Installation</title>
       
-        <p>The capacity scheduler is available as a JAR file in the Hadoop
+        <p>The Capacity Scheduler is available as a JAR file in the Hadoop
         tarball under the <em>contrib/capacity-scheduler</em> directory. The name of 
         the JAR file would be on the lines of hadoop-*-capacity-scheduler.jar.</p>
-        <p>You can also build the scheduler from source by executing
+        <p>You can also build the Scheduler from source by executing
         <em>ant package</em>, in which case it would be available under
         <em>build/contrib/capacity-scheduler</em>.</p>
-        <p>To run the capacity scheduler in your Hadoop installation, you need 
+        <p>To run the Capacity Scheduler in your Hadoop installation, you need 
         to put it on the <em>CLASSPATH</em>. The easiest way is to copy the 
         <code>hadoop-*-capacity-scheduler.jar</code> from 
         to <code>HADOOP_HOME/lib</code>. Alternatively, you can modify 
@@ -147,9 +124,9 @@
       <title>Configuration</title>
 
       <section>
-        <title>Using the capacity scheduler</title>
+        <title>Using the Capacity Scheduler</title>
         <p>
-          To make the Hadoop framework use the capacity scheduler, set up
+          To make the Hadoop framework use the Capacity Scheduler, set up
           the following property in the site configuration:</p>
           <table>
             <tr>
@@ -167,7 +144,7 @@
         <title>Setting up queues</title>
         <p>
           You can define multiple queues to which users can submit jobs with
-          the capacity scheduler. To define multiple queues, you should edit
+          the Capacity Scheduler. To define multiple queues, you should edit
           the site configuration for Hadoop and modify the
           <em>mapred.queue.names</em> property.
         </p>
@@ -185,8 +162,8 @@
       <section>
         <title>Configuring properties for queues</title>
 
-        <p>The capacity scheduler can be configured with several properties
-        for each queue that control the behavior of the scheduler. This
+        <p>The Capacity Scheduler can be configured with several properties
+        for each queue that control the behavior of the Scheduler. This
         configuration is in the <em>conf/capacity-scheduler.xml</em>. By
         default, the configuration is set up for one queue, named 
         <em>default</em>.</p>
@@ -194,10 +171,10 @@
         configuration, you should use the property name as
         <em>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.&lt;property-name&gt;</em>.
         </p>
-        <p>For example, to define the property <em>guaranteed-capacity</em>
+        <p>For example, to define the property <em>capacity</em>
         for queue named <em>research</em>, you should specify the property
         name as 
-        <em>mapred.capacity-scheduler.queue.research.guaranteed-capacity</em>.
+        <em>mapred.capacity-scheduler.queue.research.capacity</em>.
         </p>
 
         <p>The properties defined for queues and their descriptions are
@@ -205,15 +182,10 @@
 
         <table>
           <tr><th>Name</th><th>Description</th></tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.guaranteed-capacity</td>
-          	<td>Percentage of the number of slots in the cluster that are
-          	guaranteed to be available for jobs in this queue. 
-          	The sum of guaranteed capacities for all queues should be less 
-          	than or equal 100.</td>
-          </tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.reclaim-time-limit</td>
-          	<td>The amount of time, in seconds, before which resources 
-          	distributed to other queues will be reclaimed.</td>
+          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.capacity</td>
+          	<td>Percentage of the number of slots in the cluster that are made 
+            available for jobs in this queue. The sum of capacities 
+            for all queues should be less than or equal to 100.</td>
           </tr>
           <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.supports-priority</td>
           	<td>If true, priorities of jobs will be taken into account in scheduling 
@@ -236,27 +208,133 @@
       </section>
       
       <section>
-        <title>Configuring the capacity scheduler</title>
-        <p>The capacity scheduler's behavior can be controlled through the 
-          following properties. 
+        <title>Memory management</title>
+      
+        <p>The Capacity Scheduler supports scheduling of tasks on a
+        <code>TaskTracker</code>(TT) based on a job's memory requirements
+        and the availability of RAM and Virtual Memory (VMEM) on the TT node.
+        See the <a href="mapred_tutorial.html#Memory+monitoring">Hadoop 
+        Map/Reduce tutorial</a> for details on how the TT monitors
+        memory usage.</p>
+        <p>Currently, memory-based scheduling is only supported
+        on the Linux platform.</p>
+        <p>Memory-based scheduling works as follows:</p>
+        <ol>
+          <li>Memory-based scheduling is disabled if any one of the three 
+          config parameters 
+          <code>mapred.tasktracker.vmem.reserved</code>, 
+          <code>mapred.task.default.maxvmem</code>, or
+          <code>mapred.task.limit.maxvmem</code> is absent or set to -1,
+          just as memory monitoring is disabled for a TT under the
+          same conditions. These config parameters are described in the 
+          <a href="mapred_tutorial.html#Memory+monitoring">Hadoop Map/Reduce 
+          tutorial</a>. The value of  
+          <code>mapred.tasktracker.vmem.reserved</code> is 
+          obtained from the TT via its heartbeat. 
+          </li>
+          <li>If all the three mandatory parameters are set, the Scheduler 
+          enables VMEM-based scheduling. First, the Scheduler computes the free
+          VMEM on the TT. This is the difference between the available VMEM on the
+          TT (the node's total VMEM minus the offset, both of which are sent by 
+          the TT on each heartbeat) and the sum of VMEM already allocated to 
+          running tasks (i.e., the sum of the VMEM task-limits). Next, the Scheduler
+          looks at the VMEM requirements for the job that's first in line to 
+          run. If the job's VMEM requirements are less than the available VMEM on 
+          the node, the job's task can be scheduled. If not, the Scheduler 
+          ensures that the TT does not get a task to run (provided the job 
+          has tasks to run). This way, the Scheduler ensures that jobs with 
+          high memory requirements are not starved, as eventually, the TT 
+          will have enough VMEM available. If the high-mem job does not have 
+          any task to run, the Scheduler moves on to the next job. 
+          </li>
+          <li>In addition to VMEM, the Capacity Scheduler can also consider 
+          RAM on the TT node. RAM is considered the same way as VMEM. TTs report
+          the total RAM available on their node, and an offset. If both are
+          set, the Scheduler computes the available RAM on the node. Next, 
+          the Scheduler figures out the RAM requirements of the job, if any. 
+          As with VMEM, users can optionally specify a RAM limit for their job
+          (<code>mapred.task.maxpmem</code>, described in the Map/Reduce 
+          tutorial). The Scheduler also maintains a limit for this value 
+          (<code>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</code>, 
+          described below). All these three values must be set for the 
+          Scheduler to schedule tasks based on RAM constraints.
+          </li>
+          <li>The Scheduler ensures that jobs cannot ask for RAM or VMEM higher
+          than configured limits. If this happens, the job is failed when it
+          is submitted. 
+          </li>
+        </ol>
+        
+        <p>As described above, the additional scheduler-based config 
+        parameters are as follows:</p>
+
+        <table>
+          <tr><th>Name</th><th>Description</th></tr>
+          <tr><td>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</td>
+          	<td>A percentage of the default VMEM limit for jobs
+          	(<code>mapred.task.default.maxvmem</code>). This is the default 
+          	RAM task-limit associated with a task. Unless overridden by a 
+          	job's setting, this number defines the RAM task-limit.</td>
+          </tr>
+          <tr><td>mapred.capacity-scheduler.task.limit.maxpmem</td>
+          <td>An upper limit on the maximum physical memory that can
+           be specified by a job. If a job requires more physical
+           memory than this limit, the job
+           is rejected.</td>
+          </tr>
+        </table>
+      </section>
+   <section>
+        <title>Job Initialization Parameters</title>
+        <p>The Capacity Scheduler lazily initializes jobs before they
+        are scheduled, to reduce the memory footprint on the JobTracker.
+        The following parameters, which can be configured in
+        capacity-scheduler.xml, control how lazily jobs are
+        initialized.
         </p>
+        
         <table>
+          <tr><th>Name</th><th>Description</th></tr>
+          <tr>
+            <td>
+              mapred.capacity-scheduler.queue.&lt;queue-name&gt;.maximum-initialized-jobs-per-user
+            </td>
+            <td>
+              Maximum number of jobs which are allowed to be pre-initialized
+              for a particular user in the queue. Once a job starts 
+              running, it is no longer counted when the Scheduler computes
+              the maximum number of jobs a user is allowed to have
+              initialized. 
+            </td>
+          </tr>
           <tr>
-          <th>Name</th><th>Description</th>
+            <td>
+              mapred.capacity-scheduler.init-poll-interval
+            </td>
+            <td>
+              The interval, in milliseconds, at which the Scheduler polls
+              its job queues for jobs to be initialized.
+            </td>
           </tr>
           <tr>
-          <td>mapred.capacity-scheduler.reclaimCapacity.interval</td>
-          <td>The time interval, in seconds, between which the scheduler 
-          periodically determines whether capacity needs to be reclaimed for 
-          any queue. The default value is 5 seconds.
-          </td>
+            <td>
+              mapred.capacity-scheduler.init-worker-threads
+            </td>
+            <td>
+              Number of worker threads used by the initialization poller
+              to initialize jobs in the set of queues. If this number equals
+              the number of job queues, each thread is assigned jobs from 
+              exactly one queue. If it is less than the number of queues, 
+              a thread can get jobs from more than one queue, which it 
+              initializes in a round-robin fashion. If it is greater than 
+              the number of queues, the number of threads spawned 
+              equals the number of job queues.
+            </td>
           </tr>
         </table>
-        
-      </section>
-
+      </section>   
       <section>
-        <title>Reviewing the configuration of the capacity scheduler</title>
+        <title>Reviewing the configuration of the Capacity Scheduler</title>
         <p>
           Once the installation and configuration is completed, you can review
           it after starting the Map/Reduce cluster from the admin UI.
@@ -270,7 +348,8 @@
               Information</em> column against each queue.</li>
         </ul>
       </section>
-    </section>
+      
+   </section>
   </body>
   
 </document>
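
Putting the renamed queue properties and the new job-initialization parameters together, a hypothetical capacity-scheduler.xml for two queues might look as follows; the queue names and every value here are illustrative, not taken from the commit:

```xml
<!-- Hypothetical example: two queues, "default" and "research".
     The queue names themselves are declared in the site configuration via
     the mapred.queue.names property, e.g. value "default,research". -->
<configuration>
  <!-- Capacities must sum to at most 100. -->
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.research.capacity</name>
    <value>30</value>
  </property>
  <!-- Enable job priorities for the research queue only. -->
  <property>
    <name>mapred.capacity-scheduler.queue.research.supports-priority</name>
    <value>true</value>
  </property>
  <!-- Lazy job initialization: cap pre-initialized jobs per user,
       poll for jobs every 3 seconds using 5 worker threads. -->
  <property>
    <name>mapred.capacity-scheduler.queue.research.maximum-initialized-jobs-per-user</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.init-poll-interval</name>
    <value>3000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.init-worker-threads</name>
    <value>5</value>
  </property>
</configuration>
```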

Modified: hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/cluster_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/cluster_setup.xml?rev=771623&r1=771622&r2=771623&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/cluster_setup.xml (original)
+++ hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/cluster_setup.xml Tue May  5 08:46:09 2009
@@ -463,6 +463,120 @@
           </section>
           
         </section>
+        <section>
+        <title> Memory monitoring</title>
+        <p>A <code>TaskTracker</code>(TT) can be configured to monitor memory 
+        usage of tasks it spawns, so that badly-behaved jobs do not bring 
+        down a machine due to excess memory consumption. With monitoring 
+        enabled, every task is assigned a task-limit for virtual memory (VMEM). 
+        In addition, every node is assigned a node-limit for VMEM usage. 
+        A TT ensures that a task is killed if it, and 
+        its descendants, use VMEM over the task's per-task limit. It also 
+        ensures that one or more tasks are killed if the sum total of VMEM 
+        usage by all tasks, and their descendants, crosses the node-limit.</p>
+        
+        <p>Users can, optionally, specify the VMEM task-limit per job. If no
+        such limit is provided, a default limit is used. A node-limit can be 
+        set per node.</p>   
+        <p>Currently, memory monitoring and management is only supported
+        on the Linux platform.</p>
+        <p>To enable monitoring for a TT, the 
+        following parameters all need to be set:</p> 
+
+        <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td>mapred.tasktracker.vmem.reserved</td><td>long</td>
+            <td>A number, in bytes, that represents an offset. The total VMEM on 
+            the machine, minus this offset, is the VMEM node-limit for all 
+            tasks, and their descendants, spawned by the TT. 
+          </td></tr>
+          <tr><td>mapred.task.default.maxvmem</td><td>long</td>
+            <td>A number, in bytes, that represents the default VMEM task-limit 
+            associated with a task. Unless overridden by a job's setting, 
+            this number defines the VMEM task-limit.   
+          </td></tr>
+          <tr><td>mapred.task.limit.maxvmem</td><td>long</td>
+            <td>A number, in bytes, that represents the upper VMEM task-limit 
+            associated with a task. Users, when specifying a VMEM task-limit 
+            for their tasks, should not specify a limit which exceeds this amount. 
+          </td></tr>
+        </table>
+        
+        <p>In addition, the following parameter can be configured:</p>
+
+    <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td>mapred.tasktracker.taskmemorymanager.monitoring-interval</td>
+            <td>long</td>
+            <td>The time interval, in milliseconds, at which the TT 
+            checks for any memory violation. The default value is 5000 msec
+            (5 seconds). 
+          </td></tr>
+        </table>
+        
+        <p>Here's how the memory monitoring works for a TT.</p>
+        <ol>
+          <li>If one or more of the configuration parameters described 
+          above is missing, or set to -1, memory monitoring is 
+          disabled for the TT.
+          </li>
+          <li>In addition, monitoring is disabled if 
+          <code>mapred.task.default.maxvmem</code> is greater than 
+          <code>mapred.task.limit.maxvmem</code>. 
+          </li>
+          <li>If a TT receives a task whose task-limit is set by the user
+          to a value larger than <code>mapred.task.limit.maxvmem</code>, it 
+          logs a warning but executes the task.
+          </li> 
+          <li>Periodically, the TT checks the following: 
+          <ul>
+            <li>If any task's current VMEM usage is greater than that task's
+            VMEM task-limit, the task is killed and the reason for killing 
+            the task is logged in the task diagnostics. Such a task is considered 
+            failed, i.e., the killing counts towards the task's failure count.
+            </li> 
+            <li>If the sum total of VMEM used by all tasks and descendants is 
+            greater than the node-limit, the TT kills enough tasks, in the
+            order of least progress made, until the overall VMEM usage falls
+            below the node-limit. Such killed tasks are not considered failed
+            and their killing does not count towards the tasks' failure counts.
+            </li>
+          </ul>
+          </li>
+        </ol>
+        
+        <p>Schedulers can choose to ease the monitoring pressure on the TT by 
+        preventing too many tasks from running on a node and by scheduling 
+        tasks only if the TT has enough VMEM free. In addition, Schedulers may 
+        choose to consider the physical memory (RAM) available on the node
+        as well. To enable Scheduler support, TTs report their memory settings 
+        to the JobTracker in every heartbeat. Before getting into details, 
+        consider the following additional memory-related parameters that can be 
+        configured to enable better scheduling:</p> 
+
+        <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td>mapred.tasktracker.pmem.reserved</td><td>int</td>
+            <td>A number, in bytes, that represents an offset. The total 
+            physical memory (RAM) on the machine, minus this offset, is the 
+            recommended RAM node-limit. The RAM node-limit is a hint to a
+            Scheduler to schedule only so many tasks such that the sum 
+            total of their RAM requirements does not exceed this limit. 
+            RAM usage is not monitored by a TT.   
+          </td></tr>
+        </table>
+        
+        <p>A TT reports the following memory-related numbers in every 
+        heartbeat:</p>
+        <ul>
+          <li>The total VMEM available on the node.</li>
+          <li>The value of <code>mapred.tasktracker.vmem.reserved</code>,
+           if set.</li>
+          <li>The total RAM available on the node.</li> 
+          <li>The value of <code>mapred.tasktracker.pmem.reserved</code>,
+           if set.</li>
+         </ul>
+        </section>
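
The monitoring parameters documented above belong in the TaskTracker's site configuration. A hypothetical mapred-site.xml fragment could look like this; all byte values are illustrative, chosen only to show the relationship between the limits:

```xml
<!-- Hypothetical values for a node with ample VMEM. -->
<property>
  <name>mapred.tasktracker.vmem.reserved</name>
  <!-- Reserve 1 GB: the VMEM node-limit is total VMEM minus this offset. -->
  <value>1073741824</value>
</property>
<property>
  <name>mapred.task.default.maxvmem</name>
  <!-- Default per-task VMEM limit: 512 MB. -->
  <value>536870912</value>
</property>
<property>
  <name>mapred.task.limit.maxvmem</name>
  <!-- Upper bound on any per-task VMEM limit: 2 GB.
       Must be >= mapred.task.default.maxvmem, or monitoring is disabled. -->
  <value>2147483648</value>
</property>
<property>
  <name>mapred.tasktracker.taskmemorymanager.monitoring-interval</name>
  <!-- Check for memory violations every 5 seconds (the default). -->
  <value>5000</value>
</property>
```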
         
         <section>
           <title>Slaves</title>

Modified: hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=771623&r1=771622&r2=771623&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Tue May  5 08:46:09 2009
@@ -1104,8 +1104,26 @@
         counters for a job- particularly relative to byte counts from the map
         and into the reduce- is invaluable to the tuning of these
         parameters.</p>
+        
+        <p>If memory management is enabled, users can choose to override 
+          the default Virtual Memory and RAM limits enforced by the 
+          TaskTracker. Users can set the following parameters per job:</p>
+           
+          <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td><code>mapred.task.maxvmem</code></td><td>int</td>
+            <td>A number, in bytes, that represents the maximum Virtual Memory
+            task-limit for each task of the job. A task will be killed if 
+            it consumes more Virtual Memory than this number. 
+          </td></tr>
+          <tr><td><code>mapred.task.maxpmem</code></td><td>int</td>
+            <td>A number, in bytes, that represents the maximum RAM task-limit
+            for each task of the job. This number can be optionally used by
+            Schedulers to prevent over-scheduling of tasks on a node based 
+            on RAM needs.  
+          </td></tr>
+        </table>       
         </section>
-
         <section>
           <title>Map Parameters</title>
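
To illustrate the per-job overrides described in the table above, a job configuration might set the two limits like this; both values are hypothetical:

```xml
<!-- Hypothetical per-job overrides: 1 GB VMEM limit and
     512 MB RAM limit for each task of the job. -->
<property>
  <name>mapred.task.maxvmem</name>
  <value>1073741824</value>
</property>
<property>
  <name>mapred.task.maxpmem</name>
  <value>536870912</value>
</property>
```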