Posted to mapreduce-commits@hadoop.apache.org by vi...@apache.org on 2010/06/14 07:20:42 UTC

svn commit: r954365 - in /hadoop/mapreduce/branches/branch-0.21: CHANGES.txt src/docs/src/documentation/content/xdocs/capacity_scheduler.xml src/docs/src/documentation/content/xdocs/mapred_tutorial.xml src/docs/src/documentation/content/xdocs/site.xml

Author: vinodkv
Date: Mon Jun 14 05:20:42 2010
New Revision: 954365

URL: http://svn.apache.org/viewvc?rev=954365&view=rev
Log:
MAPREDUCE-1018. Merge --ignore-ancestry -c 954364 from trunk into branch-0.21.

Modified:
    hadoop/mapreduce/branches/branch-0.21/CHANGES.txt
    hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
    hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
    hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml

Modified: hadoop/mapreduce/branches/branch-0.21/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/CHANGES.txt?rev=954365&r1=954364&r2=954365&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/CHANGES.txt (original)
+++ hadoop/mapreduce/branches/branch-0.21/CHANGES.txt Mon Jun 14 05:20:42 2010
@@ -749,6 +749,9 @@ Release 0.21.0 - Unreleased
     MAPREDUCE-1033. Resolve location of scripts and configuration files after
     project split. (tomwhite)
 
+    MAPREDUCE-1018. Document changes to the memory management and scheduling
+    model. (Hemanth Yamijala via vinodkv)
+
   OPTIMIZATIONS
 
     MAPREDUCE-270. Fix the tasktracker to optionally send an out-of-band

Modified: hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml?rev=954365&r1=954364&r2=954365&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml (original)
+++ hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml Mon Jun 14 05:20:42 2010
@@ -114,6 +114,69 @@
       
       <p>Once a job is selected, the Scheduler picks a task to run. This logic 
       to pick a task remains unchanged from earlier versions.</p> 
+
+      <section>
+        <title>Scheduling Tasks Considering Memory Requirements</title>
+      
+        <p>
+        The Capacity Scheduler supports scheduling of tasks on a
+        TaskTracker based on a job's virtual memory requirements and 
+        the availability
+        of enough virtual memory on the TaskTracker node. By doing so, it
+        simplifies the virtual memory monitoring function on the 
+        TaskTracker node, described in the section on 
+        <a href="ext:cluster-setup/ConfiguringMemoryParameters">
+        Monitoring Task Memory Usage</a> in the Cluster Setup guide. 
+        Refer to that section for more details on how memory for 
+        MapReduce tasks is handled.
+        </p>
+        
+        <p>
+        Virtual memory based task scheduling uses the same parameters as
+        the memory monitoring function of the TaskTracker, and is enabled
+        along with virtual memory monitoring. When enabled, the scheduler
+        ensures that a task is scheduled on a TaskTracker only when the
+        virtual memory required by the map or reduce task can be provided 
+        by the TaskTracker. That is, the task is scheduled only if the 
+        following constraint is satisfied:<br/>
+        <code>
+        mapreduce.{map|reduce}.memory.mb of the job &lt;=
+          total virtual memory available for all map or reduce tasks on the 
+            TaskTracker -
+          total virtual memory already committed to running map or reduce 
+            tasks on the TaskTracker
+        </code><br/>
+        </p>
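+
+        <p>
+        As a purely illustrative example (the numbers are hypothetical),
+        consider a TaskTracker configured with 8192 MB of virtual memory
+        for all map tasks that is currently running map tasks with a
+        combined requirement of 6144 MB. A map task of a job that sets
+        <code>mapreduce.map.memory.mb</code> to 2048 can be scheduled
+        there (2048 &lt;= 8192 - 6144), whereas a map task requiring
+        3072 MB cannot, and would instead trigger the reservation
+        mechanism described below.
+        </p>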
+        
+        <p>
+        When a task at the front of the scheduler's queue cannot be scheduled
+        on a TaskTracker due to insufficient memory, the scheduler creates
+        a virtual <em>reservation</em> for this task. This can continue
+        for all pending tasks on a job, subject to other capacity constraints.
+        Once all of the job's tasks are either scheduled or have
+        reservations, the scheduler proceeds to schedule tasks of other
+        jobs that are not necessarily at the front of the queue but do
+        meet the memory constraints of the TaskTracker.
+        By reserving only as many TaskTrackers as are needed, the
+        scheduler strikes a balance between not starving jobs with 
+        high memory requirements and not under-utilizing cluster resources.
+        </p>
+
+        <p>
+        Tasks of jobs that require more virtual memory than the
+        per-slot <code>mapreduce.cluster.{map|reduce}memory.mb</code>
+        value are treated as occupying more than one slot, and account
+        for a correspondingly larger share of their queue's capacity.
+        The number of slots a task occupies is determined as:<br/>
+        <code>
+        Number of slots for a task = mapreduce.{map|reduce}.memory.mb / 
+        mapreduce.cluster.{map|reduce}memory.mb
+        </code><br/>
+        However, special tasks run by the framework, such as setup 
+        and cleanup tasks, never count for more than one slot, 
+        irrespective of their job's memory requirements.
+        </p>
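+
+        <p>
+        For example (with illustrative values), if 
+        <code>mapreduce.cluster.mapmemory.mb</code> is set to 1024 on the
+        cluster and a job sets <code>mapreduce.map.memory.mb</code> to 2048,
+        each map task of that job is counted as occupying
+        2048 / 1024 = 2 map slots against its queue's capacity.
+        </p>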
+
+      </section>
       
     </section>
     
@@ -295,82 +358,6 @@
         capacity-scheduler.</p>
       </section>
       
-      <section>
-        <title>Memory Management</title>
-      
-        <p>The Capacity Scheduler supports scheduling of tasks on a
-        <code>TaskTracker</code>(TT) based on a job's memory requirements
-        and the availability of RAM and Virtual Memory (VMEM) on the TT node.
-        See the 
-        <a href="mapred_tutorial.html">MapReduce Tutorial</a> 
-        for details on how the TT monitors memory usage.</p>
-        <p>Currently the memory based scheduling is only supported in Linux platform.</p>
-        <p>Memory-based scheduling works as follows:</p>
-        <ol>
-          <li>The absence of any one or more of three config parameters 
-          or -1 being set as value of any of the parameters, 
-          <code>mapred.tasktracker.vmem.reserved</code>, 
-          <code>mapred.task.default.maxvmem</code>, or
-          <code>mapred.task.limit.maxvmem</code>, disables memory-based
-          scheduling, just as it disables memory monitoring for a TT. These
-          config parameters are described in the 
-          <a href="mapred_tutorial.html">MapReduce Tutorial</a>. 
-          The value of  
-          <code>mapred.tasktracker.vmem.reserved</code> is 
-          obtained from the TT via its heartbeat. 
-          </li>
-          <li>If all the three mandatory parameters are set, the Scheduler 
-          enables VMEM-based scheduling. First, the Scheduler computes the free
-          VMEM on the TT. This is the difference between the available VMEM on the
-          TT (the node's total VMEM minus the offset, both of which are sent by 
-          the TT on each heartbeat)and the sum of VMs already allocated to 
-          running tasks (i.e., sum of the VMEM task-limits). Next, the Scheduler
-          looks at the VMEM requirements for the job that's first in line to 
-          run. If the job's VMEM requirements are less than the available VMEM on 
-          the node, the job's task can be scheduled. If not, the Scheduler 
-          ensures that the TT does not get a task to run (provided the job 
-          has tasks to run). This way, the Scheduler ensures that jobs with 
-          high memory requirements are not starved, as eventually, the TT 
-          will have enough VMEM available. If the high-mem job does not have 
-          any task to run, the Scheduler moves on to the next job. 
-          </li>
-          <li>In addition to VMEM, the Capacity Scheduler can also consider 
-          RAM on the TT node. RAM is considered the same way as VMEM. TTs report
-          the total RAM available on their node, and an offset. If both are
-          set, the Scheduler computes the available RAM on the node. Next, 
-          the Scheduler figures out the RAM requirements of the job, if any. 
-          As with VMEM, users can optionally specify a RAM limit for their job
-          (<code>mapred.task.maxpmem</code>, described in the MapReduce 
-          tutorial). The Scheduler also maintains a limit for this value 
-          (<code>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</code>, 
-          described below). All these three values must be set for the 
-          Scheduler to schedule tasks based on RAM constraints.
-          </li>
-          <li>The Scheduler ensures that jobs cannot ask for RAM or VMEM higher
-          than configured limits. If this happens, the job is failed when it
-          is submitted. 
-          </li>
-        </ol>
-        
-        <p>As described above, the additional scheduler-based config 
-        parameters are as follows:</p>
-
-        <table>
-          <tr><th>Name</th><th>Description</th></tr>
-          <tr><td>mapred.capacity-scheduler.task.default-pmem-<br/>percentage-in-vmem</td>
-          	<td>A percentage of the default VMEM limit for jobs
-          	(<code>mapred.task.default.maxvmem</code>). This is the default 
-          	RAM task-limit associated with a task. Unless overridden by a 
-          	job's setting, this number defines the RAM task-limit.</td>
-          </tr>
-          <tr><td>mapred.capacity-scheduler.task.limit.maxpmem</td>
-          <td>Configuration which provides an upper limit to maximum physical
-           memory which can be specified by a job. If a job requires more 
-           physical memory than what is specified in this limit then the same
-           is rejected.</td>
-          </tr>
-        </table>
-      </section>
    <section>
         <title>Job Initialization Parameters</title>
         <p>Capacity scheduler lazily initializes the jobs before they are

Modified: hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=954365&r1=954364&r2=954365&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Mon Jun 14 05:20:42 2010
@@ -1201,49 +1201,107 @@
         </p>
         
         <section>
-        <title> Memory Management</title>
-        <p>Users/admins can also specify the maximum virtual memory 
-        of the launched child-task, and any sub-process it launches 
-        recursively, using <code>mapred.{map|reduce}.child.ulimit</code>. Note 
-        that the value set here is a per process limit.
-        The value for <code>mapred.{map|reduce}.child.ulimit</code> should be 
-        specified in kilo bytes (KB). And also the value must be greater than
-        or equal to the -Xmx passed to JavaVM, or else the VM might not start. 
-        </p>
-        
-        <p>Note: <code>mapred.{map|reduce}.child.java.opts</code> are used only 
-        for configuring the launched child tasks from task tracker. Configuring 
-        the memory options for daemons is documented under
-        <a href="ext:cluster-setup/ConfiguringEnvironmentHadoopDaemons">
-        Configuring the Environment of the Hadoop Daemons</a> (Cluster Setup).</p>
-        
-        <p>The memory available to some parts of the framework is also
-        configurable. In map and reduce tasks, performance may be influenced
-        by adjusting parameters influencing the concurrency of operations and
-        the frequency with which data will hit disk. Monitoring the filesystem
-        counters for a job- particularly relative to byte counts from the map
-        and into the reduce- is invaluable to the tuning of these
-        parameters.</p>
-        
-        <p>Users can choose to override default limits of Virtual Memory and RAM 
-          enforced by the task tracker, if memory management is enabled. 
-          Users can set the following parameter per job:</p>
+         <title>Configuring Memory Requirements For A Job</title>
+         
+         <p>
+         MapReduce tasks are launched with default memory limits provided
+         by the system or by the cluster's administrators. Memory-intensive
+         jobs might need more than these default values, and Hadoop has
+         configuration options that allow the limits to be changed.
+         Without such changes, memory-intensive jobs could fail due
+         to <code>OutOfMemory</code> errors in tasks or could get killed
+         when the limits are enforced by the system. This section describes
+         the various options that can be used to configure a job's
+         memory requirements; an illustrative sketch of setting them
+         follows the list below.
+         </p>
+          
+         <ul>
+         
+           <li>
+           <code>mapreduce.{map|reduce}.java.opts</code>: Use this option
+           if the task requires more Java heap space. The value should
+           specify the desired heap size via the JVM option -Xmx; for
+           example, to use 1 GB of heap space, include -Xmx1024m. Note that
+           other JVM options are passed through this same option, so append
+           the heap setting to any options already configured.
+           </li>
+         
+           <li>
+           <code>mapreduce.{map|reduce}.ulimit</code>: The slaves where
+           tasks run may be configured with a ulimit value that caps
+           every process launched on the slave.
+           If the task, or any child that the task launches (as in
+           Streaming), requires more than the configured limit, use this
+           option. The value is given in kilobytes; for example, to
+           raise the ulimit to 1 GB, set the option to 1048576.
+           Note that this is a per-process limit. Since it applies 
+           to the JVM as well, the heap space given to the JVM through 
+           <code>mapreduce.{map|reduce}.java.opts</code> must be less
+           than the ulimit value; otherwise the JVM
+           will not start.
+           </li>
            
-          <table>
-          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
-          <tr><td><code>mapred.task.maxvmem</code></td><td>int</td>
-            <td>A number, in bytes, that represents the maximum Virtual Memory
-            task-limit for each task of the job. A task will be killed if 
-            it consumes more Virtual Memory than this number. 
-          </td></tr>
-          <tr><td>mapred.task.maxpmem</td><td>int</td>
-            <td>A number, in bytes, that represents the maximum RAM task-limit
-            for each task of the job. This number can be optionally used by
-            Schedulers to prevent over-scheduling of tasks on a node based 
-            on RAM needs.  
-          </td></tr>
-        </table>       
+           <li>
+           <code>mapreduce.{map|reduce}.memory.mb</code>: In some 
+           environments, administrators might have configured a total limit
+           on the virtual memory used by the entire process tree for a task, 
+           including all processes launched recursively by the task or 
+           its children, as in Streaming. More details can be
+           found in the section on 
+           <a href="ext:cluster-setup/ConfiguringMemoryParameters">
+           Monitoring Task Memory Usage</a> in the Cluster Setup guide.
+           If a task requires more virtual memory for its entire tree, 
+           this option 
+           must be used. The value is given in MB. For example, to set 
+           the limit to 1 GB, the option should be set to 1024. Note that this 
+           value does not automatically influence the per-process ulimit or 
+           heap space. Hence, you may need to set those parameters as well 
+           (as described above) in order to give your tasks the right amount 
+           of memory.
+           </li>
+          
+           <li>
+           <code>mapreduce.{map|reduce}.memory.physical.mb</code>: 
+           This parameter is similar to 
+           <code>mapreduce.{map|reduce}.memory.mb</code>, except it specifies
+           how much physical memory is required by a task for its entire
+           tree of processes. The parameter is applicable if administrators
+           have configured a total limit on the physical memory used by
+           all MapReduce tasks.
+           </li>
+ 
+         </ul>
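+
+         <p>
+         The following is a purely illustrative sketch of how these options
+         might be set programmatically on a job's configuration before
+         submission. The values and the class name are hypothetical and
+         should be adapted to your job and cluster.
+         </p>
+
+         <source>
+// Illustrative sketch only; the values and class name are examples,
+// not recommendations.
+import org.apache.hadoop.conf.Configuration;
+
+public class MemoryConfigExample {
+  public static void main(String[] args) {
+    Configuration conf = new Configuration();
+    // Give each map task's JVM 1 GB of heap.
+    conf.set("mapreduce.map.java.opts", "-Xmx1024m");
+    // Raise the per-process ulimit for map tasks to 2 GB (value in KB).
+    conf.set("mapreduce.map.ulimit", "2097152");
+    // Limit the virtual memory of a map task's whole process tree to 2 GB.
+    conf.set("mapreduce.map.memory.mb", "2048");
+    // Limit the physical memory of a map task's whole process tree to 1 GB.
+    conf.set("mapreduce.map.memory.physical.mb", "1024");
+    // The corresponding mapreduce.reduce.* options can be set the same way,
+    // typically with different values for reduce tasks.
+  }
+}
+         </source>
+
+         <p>
+         In practice, the resulting configuration is then used to construct
+         and submit the job as usual.
+         </p>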
+         
+         <p>
+         As seen above, each of the options can be specified separately for
+         map and reduce tasks. The two types of tasks typically have
+         different memory requirements, so different values can be set
+         for the corresponding options.
+         </p>
+         
+         <p>
+         The memory available to some parts of the framework is also
+         configurable. In map and reduce tasks, performance may be influenced
+         by adjusting parameters that affect the concurrency of operations and
+         the frequency with which data hits disk. Monitoring the filesystem
+         counters for a job, particularly relative to byte counts from the map
+         and into the reduce, is invaluable to the tuning of these
+         parameters.
+         </p>
+           
+         <p>
+         Note: The memory-related configuration options described above 
+         apply only to the child tasks launched by the 
+         TaskTracker. Configuring the memory options for daemons is documented 
+         under
+         <a href="ext:cluster-setup/ConfiguringEnvironmentHadoopDaemons">
+         Configuring the Environment of the Hadoop Daemons</a> (Cluster Setup).
+         </p>
+          
         </section>
+
         <section>
           <title>Map Parameters</title>
 

Modified: hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml?rev=954365&r1=954364&r2=954365&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml Mon Jun 14 05:20:42 2010
@@ -85,6 +85,7 @@ See http://forrest.apache.org/docs/linki
       <RefreshingQueueConfiguration href="#Refreshing+queue+configuration"/>
       <mapred-queues.xml href="#mapred-queues.xml"/>
       <ConfiguringEnvironmentHadoopDaemons href="#Configuring+the+Environment+of+the+Hadoop+Daemons"/>
+      <ConfiguringMemoryParameters href="#Configuring+Memory+Parameters+for+MapReduce+Jobs" />
       <ConfiguringHadoopDaemons href="#Configuring+the+Hadoop+Daemons"/>
       <FullyDistributedOperation href="#Fully-Distributed+Operation"/>
     </cluster-setup>