Posted to mapreduce-commits@hadoop.apache.org by vi...@apache.org on 2010/06/14 07:20:42 UTC
svn commit: r954365 - in /hadoop/mapreduce/branches/branch-0.21: CHANGES.txt
src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
src/docs/src/documentation/content/xdocs/site.xml
Author: vinodkv
Date: Mon Jun 14 05:20:42 2010
New Revision: 954365
URL: http://svn.apache.org/viewvc?rev=954365&view=rev
Log:
MAPREDUCE-1018. Merge --ignore-ancestry -c 954364 from trunk into branch-0.21.
Modified:
hadoop/mapreduce/branches/branch-0.21/CHANGES.txt
hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml
Modified: hadoop/mapreduce/branches/branch-0.21/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/CHANGES.txt?rev=954365&r1=954364&r2=954365&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/CHANGES.txt (original)
+++ hadoop/mapreduce/branches/branch-0.21/CHANGES.txt Mon Jun 14 05:20:42 2010
@@ -749,6 +749,9 @@ Release 0.21.0 - Unreleased
MAPREDUCE-1033. Resolve location of scripts and configuration files after
project split. (tomwhite)
+ MAPREDUCE-1018. Document changes to the memory management and scheduling
+ model. (Hemanth Yamijala via vinodkv)
+
OPTIMIZATIONS
MAPREDUCE-270. Fix the tasktracker to optionally send an out-of-band
Modified: hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml?rev=954365&r1=954364&r2=954365&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml (original)
+++ hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml Mon Jun 14 05:20:42 2010
@@ -114,6 +114,69 @@
<p>Once a job is selected, the Scheduler picks a task to run. This logic
to pick a task remains unchanged from earlier versions.</p>
+
+ <section>
+ <title>Scheduling Tasks Considering Memory Requirements</title>
+
+ <p>
+ The Capacity Scheduler supports scheduling of tasks on a
+ TaskTracker based on a job's virtual memory requirements and
+ the availability
+ of enough virtual memory on the TaskTracker node. By doing so, it
+ simplifies the virtual memory monitoring function on the
+ TaskTracker node, described in the section on
+ <a href="ext:cluster-setup/ConfiguringMemoryParameters">
+ Monitoring Task Memory Usage</a> in the Cluster Setup guide.
+ Refer to that section for more details on how memory for
+ MapReduce tasks is handled.
+ </p>
+
+ <p>
+ Virtual memory based task scheduling uses the same parameters as
+ the memory monitoring function of the TaskTracker, and is enabled
+ along with virtual memory monitoring. When enabled, the scheduler
+ ensures that a task is scheduled on a TaskTracker only when the
+ virtual memory required by the map or reduce task can be assured
+ by the TaskTracker. That is, the task is scheduled only if the
+ following constraint is satisfied:<br/>
+ <code>
+ mapreduce.{map|reduce}.memory.mb of the job <=
+ total virtual memory for all map or reduce tasks on the TaskTracker -
+ total virtual memory required for all running map or reduce tasks
+ on the TaskTracker
+ </code><br/>
+ </p>
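For illustration, the constraint above can be sketched as a standalone check (a Python sketch, not the scheduler's actual Java code; the names follow the text above):

```python
def can_schedule(task_memory_mb, tracker_total_mb, running_tasks_mb):
    """Return True if the TaskTracker can assure the task's virtual memory.

    task_memory_mb   -- the job's mapreduce.{map|reduce}.memory.mb
    tracker_total_mb -- total virtual memory for all map (or reduce)
                        tasks on the TaskTracker
    running_tasks_mb -- memory required by tasks already running there
    """
    free_mb = tracker_total_mb - sum(running_tasks_mb)
    return task_memory_mb <= free_mb

# A tracker with 4096 MB for map tasks, already running two 1024 MB maps,
# can take a 2048 MB task but not a 3072 MB one.
print(can_schedule(2048, 4096, [1024, 1024]))  # True
print(can_schedule(3072, 4096, [1024, 1024]))  # False
```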
+
+ <p>
+ When a task at the front of the scheduler's queue cannot be scheduled
+ on a TaskTracker due to insufficient memory, the scheduler creates
+ a virtual <em>reservation</em> for this task. This can continue
+ for all pending tasks on a job, subject to other capacity constraints.
+ Once all tasks are either scheduled or have reservations, the
+ scheduler will proceed to schedule other jobs' tasks that are not
+ necessarily at the front of the queue, but meet memory constraints
+ of the TaskTracker.
+ By reserving just enough TaskTrackers in this manner, the
+ scheduler strikes a balance between starving jobs with
+ high memory requirements and under-utilizing cluster resources.
+ </p>
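The reservation behaviour described above can be sketched as follows (a greatly simplified, hypothetical Python illustration of the policy, not the real scheduler code):

```python
def assign_or_reserve(task_mb, trackers):
    """Try to place a task on some TaskTracker; otherwise reserve one.

    trackers is a list of dicts with 'total' and 'used' virtual memory
    (MB) and a 'reserved' flag. Returns ('scheduled', tracker),
    ('reserved', tracker), or ('blocked', None).
    """
    for t in trackers:
        if not t['reserved'] and task_mb <= t['total'] - t['used']:
            t['used'] += task_mb
            return 'scheduled', t
    # No tracker has room now: reserve one so the high-memory task is
    # not starved while smaller tasks keep filling the cluster.
    for t in trackers:
        if not t['reserved']:
            t['reserved'] = True
            return 'reserved', t
    return 'blocked', None
```

A task that fits nowhere leaves a reservation behind, so the next heartbeat from that tracker is not given away to smaller tasks.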
+
+ <p>
+ Tasks of jobs that require more virtual memory than the
+ per-slot <code>mapreduce.cluster.{map|reduce}memory.mb</code>
+ value are treated as occupying more than one slot, and count
+ towards a correspondingly increased capacity usage for their queue.
+ The number of slots they occupy is determined as:<br/>
+ <code>
+ Number of slots for a task = mapreduce.{map|reduce}.memory.mb /
+ mapreduce.cluster.{map|reduce}memory.mb
+ </code><br/>
+ However, special tasks run by the framework, such as setup
+ and cleanup tasks, never count for more than one slot,
+ irrespective of their job's memory requirements.
+ </p>
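The slot computation can be illustrated as below. The text gives the formula as a plain division; rounding a partial slot up to a whole one, and the one-slot floor, are assumptions spelled out in this sketch:

```python
import math

def slots_for_task(task_memory_mb, cluster_slot_mb, is_special=False):
    """Slots occupied by a task under memory-based scheduling.

    task_memory_mb  -- mapreduce.{map|reduce}.memory.mb of the job
    cluster_slot_mb -- per-slot mapreduce.cluster.{map|reduce}memory.mb
    is_special      -- framework tasks (setup/cleanup) always count as 1
    """
    if is_special:
        return 1
    # Assumed: a partial slot rounds up, and every task takes at least one.
    return max(1, math.ceil(task_memory_mb / cluster_slot_mb))

print(slots_for_task(3072, 1024))                   # 3
print(slots_for_task(3072, 1024, is_special=True))  # 1
```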
+
+ </section>
</section>
@@ -295,82 +358,6 @@
capacity-scheduler.</p>
</section>
- <section>
- <title>Memory Management</title>
-
- <p>The Capacity Scheduler supports scheduling of tasks on a
- <code>TaskTracker</code>(TT) based on a job's memory requirements
- and the availability of RAM and Virtual Memory (VMEM) on the TT node.
- See the
- <a href="mapred_tutorial.html">MapReduce Tutorial</a>
- for details on how the TT monitors memory usage.</p>
- <p>Currently the memory based scheduling is only supported in Linux platform.</p>
- <p>Memory-based scheduling works as follows:</p>
- <ol>
- <li>The absence of any one or more of three config parameters
- or -1 being set as value of any of the parameters,
- <code>mapred.tasktracker.vmem.reserved</code>,
- <code>mapred.task.default.maxvmem</code>, or
- <code>mapred.task.limit.maxvmem</code>, disables memory-based
- scheduling, just as it disables memory monitoring for a TT. These
- config parameters are described in the
- <a href="mapred_tutorial.html">MapReduce Tutorial</a>.
- The value of
- <code>mapred.tasktracker.vmem.reserved</code> is
- obtained from the TT via its heartbeat.
- </li>
- <li>If all the three mandatory parameters are set, the Scheduler
- enables VMEM-based scheduling. First, the Scheduler computes the free
- VMEM on the TT. This is the difference between the available VMEM on the
- TT (the node's total VMEM minus the offset, both of which are sent by
- the TT on each heartbeat)and the sum of VMs already allocated to
- running tasks (i.e., sum of the VMEM task-limits). Next, the Scheduler
- looks at the VMEM requirements for the job that's first in line to
- run. If the job's VMEM requirements are less than the available VMEM on
- the node, the job's task can be scheduled. If not, the Scheduler
- ensures that the TT does not get a task to run (provided the job
- has tasks to run). This way, the Scheduler ensures that jobs with
- high memory requirements are not starved, as eventually, the TT
- will have enough VMEM available. If the high-mem job does not have
- any task to run, the Scheduler moves on to the next job.
- </li>
- <li>In addition to VMEM, the Capacity Scheduler can also consider
- RAM on the TT node. RAM is considered the same way as VMEM. TTs report
- the total RAM available on their node, and an offset. If both are
- set, the Scheduler computes the available RAM on the node. Next,
- the Scheduler figures out the RAM requirements of the job, if any.
- As with VMEM, users can optionally specify a RAM limit for their job
- (<code>mapred.task.maxpmem</code>, described in the MapReduce
- tutorial). The Scheduler also maintains a limit for this value
- (<code>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</code>,
- described below). All these three values must be set for the
- Scheduler to schedule tasks based on RAM constraints.
- </li>
- <li>The Scheduler ensures that jobs cannot ask for RAM or VMEM higher
- than configured limits. If this happens, the job is failed when it
- is submitted.
- </li>
- </ol>
-
- <p>As described above, the additional scheduler-based config
- parameters are as follows:</p>
-
- <table>
- <tr><th>Name</th><th>Description</th></tr>
- <tr><td>mapred.capacity-scheduler.task.default-pmem-<br/>percentage-in-vmem</td>
- <td>A percentage of the default VMEM limit for jobs
- (<code>mapred.task.default.maxvmem</code>). This is the default
- RAM task-limit associated with a task. Unless overridden by a
- job's setting, this number defines the RAM task-limit.</td>
- </tr>
- <tr><td>mapred.capacity-scheduler.task.limit.maxpmem</td>
- <td>Configuration which provides an upper limit to maximum physical
- memory which can be specified by a job. If a job requires more
- physical memory than what is specified in this limit then the same
- is rejected.</td>
- </tr>
- </table>
- </section>
<section>
<title>Job Initialization Parameters</title>
<p>Capacity scheduler lazily initializes the jobs before they are
Modified: hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=954365&r1=954364&r2=954365&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Mon Jun 14 05:20:42 2010
@@ -1201,49 +1201,107 @@
</p>
<section>
- <title> Memory Management</title>
- <p>Users/admins can also specify the maximum virtual memory
- of the launched child-task, and any sub-process it launches
- recursively, using <code>mapred.{map|reduce}.child.ulimit</code>. Note
- that the value set here is a per process limit.
- The value for <code>mapred.{map|reduce}.child.ulimit</code> should be
- specified in kilo bytes (KB). And also the value must be greater than
- or equal to the -Xmx passed to JavaVM, or else the VM might not start.
- </p>
-
- <p>Note: <code>mapred.{map|reduce}.child.java.opts</code> are used only
- for configuring the launched child tasks from task tracker. Configuring
- the memory options for daemons is documented under
- <a href="ext:cluster-setup/ConfiguringEnvironmentHadoopDaemons">
- Configuring the Environment of the Hadoop Daemons</a> (Cluster Setup).</p>
-
- <p>The memory available to some parts of the framework is also
- configurable. In map and reduce tasks, performance may be influenced
- by adjusting parameters influencing the concurrency of operations and
- the frequency with which data will hit disk. Monitoring the filesystem
- counters for a job- particularly relative to byte counts from the map
- and into the reduce- is invaluable to the tuning of these
- parameters.</p>
-
- <p>Users can choose to override default limits of Virtual Memory and RAM
- enforced by the task tracker, if memory management is enabled.
- Users can set the following parameter per job:</p>
+ <title>Configuring Memory Requirements For A Job</title>
+
+ <p>
+ MapReduce tasks are launched with default memory limits
+ provided by the system or by the cluster's administrators.
+ Memory-intensive jobs might need more than these default
+ values, and Hadoop provides configuration options to change
+ them. Without such changes, memory-intensive jobs could fail
+ due to <code>OutOfMemory</code> errors in tasks or could get
+ killed when the limits are enforced by the system. This section
+ describes the options that can be used to configure a job's
+ specific memory requirements.
+ </p>
+
+ <ul>
+
+ <li>
+ <code>mapreduce.{map|reduce}.java.opts</code>: Use this option
+ if the task requires more Java heap space. Its value should set
+ the desired heap size through the JVM option -Xmx. For example,
+ to use 1G of heap space, pass in -Xmx1024m. Note that other JVM
+ options are also passed through the same option, so append the
+ heap-size setting to any options already configured.
+
+ <li>
+ <code>mapreduce.{map|reduce}.ulimit</code>: The slaves where
+ tasks are run could be configured with a ulimit value that
+ applies a limit to every process launched on the slave.
+ If the task, or any child that the task launches (as in
+ streaming), requires more than the configured limit, this option
+ must be used. The value is given in kilobytes. For example, to
+ increase the ulimit to 1G, set the option to 1048576.
+ Note that this value is a per-process limit. Since it applies
+ to the JVM as well, the heap space given to the JVM through
+ <code>mapreduce.{map|reduce}.java.opts</code> should be less
+ than the value configured for the ulimit. Otherwise the JVM
+ will not start.
+ </li>
- <table>
- <tr><th>Name</th><th>Type</th><th>Description</th></tr>
- <tr><td><code>mapred.task.maxvmem</code></td><td>int</td>
- <td>A number, in bytes, that represents the maximum Virtual Memory
- task-limit for each task of the job. A task will be killed if
- it consumes more Virtual Memory than this number.
- </td></tr>
- <tr><td>mapred.task.maxpmem</td><td>int</td>
- <td>A number, in bytes, that represents the maximum RAM task-limit
- for each task of the job. This number can be optionally used by
- Schedulers to prevent over-scheduling of tasks on a node based
- on RAM needs.
- </td></tr>
- </table>
+ <li>
+ <code>mapreduce.{map|reduce}.memory.mb</code>: In some
+ environments, administrators might have configured a total limit
+ on the virtual memory used by the entire process tree for a task,
+ including all processes launched recursively by the task or
+ its children, like in streaming. More details about this can be
+ found in the section on
+ <a href="ext:cluster-setup/ConfiguringMemoryParameters">
+ Monitoring Task Memory Usage</a> in the Cluster Setup guide.
+ If a task requires more virtual memory for its entire tree,
+ this option
+ must be used. The value is given in MB. For example, to set
+ the limit to 1G, the option should be set to 1024. Note that this
+ value does not automatically influence the per process ulimit or
+ heap space. Hence, you may need to set those parameters as well
+ (as described above) in order to give your tasks the right amount
+ of memory.
+ </li>
+
+ <li>
+ <code>mapreduce.{map|reduce}.memory.physical.mb</code>:
+ This parameter is similar to
+ <code>mapreduce.{map|reduce}.memory.mb</code>, except it specifies
+ how much physical memory is required by a task for its entire
+ tree of processes. The parameter is applicable if administrators
+ have configured a total limit on the physical memory used by
+ all MapReduce tasks.
+ </li>
+
+ </ul>
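The interplay among the options above (the JVM heap must fit under the per-process ulimit, which in turn should fit under the limit for the whole process tree) can be sketched as a small sanity check; the helper below is hypothetical and only mirrors the relationships stated in this section:

```python
def check_memory_config(java_heap_mb, ulimit_kb, memory_mb):
    """Validate the stated relationships for a single task type.

    java_heap_mb -- heap from -Xmx in mapreduce.{map|reduce}.java.opts, in MB
    ulimit_kb    -- mapreduce.{map|reduce}.ulimit, in kilobytes (per process)
    memory_mb    -- mapreduce.{map|reduce}.memory.mb, for the whole tree
    """
    problems = []
    if java_heap_mb * 1024 >= ulimit_kb:
        problems.append('JVM heap is not below the per-process ulimit; '
                        'the JVM will not start')
    if ulimit_kb > memory_mb * 1024:
        problems.append('per-process ulimit exceeds the virtual memory '
                        'allowed for the whole process tree')
    return problems

# A 1 GB heap with a 1 GB ulimit leaves no headroom for the JVM itself.
print(check_memory_config(1024, 1048576, 2048))
```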
+
+ <p>
+ As seen above, each option can be specified separately for map
+ and reduce tasks. The two types of tasks typically have different
+ memory requirements, so different values can be set for the
+ corresponding options.
+ </p>
+
+ <p>
+ The memory available to some parts of the framework is also
+ configurable. In map and reduce tasks, performance may be influenced
+ by adjusting parameters influencing the concurrency of operations and
+ the frequency with which data will hit disk. Monitoring the filesystem
+ counters for a job, particularly relative to byte counts from the map
+ and into the reduce, is invaluable to the tuning of these
+ parameters.
+ </p>
+
+ <p>
+ Note: The memory-related configuration options described above
+ are used only for configuring the child tasks launched by the
+ TaskTracker. Configuring the memory options for daemons is documented
+ under
+ <a href="ext:cluster-setup/ConfiguringEnvironmentHadoopDaemons">
+ Configuring the Environment of the Hadoop Daemons</a> (Cluster Setup).
+ </p>
+
</section>
+
<section>
<title>Map Parameters</title>
Modified: hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml?rev=954365&r1=954364&r2=954365&view=diff
==============================================================================
--- hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/mapreduce/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml Mon Jun 14 05:20:42 2010
@@ -85,6 +85,7 @@ See http://forrest.apache.org/docs/linki
<RefreshingQueueConfiguration href="#Refreshing+queue+configuration"/>
<mapred-queues.xml href="#mapred-queues.xml"/>
<ConfiguringEnvironmentHadoopDaemons href="#Configuring+the+Environment+of+the+Hadoop+Daemons"/>
+ <ConfiguringMemoryParameters href="#Configuring+Memory+Parameters+for+MapReduce+Jobs" />
<ConfiguringHadoopDaemons href="#Configuring+the+Hadoop+Daemons"/>
<FullyDistributedOperation href="#Fully-Distributed+Operation"/>
</cluster-setup>