You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by to...@apache.org on 2010/06/14 23:16:49 UTC
svn commit: r954647 - in /hadoop/common/trunk: CHANGES.txt
src/docs/src/documentation/content/xdocs/cluster_setup.xml
src/docs/src/documentation/content/xdocs/site.xml
Author: tomwhite
Date: Mon Jun 14 21:16:47 2010
New Revision: 954647
URL: http://svn.apache.org/viewvc?rev=954647&view=rev
Log:
HADOOP-6821. Document changes to memory monitoring. Contributed by Hemanth Yamijala.
Modified:
hadoop/common/trunk/CHANGES.txt
hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml
hadoop/common/trunk/src/docs/src/documentation/content/xdocs/site.xml
Modified: hadoop/common/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/CHANGES.txt?rev=954647&r1=954646&r2=954647&view=diff
==============================================================================
--- hadoop/common/trunk/CHANGES.txt (original)
+++ hadoop/common/trunk/CHANGES.txt Mon Jun 14 21:16:47 2010
@@ -958,6 +958,9 @@ Release 0.21.0 - Unreleased
HADOOP-6668. Apply audience and stability annotations to classes in
common. (tomwhite)
+ HADOOP-6821. Document changes to memory monitoring. (Hemanth Yamijala
+ via tomwhite)
+
OPTIMIZATIONS
HADOOP-5595. NameNode does not need to run a replicator to choose a
Modified: hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml?rev=954647&r1=954646&r2=954647&view=diff
==============================================================================
--- hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml (original)
+++ hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml Mon Jun 14 21:16:47 2010
@@ -739,146 +739,250 @@
</li>
</ul>
</section>
- <section>
- <title> Memory management</title>
- <p>Users/admins can also specify the maximum virtual memory
- of the launched child-task, and any sub-process it launches
- recursively, using <code>mapred.{map|reduce}.child.ulimit</code>. Note
- that the value set here is a per process limit.
- The value for <code>mapred.{map|reduce}.child.ulimit</code> should be
- specified in kilo bytes (KB). And also the value must be greater than
- or equal to the -Xmx passed to JavaVM, else the VM might not start.
+ <section>
+ <title>Configuring Memory Parameters for MapReduce Jobs</title>
+ <p>
+ As MapReduce jobs could use varying amounts of memory, Hadoop
+ provides various configuration options to users and administrators
+ for managing memory effectively. Some of these options are job
+ specific and can be used by users. While setting up a cluster,
+ administrators can configure appropriate default values for these
+ options so that users jobs run out of the box. Other options are
+ cluster specific and can be used by administrators to enforce
+ limits and prevent misconfigured or memory intensive jobs from
+ causing undesired side effects on the cluster.
+ </p>
+ <p>
+ The values configured should
+ take into account the hardware resources of the cluster, such as the
+ amount of physical and virtual memory available for tasks,
+ the number of slots configured on the slaves and the requirements
+ for other processes running on the slaves. If right values are not
+ set, it is likely that jobs start failing with memory related
+ errors or in the worst case, even affect other tasks or
+ the slaves themselves.
</p>
-
- <p>Note: <code>mapred.{map|reduce}.child.java.opts</code> are used only for
- configuring the launched child tasks from task tracker. Configuring
- the memory options for daemons is documented in
- <a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
- cluster_setup.html </a></p>
-
- <p>The memory available to some parts of the framework is also
- configurable. In map and reduce tasks, performance may be influenced
- by adjusting parameters influencing the concurrency of operations and
- the frequency with which data will hit disk. Monitoring the filesystem
- counters for a job- particularly relative to byte counts from the map
- and into the reduce- is invaluable to the tuning of these
- parameters.</p>
- </section>
<section>
- <title> Memory monitoring</title>
- <p>A <code>TaskTracker</code>(TT) can be configured to monitor memory
- usage of tasks it spawns, so that badly-behaved jobs do not bring
- down a machine due to excess memory consumption. With monitoring
- enabled, every task is assigned a task-limit for virtual memory (VMEM).
- In addition, every node is assigned a node-limit for VMEM usage.
- A TT ensures that a task is killed if it, and
- its descendants, use VMEM over the task's per-task limit. It also
- ensures that one or more tasks are killed if the sum total of VMEM
- usage by all tasks, and their descendents, cross the node-limit.</p>
+ <title>Monitoring Task Memory Usage</title>
+ <p>
+ Before describing the memory options, it is
+ useful to look at a feature provided by Hadoop to monitor
+ memory usage of MapReduce tasks it runs. The basic objective
+ of this feature is to prevent MapReduce tasks from consuming
+ memory beyond a limit that would result in their affecting
+ other processes running on the slave, including other tasks
+ and daemons like the DataNode or TaskTracker.
+ </p>
- <p>Users can, optionally, specify the VMEM task-limit per job. If no
- such limit is provided, a default limit is used. A node-limit can be
- set per node.</p>
- <p>Currently the memory monitoring and management is only supported
- in Linux platform.</p>
- <p>To enable monitoring for a TT, the
- following parameters all need to be set:</p>
-
- <table>
- <tr><th>Name</th><th>Type</th><th>Description</th></tr>
- <tr><td>mapred.tasktracker.vmem.reserved</td><td>long</td>
- <td>A number, in bytes, that represents an offset. The total VMEM on
- the machine, minus this offset, is the VMEM node-limit for all
- tasks, and their descendants, spawned by the TT.
- </td></tr>
- <tr><td>mapred.task.default.maxvmem</td><td>long</td>
- <td>A number, in bytes, that represents the default VMEM task-limit
- associated with a task. Unless overridden by a job's setting,
- this number defines the VMEM task-limit.
- </td></tr>
- <tr><td>mapred.task.limit.maxvmem</td><td>long</td>
- <td>A number, in bytes, that represents the upper VMEM task-limit
- associated with a task. Users, when specifying a VMEM task-limit
- for their tasks, should not specify a limit which exceeds this amount.
- </td></tr>
- </table>
-
- <p>In addition, the following parameters can also be configured.</p>
+ <p>
+ <em>Note:</em> For the time being, this feature is available
+ only for the Linux platform.
+ </p>
+
+ <p>
+ Hadoop allows monitoring to be done both for virtual
+ and physical memory usage of tasks. This monitoring
+ can be done independently of each other, and therefore the
+ options can be configured independently of each other. It
+ has been found in some environments, particularly related
+ to streaming, that virtual memory recorded for tasks is high
+ because of libraries loaded by the programs used to run
+ the tasks. However, this memory is largely unused and does
+ not affect the slaves's memory itself. In such cases,
+ monitoring based on physical memory can provide a more
+ accurate picture of memory usage.
+ </p>
+
+ <p>
+ This feature considers that there is a limit on
+ the amount of virtual or physical memory on the slaves
+ that can be used by
+ the running MapReduce tasks. The rest of the memory is
+ assumed to be required for the system and other processes.
+ Since some jobs may require higher amount of memory for their
+ tasks than others, Hadoop allows jobs to specify how much
+ memory they expect to use at a maximum. Then by using
+ resource aware scheduling and monitoring, Hadoop tries to
+ ensure that at any time, only enough tasks are running on
+ the slaves as can meet the dual constraints of an individual
+ job's memory requirements and the total amount of memory
+ available for all MapReduce tasks.
+ </p>
+
+ <p>
+ The TaskTracker monitors tasks in regular intervals. Each time,
+ it operates in two steps:
+ </p>
+
+ <ul>
+
+ <li>
+ In the first step, it
+ checks that a job's task and any child processes it
+ launches are not cumulatively using more virtual or physical
+ memory than specified. If both virtual and physical memory
+ monitoring is enabled, then virtual memory usage is checked
+ first, followed by physical memory usage.
+ Any task that is found to
+ use more memory is killed along with any child processes it
+ might have launched, and the task status is marked
+ <em>failed</em>. Repeated failures such as this will terminate
+ the job.
+ </li>
+
+ <li>
+ In the next step, it checks that the cumulative virtual and
+ physical memory
+ used by all running tasks and their child processes
+ does not exceed the total virtual and physical memory limit,
+ respectively. Again, virtual memory limit is checked first,
+ followed by physical memory limit. In this case, it kills
+ enough number of tasks, along with any child processes they
+ might have launched, until the cumulative memory usage
+ is brought under limit. In the case of virtual memory limit
+ being exceeded, the tasks chosen for killing are
+ the ones that have made the least progress. In the case of
+ physical memory limit being exceeded, the tasks chosen
+ for killing are the ones that have used the maximum amount
+ of physical memory. Also, the status
+ of these tasks is marked as <em>killed</em>, and hence repeated
+ occurrence of this will not result in a job failure.
+ </li>
+
+ </ul>
+
+ <p>
+ In either case, the task's diagnostic message will indicate the
+ reason why the task was terminated.
+ </p>
+
+ <p>
+ Resource aware scheduling can ensure that tasks are scheduled
+ on a slave only if their memory requirement can be satisfied
+ by the slave. The Capacity Scheduler, for example,
+ takes virtual memory requirements into account while
+ scheduling tasks, as described in the section on
+ <a href="ext:capacity-scheduler/MemoryBasedTaskScheduling">
+ memory based scheduling</a>.
+ </p>
+
+ <p>
+ Memory monitoring is enabled when certain configuration
+ variables are defined with non-zero values, as described below.
+ </p>
+
+ </section>
- <table>
- <tr><th>Name</th><th>Type</th><th>Description</th></tr>
- <tr><td>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</td>
- <td>long</td>
- <td>The time interval, in milliseconds, between which the TT
- checks for any memory violation. The default value is 5000 msec
- (5 seconds).
- </td></tr>
- </table>
+ <section>
+ <title>Job Specific Options</title>
+ <p>
+ Memory related options that can be configured individually per
+ job are described in detail in the section on
+ <a href="ext:mapred-tutorial/ConfiguringMemoryRequirements">
+ Configuring Memory Requirements For A Job</a> in the MapReduce
+ tutorial. While setting up
+ the cluster, the Hadoop defaults for these options can be reviewed
+ and changed to better suit the job profiles expected to be run on
+ the clusters, as also the hardware configuration.
+ </p>
+ <p>
+ As with any other configuration option in Hadoop, if the
+ administrators desire to prevent users from overriding these
+ options in jobs they submit, these values can be marked as
+ <em>final</em> in the cluster configuration.
+ </p>
+ </section>
- <p>Here's how the memory monitoring works for a TT.</p>
- <ol>
- <li>If one or more of the configuration parameters described
- above are missing or -1 is specified , memory monitoring is
- disabled for the TT.
+
+ <section>
+ <title>Cluster Specific Options</title>
+
+ <p>
+ This section describes the memory related options that are
+ used by the JobTracker and TaskTrackers, and cannot be changed
+ by jobs. The values set for these options should be the same
+ for all the slave nodes in a cluster.
+ </p>
+
+ <ul>
+
+ <li>
+ <code>mapreduce.cluster.{map|reduce}memory.mb</code>: These
+ options define the default amount of virtual memory that should be
+ allocated for MapReduce tasks running in the cluster. They
+ typically match the default values set for the options
+ <code>mapreduce.{map|reduce}.memory.mb</code>. They help in the
+ calculation of the total amount of virtual memory available for
+ MapReduce tasks on a slave, using the following equation:<br/>
+ <em>Total virtual memory for all MapReduce tasks =
+ (mapreduce.cluster.mapmemory.mb *
+ mapreduce.tasktracker.map.tasks.maximum) +
+ (mapreduce.cluster.reducememory.mb *
+ mapreduce.tasktracker.reduce.tasks.maximum)</em><br/>
+ Typically, reduce tasks require more memory than map tasks.
+ Hence a higher value is recommended for
+ <em>mapreduce.cluster.reducememory.mb</em>. The value is
+ specified in MB. To set a value of 2GB for reduce tasks, set
+ <em>mapreduce.cluster.reducememory.mb</em> to 2048.
</li>
- <li>In addition, monitoring is disabled if
- <code>mapred.task.default.maxvmem</code> is greater than
- <code>mapred.task.limit.maxvmem</code>.
+
+ <li>
+ <code>mapreduce.jobtracker.max{map|reduce}memory.mb</code>:
+ These options define the maximum amount of virtual memory that
+ can be requested by jobs using the parameters
+ <code>mapreduce.{map|reduce}.memory.mb</code>. The system
+ will reject any job that is submitted requesting for more
+ memory than these limits. Typically, the values for these
+ options should be set to satisfy the following constraint:<br/>
+ <em>mapreduce.jobtracker.maxmapmemory.mb =
+ mapreduce.cluster.mapmemory.mb *
+ mapreduce.tasktracker.map.tasks.maximum<br/>
+ mapreduce.jobtracker.maxreducememory.mb =
+ mapreduce.cluster.reducememory.mb *
+ mapreduce.tasktracker.reduce.tasks.maximum</em><br/>
+ The value is specified in MB. If
+ <code>mapreduce.cluster.reducememory.mb</code> is set to 2GB and
+ there are 2 reduce slots configured in the slaves, the value
+ for <code>mapreduce.jobtracker.maxreducememory.mb</code> should
+ be set to 4096.
</li>
- <li>If a TT receives a task whose task-limit is set by the user
- to a value larger than <code>mapred.task.limit.maxvmem</code>, it
- logs a warning but executes the task.
- </li>
- <li>Periodically, the TT checks the following:
- <ul>
- <li>If any task's current VMEM usage is greater than that task's
- VMEM task-limit, the task is killed and reason for killing
- the task is logged in task diagonistics . Such a task is considered
- failed, i.e., the killing counts towards the task's failure count.
- </li>
- <li>If the sum total of VMEM used by all tasks and descendants is
- greater than the node-limit, the TT kills enough tasks, in the
- order of least progress made, till the overall VMEM usage falls
- below the node-limt. Such killed tasks are not considered failed
- and their killing does not count towards the tasks' failure counts.
- </li>
- </ul>
+
+ <li>
+ <code>mapreduce.tasktracker.reserved.physicalmemory.mb</code>:
+ This option defines the amount of physical memory that is
+ marked for system and daemon processes. Using this, the amount
+ of physical memory available for MapReduce tasks is calculated
+ using the following equation:<br/>
+ <em>Total physical memory for all MapReduce tasks =
+ Total physical memory available on the system -
+ mapreduce.tasktracker.reserved.physicalmemory.mb</em><br/>
+ The value is specified in MB. To set this value to 2GB,
+ specify the value as 2048.
</li>
- </ol>
-
- <p>Schedulers can choose to ease the monitoring pressure on the TT by
- preventing too many tasks from running on a node and by scheduling
- tasks only if the TT has enough VMEM free. In addition, Schedulers may
- choose to consider the physical memory (RAM) available on the node
- as well. To enable Scheduler support, TTs report their memory settings
- to the JobTracker in every heartbeat. Before getting into details,
- consider the following additional memory-related parameters than can be
- configured to enable better scheduling:</p>
- <table>
- <tr><th>Name</th><th>Type</th><th>Description</th></tr>
- <tr><td>mapred.tasktracker.pmem.reserved</td><td>int</td>
- <td>A number, in bytes, that represents an offset. The total
- physical memory (RAM) on the machine, minus this offset, is the
- recommended RAM node-limit. The RAM node-limit is a hint to a
- Scheduler to scheduler only so many tasks such that the sum
- total of their RAM requirements does not exceed this limit.
- RAM usage is not monitored by a TT.
- </td></tr>
- </table>
-
- <p>A TT reports the following memory-related numbers in every
- heartbeat:</p>
- <ul>
- <li>The total VMEM available on the node.</li>
- <li>The value of <code>mapred.tasktracker.vmem.reserved</code>,
- if set.</li>
- <li>The total RAM available on the node.</li>
- <li>The value of <code>mapred.tasktracker.pmem.reserved</code>,
- if set.</li>
- </ul>
+ <li>
+ <code>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</code>:
+ This option defines the time the TaskTracker waits between
+ two cycles of memory monitoring. The value is specified in
+ milliseconds.
+ </li>
+
+ </ul>
+
+ <p>
+ <em>Note:</em> The virtual memory monitoring function is only
+ enabled if
+ the variables <code>mapreduce.cluster.{map|reduce}memory.mb</code>
+ and <code>mapreduce.jobtracker.max{map|reduce}memory.mb</code>
+ are set to values greater than zero. Likewise, the physical
+ memory monitoring function is only enabled if the variable
+ <code>mapreduce.tasktracker.reserved.physicalmemory.mb</code>
+ is set to a value greater than zero.
+ </p>
</section>
+ </section>
+
<section>
<title>Task Controllers</title>
Modified: hadoop/common/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=954647&r1=954646&r2=954647&view=diff
==============================================================================
--- hadoop/common/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/common/trunk/src/docs/src/documentation/content/xdocs/site.xml Mon Jun 14 21:16:47 2010
@@ -72,9 +72,12 @@ See http://forrest.apache.org/docs/linki
<mapred-default href="http://hadoop.apache.org/mapreduce/docs/current/mapred-default.html" />
<mapred-queues href="http://hadoop.apache.org/mapreduce/docs/current/mapred_queues.xml" />
- <capacity-scheduler href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html" />
+ <capacity-scheduler href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html">
+ <MemoryBasedTaskScheduling href="#Scheduling+Tasks+Considering+Memory+Requirements" />
+ </capacity-scheduler>
<mapred-tutorial href="http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html" >
<JobAuthorization href="#Job+Authorization" />
+ <ConfiguringMemoryRequirements href="#Configuring+Memory+Requirements+For+A+Job" />
</mapred-tutorial>
<streaming href="http://hadoop.apache.org/mapreduce/docs/current/streaming.html" />
<distcp href="http://hadoop.apache.org/mapreduce/docs/current/distcp.html" />