Posted to common-commits@hadoop.apache.org by to...@apache.org on 2010/06/04 18:35:48 UTC
svn commit: r951482 - in /hadoop/common/branches/branch-0.21: ./
src/docs/src/documentation/content/xdocs/
Author: tomwhite
Date: Fri Jun 4 16:35:47 2010
New Revision: 951482
URL: http://svn.apache.org/viewvc?rev=951482&view=rev
Log:
Merge -r 951479:951480 from trunk to branch-0.21. Fixes: HADOOP-6738
Added:
hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/commands_manual.xml
- copied unchanged from r951480, hadoop/common/trunk/src/docs/src/documentation/content/xdocs/commands_manual.xml
hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/hod_scheduler.xml
- copied unchanged from r951480, hadoop/common/trunk/src/docs/src/documentation/content/xdocs/hod_scheduler.xml
Modified:
hadoop/common/branches/branch-0.21/CHANGES.txt
hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml
hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml
hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml
Modified: hadoop/common/branches/branch-0.21/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.21/CHANGES.txt?rev=951482&r1=951481&r2=951482&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.21/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.21/CHANGES.txt Fri Jun 4 16:35:47 2010
@@ -867,6 +867,9 @@ Release 0.21.0 - Unreleased
HADOOP-6585. Add FileStatus#isDirectory and isFile. (Eli Collins via
tomwhite)
+ HADOOP-6738. Move cluster_setup.xml from MapReduce to Common.
+ (Tom White via tomwhite)
+
OPTIMIZATIONS
HADOOP-5595. NameNode does not need to run a replicator to choose a
Modified: hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml?rev=951482&r1=951481&r2=951482&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml (original)
+++ hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml Fri Jun 4 16:35:47 2010
@@ -33,20 +33,20 @@
Hadoop clusters ranging from a few nodes to extremely large clusters with
thousands of nodes.</p>
<p>
- To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="single_node_setup.html"> Single Node Setup</a>).
+ To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="single_node_setup.html"> Hadoop Quick Start</a>).
</p>
</section>
<section>
- <title>Prerequisites</title>
+ <title>Pre-requisites</title>
<ol>
<li>
- Make sure all <a href="single_node_setup.html#PreReqs">required software</a>
+ Make sure all <a href="single_node_setup.html#PreReqs">requisite</a> software
is installed on all nodes in your cluster.
</li>
<li>
- <a href="single_node_setup.html#Download">Download</a> the Hadoop software.
+ <a href="single_node_setup.html#Download">Get</a> the Hadoop software.
</li>
</ol>
</section>
@@ -81,21 +81,23 @@
<ol>
<li>
Read-only default configuration -
- <a href="ext:common-default">src/common/common-default.xml</a>,
- <a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a> and
- <a href="ext:mapred-default">src/mapred/mapred-default.xml</a>.
+ <a href="ext:common-default">src/core/core-default.xml</a>,
+ <a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a>,
+ <a href="ext:mapred-default">src/mapred/mapred-default.xml</a> and
+ <a href="ext:mapred-queues">conf/mapred-queues.xml.template</a>.
</li>
<li>
Site-specific configuration -
- <em>conf/core-site.xml</em>,
- <em>conf/hdfs-site.xml</em> and
- <em>conf/mapred-site.xml</em>.
+ <a href="#core-site.xml">conf/core-site.xml</a>,
+ <a href="#hdfs-site.xml">conf/hdfs-site.xml</a>,
+ <a href="#mapred-site.xml">conf/mapred-site.xml</a> and
+ <a href="#mapred-queues.xml">conf/mapred-queues.xml</a>.
</li>
</ol>
<p>To learn more about how the Hadoop framework is controlled by these
- configuration files see
- <a href="ext:api/org/apache/hadoop/conf/configuration">Class Configuration</a>.</p>
+ configuration files, look
+ <a href="ext:api/org/apache/hadoop/conf/configuration">here</a>.</p>
<p>Additionally, you can control the Hadoop scripts found in the
<code>bin/</code> directory of the distribution, by setting site-specific
@@ -163,9 +165,8 @@
<title>Configuring the Hadoop Daemons</title>
<p>This section deals with important parameters to be specified in the
- following:
- <br/>
- <code>conf/core-site.xml</code>:</p>
+ following:</p>
+ <anchor id="core-site.xml"/><p><code>conf/core-site.xml</code>:</p>
<table>
<tr>
@@ -180,7 +181,7 @@
</tr>
</table>
- <p><br/><code>conf/hdfs-site.xml</code>:</p>
+ <anchor id="hdfs-site.xml"/><p><code>conf/hdfs-site.xml</code>:</p>
<table>
<tr>
@@ -212,7 +213,7 @@
</tr>
</table>
- <p><br/><code>conf/mapred-site.xml</code>:</p>
+ <anchor id="mapred-site.xml"/><p><code>conf/mapred-site.xml</code>:</p>
<table>
<tr>
@@ -221,12 +222,12 @@
<th>Notes</th>
</tr>
<tr>
- <td>mapred.job.tracker</td>
+ <td>mapreduce.jobtracker.address</td>
<td>Host or IP and port of <code>JobTracker</code>.</td>
<td><em>host:port</em> pair.</td>
</tr>
<tr>
- <td>mapred.system.dir</td>
+ <td>mapreduce.jobtracker.system.dir</td>
<td>
Path on the HDFS where the Map/Reduce framework stores
system files e.g. <code>/hadoop/mapred/system/</code>.
@@ -237,7 +238,7 @@
</td>
</tr>
<tr>
- <td>mapred.local.dir</td>
+ <td>mapreduce.cluster.local.dir</td>
<td>
Comma-separated list of paths on the local filesystem where
temporary Map/Reduce data is written.
@@ -264,7 +265,7 @@
</td>
</tr>
<tr>
- <td>mapred.hosts/mapred.hosts.exclude</td>
+ <td>mapreduce.jobtracker.hosts.filename/mapreduce.jobtracker.hosts.exclude.filename</td>
<td>List of permitted/excluded TaskTrackers.</td>
<td>
If necessary, use these files to control the list of allowable
@@ -272,82 +273,331 @@
</td>
</tr>
<tr>
- <td>mapred.queue.names</td>
- <td>Comma separated list of queues to which jobs can be submitted.</td>
+ <td>mapreduce.cluster.job-authorization-enabled</td>
+ <td>Boolean, specifying whether job ACLs are supported for
+ authorizing view and modification of a job</td>
<td>
- The Map/Reduce system always supports atleast one queue
- with the name as <em>default</em>. Hence, this parameter's
- value should always contain the string <em>default</em>.
- Some job schedulers supported in Hadoop, like the
- <a href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html">Capacity Scheduler</a>,
- support multiple queues. If such a scheduler is
- being used, the list of configured queue names must be
- specified here. Once queues are defined, users can submit
- jobs to a queue using the property name
- <em>mapred.job.queue.name</em> in the job configuration.
- There could be a separate
- configuration file for configuring properties of these
- queues that is managed by the scheduler.
- Refer to the documentation of the scheduler for information on
- the same.
+ If <em>true</em>, job ACLs would be checked while viewing or
+ modifying a job. More details are available at
+ <a href ="ext:mapred-tutorial/JobAuthorization">Job Authorization</a>.
</td>
</tr>
- <tr>
- <td>mapred.acls.enabled</td>
- <td>Specifies whether ACLs are supported for controlling job
- submission and administration</td>
- <td>
- If <em>true</em>, ACLs would be checked while submitting
- and administering jobs. ACLs can be specified using the
- configuration parameters of the form
- <em>mapred.queue.queue-name.acl-name</em>, defined below.
- </td>
- </tr>
- </table>
-
- <p><br/><code> conf/mapred-queue-acls.xml</code></p>
-
- <table>
- <tr>
- <th>Parameter</th>
- <th>Value</th>
- <th>Notes</th>
- </tr>
- <tr>
- <td>mapred.queue.<em>queue-name</em>.acl-submit-job</td>
- <td>List of users and groups that can submit jobs to the
- specified <em>queue-name</em>.</td>
- <td>
- The list of users and groups are both comma separated
- list of names. The two lists are separated by a blank.
- Example: <em>user1,user2 group1,group2</em>.
- If you wish to define only a list of groups, provide
- a blank at the beginning of the value.
- </td>
- </tr>
- <tr>
- <td>mapred.queue.<em>queue-name</em>.acl-administer-job</td>
- <td>List of users and groups that can change the priority
- or kill jobs that have been submitted to the
- specified <em>queue-name</em>.</td>
- <td>
- The list of users and groups are both comma separated
- list of names. The two lists are separated by a blank.
- Example: <em>user1,user2 group1,group2</em>.
- If you wish to define only a list of groups, provide
- a blank at the beginning of the value. Note that an
- owner of a job can always change the priority or kill
- his/her own job, irrespective of the ACLs.
- </td>
- </tr>
- </table>
-
+
+ </table>
<p>Typically all the above parameters are marked as
<a href="ext:api/org/apache/hadoop/conf/configuration/final_parameters">
final</a> to ensure that they cannot be overridden by user-applications.
</p>
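Taken together, the mapred-site.xml parameters in the table above might be sketched as follows; the host name, port, and paths here are illustrative placeholders, not values from this commit:

```xml
<?xml version="1.0"?>
<!-- Illustrative conf/mapred-site.xml sketch; all values are placeholders. -->
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>jt.example.com:9001</value>      <!-- host:port of the JobTracker -->
  </property>
  <property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>/hadoop/mapred/system</value>    <!-- HDFS path for framework system files -->
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <!-- comma-separated local filesystem paths, ideally on separate disks -->
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
  </property>
</configuration>
```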
+ <anchor id="mapred-queues.xml"/><p><code>conf/mapred-queues.xml
+ </code>:</p>
+ <p>This file is used to configure the queues in the Map/Reduce
+ system. Queues are abstract entities in the JobTracker that can be
+ used to manage collections of jobs. They provide a way for
+ administrators to organize jobs in specific ways and to enforce
+ certain policies on such collections, thus providing varying
+ levels of administrative control and management functions on jobs.
+ </p>
+ <p>One can imagine the following sample scenarios:</p>
+ <ul>
+ <li> Jobs submitted by a particular group of users can all be
+ submitted to one queue. </li>
+ <li> Long running jobs in an organization can be submitted to a
+ queue. </li>
+ <li> Short running jobs can be submitted to a queue and the number
+ of jobs that can run concurrently can be restricted. </li>
+ </ul>
+ <p>The usage of queues is closely tied to the scheduler configured
+ at the JobTracker via <em>mapreduce.jobtracker.taskscheduler</em>.
+ The degree of support of queues depends on the scheduler used. Some
+ schedulers support a single queue, while others support more complex
+ configurations. Schedulers also implement the policies that apply
+ to jobs in a queue. Some schedulers, such as the Fairshare scheduler,
+ implement their own mechanisms for collections of jobs and do not rely
+ on queues provided by the framework. The administrators are
+ encouraged to refer to the documentation of the scheduler they are
+ interested in for determining the level of support for queues.</p>
+ <p>The Map/Reduce framework supports some basic operations on queues
+ such as job submission to a specific queue, access control for queues,
+ queue states, viewing configured queues and their properties
+ and refresh of queue properties. In order to fully implement some of
+ these operations, the framework relies on the configured
+ scheduler.</p>
+ <p>The following types of queue configurations are possible:</p>
+ <ul>
+ <li> Single queue: The default configuration in Map/Reduce consists
+ of a single queue, as supported by the default scheduler. All jobs
+ are submitted to this default queue which maintains jobs in a priority
+ based FIFO order.</li>
+ <li> Multiple single level queues: Multiple queues are defined, and
+ jobs can be submitted to any of these queues. Different policies
+ can be applied to these queues by schedulers that support this
+ configuration to provide a better level of support. For example,
+ the <a href="ext:capacity-scheduler">capacity scheduler</a>
+ provides ways of configuring different
+ capacity and fairness guarantees on these queues.</li>
+ <li> Hierarchical queues: Hierarchical queues are a configuration in
+ which queues can contain other queues within them recursively. The
+ queues that contain other queues are referred to as
+ container queues. Queues that do not contain other queues are
+ referred to as leaf or job queues. Jobs can only be submitted to leaf
+ queues. Hierarchical queues can potentially offer a higher level
+ of control to administrators, as schedulers can now build a
+ hierarchy of policies where policies applicable to a container
+ queue can provide context for policies applicable to queues it
+ contains. It also opens up possibilities for delegating queue
+ administration where administration of queues in a container queue
+ can be turned over to a different set of administrators, within
+ the context provided by the container queue. For example, the
+ <a href="ext:capacity-scheduler">capacity scheduler</a>
+ uses hierarchical queues to partition the capacity of a cluster
+ among container queues, allowing the queues they contain to divide
+ that capacity further.</li>
+ </ul>
+
+ <p>Most of the configuration of the queues can be refreshed/reloaded
+ without restarting the Map/Reduce sub-system by editing this
+ configuration file as described in the section on
+ <a href="commands_manual.html#RefreshQueues">reloading queue
+ configuration</a>.
+ Not all configuration properties can be reloaded, of course,
+ as the description of each property below explains.</p>
+
+ <p>The format of conf/mapred-queues.xml is different from the other
+ configuration files, supporting nested configuration
+ elements to support hierarchical queues. The format is as follows:
+ </p>
+
+ <source>
+ <queues aclsEnabled="$aclsEnabled">
+ <queue>
+ <name>$queue-name</name>
+ <state>$state</state>
+ <queue>
+ <name>$child-queue1</name>
+ <properties>
+ <property key="$key" value="$value"/>
+ ...
+ </properties>
+ <queue>
+ <name>$grand-child-queue1</name>
+ ...
+ </queue>
+ </queue>
+ <queue>
+ <name>$child-queue2</name>
+ ...
+ </queue>
+ ...
+ ...
+ ...
+ <queue>
+ <name>$leaf-queue</name>
+ <acl-submit-job>$acls</acl-submit-job>
+ <acl-administer-jobs>$acls</acl-administer-jobs>
+ <properties>
+ <property key="$key" value="$value"/>
+ ...
+ </properties>
+ </queue>
+ </queue>
+ </queues>
+ </source>
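As a concrete, purely illustrative instance of the template above, a small two-level hierarchy could look like this; the queue names, the ACL user/group lists, and the "capacity" property key are invented for the example:

```xml
<?xml version="1.0"?>
<!-- Illustrative conf/mapred-queues.xml; names and values are examples only. -->
<queues aclsEnabled="true">
  <queue>
    <name>research</name>              <!-- container queue -->
    <state>running</state>
    <queue>
      <name>long-running</name>        <!-- leaf queue: jobs are submitted here -->
      <acl-submit-job>user1,user2 group1</acl-submit-job>
      <acl-administer-jobs>user1 admins</acl-administer-jobs>
      <properties>
        <!-- scheduler-specific property; ignored by the JobTracker -->
        <property key="capacity" value="20"/>
      </properties>
    </queue>
  </queue>
</queues>
```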
+ <table>
+ <tr>
+ <th>Tag/Attribute</th>
+ <th>Value</th>
+ <th>
+ <a href="commands_manual.html#RefreshQueues">Refresh-able?</a>
+ </th>
+ <th>Notes</th>
+ </tr>
+
+ <tr>
+ <td><anchor id="queues_tag"/>queues</td>
+ <td>Root element of the configuration file.</td>
+ <td>Not-applicable</td>
+ <td>All the queues are nested inside this root element of the
+ file. There can be only one root queues element in the file.</td>
+ </tr>
+
+ <tr>
+ <td>aclsEnabled</td>
+ <td>Boolean attribute to the
+ <a href="#queues_tag"><em><queues></em></a> tag
+ specifying whether ACLs are supported for controlling job
+ submission and administration for <em>all</em> the queues
+ configured.
+ </td>
+ <td>Yes</td>
+ <td>If <em>false</em>, ACLs are ignored for <em>all</em> the
+ configured queues. <br/><br/>
+ If <em>true</em>, the user and group details of the user
+ are checked against the configured ACLs of the corresponding
+ job-queue while submitting and administering jobs. ACLs can be
+ specified for each queue using the queue-specific tags
+ "acl-$acl_name", defined below. ACLs are checked only against
+ the job-queues, i.e. the leaf-level queues; ACLs configured
+ for the rest of the queues in the hierarchy are ignored.
+ </td>
+ </tr>
+
+ <tr>
+ <td><anchor id="queue_tag"/>queue</td>
+ <td>A child element of the
+ <a href="#queues_tag"><em><queues></em></a> tag or another
+ <a href="#queue_tag"><em><queue></em></a>. Denotes a queue
+ in the system.
+ </td>
+ <td>Not applicable</td>
+ <td>Queues can be hierarchical and so this element can contain
+ children of this same type.</td>
+ </tr>
+
+ <tr>
+ <td>name</td>
+ <td>Child element of a
+ <a href="#queue_tag"><em><queue></em></a> specifying the
+ name of the queue.</td>
+ <td>No</td>
+ <td>Name of the queue cannot contain the character <em>":"</em>
+ which is reserved as the queue-name delimiter when addressing a
+ queue in a hierarchy.</td>
+ </tr>
+
+ <tr>
+ <td>state</td>
+ <td>Child element of a
+ <a href="#queue_tag"><em><queue></em></a> specifying the
+ state of the queue.
+ </td>
+ <td>Yes</td>
+ <td>Each queue has a corresponding state. A queue in
+ <em>'running'</em> state can accept new jobs, while a queue in
+ <em>'stopped'</em> state will stop accepting any new jobs. State
+ is defined and respected by the framework only for the
+ leaf-level queues and is ignored for all other queues.
+ <br/><br/>
+ The state of the queue can be viewed from the command line using
+ the <code>'bin/mapred queue'</code> command and also on the Web
+ UI.<br/><br/>
+ Administrators can stop and start queues at runtime using the
+ feature of <a href="commands_manual.html#RefreshQueues">reloading
+ queue configuration</a>. If a queue is stopped at runtime, it
+ will complete all the existing running jobs and will stop
+ accepting any new jobs.
+ </td>
+ </tr>
+
+ <tr>
+ <td>acl-submit-job</td>
+ <td>Child element of a
+ <a href="#queue_tag"><em><queue></em></a> specifying the
+ list of users and groups that can submit jobs to the specified
+ queue.</td>
+ <td>Yes</td>
+ <td>
+ Applicable only to leaf-queues.<br/><br/>
+ The lists of users and groups are both comma-separated
+ lists of names. The two lists are separated by a blank.
+ Example: <em>user1,user2 group1,group2</em>.
+ If you wish to define only a list of groups, provide
+ a blank at the beginning of the value.
+ <br/><br/>
+ </td>
+ </tr>
+
+ <tr>
+ <td>acl-administer-job</td>
+ <td>Child element of a
+ <a href="#queue_tag"><em><queue></em></a> specifying the
+ list of users and groups that can change the priority of a job
+ or kill a job that has been submitted to the specified queue.
+ </td>
+ <td>Yes</td>
+ <td>
+ Applicable only to leaf-queues.<br/><br/>
+ The lists of users and groups are both comma-separated
+ lists of names. The two lists are separated by a blank.
+ Example: <em>user1,user2 group1,group2</em>.
+ If you wish to define only a list of groups, provide
+ a blank at the beginning of the value. Note that an
+ owner of a job can always change the priority or kill
+ his/her own job, irrespective of the ACLs.
+ </td>
+ </tr>
+
+ <tr>
+ <td><anchor id="properties_tag"/>properties</td>
+ <td>Child element of a
+ <a href="#queue_tag"><em><queue></em></a> specifying the
+ scheduler specific properties.</td>
+ <td>Not applicable</td>
+ <td>The scheduler specific properties are the children of this
+ element specified as a group of <property> tags described
+ below. The JobTracker completely ignores these properties. These
+ can be used as per-queue properties needed by the scheduler
+ being configured. Please look at the scheduler specific
+ documentation as to how these properties are used by that
+ particular scheduler.
+ </td>
+ </tr>
+
+ <tr>
+ <td><anchor id="property_tag"/>property</td>
+ <td>Child element of
+ <a href="#properties_tag"><em><properties></em></a> for a
+ specific queue.</td>
+ <td>Not applicable</td>
+ <td>A single scheduler specific queue-property. Ignored by
+ the JobTracker and used by the scheduler that is configured.</td>
+ </tr>
+
+ <tr>
+ <td>key</td>
+ <td>Attribute of a
+ <a href="#property_tag"><em><property></em></a> for a
+ specific queue.</td>
+ <td>Scheduler-specific</td>
+ <td>The name of a single scheduler specific queue-property.</td>
+ </tr>
+
+ <tr>
+ <td>value</td>
+ <td>Attribute of a
+ <a href="#property_tag"><em><property></em></a> for a
+ specific queue.</td>
+ <td>Scheduler-specific</td>
+ <td>The value of a single scheduler specific queue-property.
+ The value can be anything; its interpretation is left to
+ the scheduler that is configured.</td>
+ </tr>
+
+ </table>
+
+ <p>Once the queues are configured properly and the Map/Reduce
+ system is up and running, from the command line one can
+ <a href="commands_manual.html#QueuesList">get the list
+ of queues</a> and
+ <a href="commands_manual.html#QueuesInfo">obtain
+ information specific to each queue</a>. This information is also
+ available from the web UI. On the web UI, queue information can be
+ seen by going to queueinfo.jsp, linked to from the queues table-cell
+ in the cluster-summary table. The queueinfo.jsp prints the hierarchy
+ of queues as well as the specific information for each queue.
+ </p>
+
+ <p> Users can submit jobs only to a
+ leaf-level queue by specifying the fully-qualified queue-name for
+ the property name <em>mapreduce.job.queuename</em> in the job
+ configuration. The character ':' is the queue-name delimiter, so,
+ for example, to submit to a configured job-queue 'Queue-C'
+ which is one of the sub-queues of 'Queue-B', which in turn is a
+ sub-queue of 'Queue-A', the job configuration should contain the
+ property <em>mapreduce.job.queuename</em> set to the <em>
+ <value>Queue-A:Queue-B:Queue-C</value></em>.</p>
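In job-configuration XML, the Queue-A/Queue-B/Queue-C example above would be written roughly as follows (a sketch; the enclosing configuration element is omitted):

```xml
<!-- Sketch of the job configuration property from the example above. -->
<property>
  <name>mapreduce.job.queuename</name>
  <!-- fully-qualified leaf-queue name, with ':' as the delimiter -->
  <value>Queue-A:Queue-B:Queue-C</value>
</property>
```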
+ </section>
<section>
<title>Real-World Cluster Configurations</title>
@@ -383,7 +633,7 @@
</tr>
<tr>
<td>conf/mapred-site.xml</td>
- <td>mapred.reduce.parallel.copies</td>
+ <td>mapreduce.reduce.shuffle.parallelcopies</td>
<td>20</td>
<td>
Higher number of parallel copies run by reduces to fetch
@@ -392,7 +642,7 @@
</tr>
<tr>
<td>conf/mapred-site.xml</td>
- <td>mapred.map.child.java.opts</td>
+ <td>mapreduce.map.java.opts</td>
<td>-Xmx512M</td>
<td>
Larger heap-size for child jvms of maps.
@@ -400,7 +650,7 @@
</tr>
<tr>
<td>conf/mapred-site.xml</td>
- <td>mapred.reduce.child.java.opts</td>
+ <td>mapreduce.reduce.java.opts</td>
<td>-Xmx512M</td>
<td>
Larger heap-size for child jvms of reduces.
@@ -417,13 +667,13 @@
</tr>
<tr>
<td>conf/core-site.xml</td>
- <td>io.sort.factor</td>
+ <td>mapreduce.task.io.sort.factor</td>
<td>100</td>
<td>More streams merged at once while sorting files.</td>
</tr>
<tr>
<td>conf/core-site.xml</td>
- <td>io.sort.mb</td>
+ <td>mapreduce.task.io.sort.mb</td>
<td>200</td>
<td>Higher memory-limit while sorting data.</td>
</tr>
@@ -448,7 +698,7 @@
</tr>
<tr>
<td>conf/mapred-site.xml</td>
- <td>mapred.job.tracker.handler.count</td>
+ <td>mapreduce.jobtracker.handler.count</td>
<td>60</td>
<td>
More JobTracker server threads to handle RPCs from large
@@ -457,13 +707,13 @@
</tr>
<tr>
<td>conf/mapred-site.xml</td>
- <td>mapred.reduce.parallel.copies</td>
+ <td>mapreduce.reduce.shuffle.parallelcopies</td>
<td>50</td>
<td></td>
</tr>
<tr>
<td>conf/mapred-site.xml</td>
- <td>tasktracker.http.threads</td>
+ <td>mapreduce.tasktracker.http.threads</td>
<td>50</td>
<td>
More worker threads for the TaskTracker's http server. The
@@ -473,7 +723,7 @@
</tr>
<tr>
<td>conf/mapred-site.xml</td>
- <td>mapred.map.child.java.opts</td>
+ <td>mapreduce.map.java.opts</td>
<td>-Xmx512M</td>
<td>
Larger heap-size for child jvms of maps.
@@ -481,7 +731,7 @@
</tr>
<tr>
<td>conf/mapred-site.xml</td>
- <td>mapred.reduce.child.java.opts</td>
+ <td>mapreduce.reduce.java.opts</td>
<td>-Xmx1024M</td>
<td>Larger heap-size for child jvms of reduces.</td>
</tr>
@@ -500,11 +750,11 @@
or equal to the -Xmx passed to JavaVM, else the VM might not start.
</p>
- <p>Note: <code>mapred.child.java.opts</code> are used only for
+ <p>Note: <code>mapred.{map|reduce}.child.java.opts</code> are used only for
configuring the launched child tasks from task tracker. Configuring
- the memory options for daemons is documented under
+ the memory options for daemons is documented in
<a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
- Configuring the Environment of the Hadoop Daemons</a>.</p>
+ cluster_setup.html </a></p>
<p>The memory available to some parts of the framework is also
configurable. In map and reduce tasks, performance may be influenced
@@ -558,7 +808,7 @@
<table>
<tr><th>Name</th><th>Type</th><th>Description</th></tr>
- <tr><td>mapred.tasktracker.taskmemorymanager.monitoring-interval</td>
+ <tr><td>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</td>
<td>long</td>
<td>The time interval, in milliseconds, at which the TT
checks for any memory violation. The default value is 5000 msec
@@ -668,10 +918,11 @@
the tasks. For maximum security, this task controller
sets up restricted permissions and user/group ownership of
local files and directories used by the tasks such as the
- job jar files, intermediate files and task log files. Currently
- permissions on distributed cache files are opened up to be
- accessible by all users. In future, it is expected that stricter
- file permissions are set for these files too.
+ job jar files, intermediate files, task log files and distributed
+ cache files. In particular, note that because of this, no user
+ other than the job owner and the TaskTracker can access any of the
+ local files/directories, including those localized as part of the
+ distributed cache.
</td>
</tr>
</table>
@@ -684,7 +935,7 @@
<th>Property</th><th>Value</th><th>Notes</th>
</tr>
<tr>
- <td>mapred.task.tracker.task-controller</td>
+ <td>mapreduce.tasktracker.taskcontroller</td>
<td>Fully qualified class name of the task controller class</td>
<td>Currently there are two implementations of task controller
in the Hadoop system, DefaultTaskController and LinuxTaskController.
@@ -715,21 +966,35 @@
<p>
The executable must have specific permissions as follows. The
executable should have <em>6050 or --Sr-s---</em> permissions
- user-owned by root(super-user) and group-owned by a group
- of which only the TaskTracker's user is the sole group member.
+ user-owned by root (the super-user) and group-owned by a special group
+ of which the TaskTracker's user is a member and no job
+ submitter is. If any job submitter belongs to this special group,
+ security will be compromised. This special group name should be
+ specified for the configuration property
+ <em>"mapreduce.tasktracker.group"</em> in both mapred-site.xml and
+ <a href="#task-controller.cfg">task-controller.cfg</a>.
For example, let's say that the TaskTracker is run as user
<em>mapred</em> who is part of the groups <em>users</em> and
- <em>mapredGroup</em> any of them being the primary group.
+ <em>specialGroup</em>, either of which may be the primary group.
Let also be that <em>users</em> has both <em>mapred</em> and
- another user <em>X</em> as its members, while <em>mapredGroup</em>
- has only <em>mapred</em> as its member. Going by the above
+ another user (job submitter) <em>X</em> as its members, and X does
+ not belong to <em>specialGroup</em>. Going by the above
description, the setuid/setgid executable should be set
<em>6050 or --Sr-s---</em> with user-owner as <em>mapred</em> and
- group-owner as <em>mapredGroup</em> which has
- only <em>mapred</em> as its member(and not <em>users</em> which has
+ group-owner as <em>specialGroup</em> which has
+ <em>mapred</em> as its member (and not <em>users</em>, which has
<em>X</em> also as its member besides <em>mapred</em>).
</p>
+
+ <p>
+ The LinuxTaskController requires that the paths including and leading up
+ to the directories specified in
+ <em>mapreduce.cluster.local.dir</em> and <em>hadoop.log.dir</em>
+ be set to 755 permissions.
+ </p>
+ <section>
+ <title>task-controller.cfg</title>
<p>The executable requires a configuration file called
<em>taskcontroller.cfg</em> to be
present in the configuration directory passed to the ant target
@@ -747,8 +1012,8 @@
</p>
<table><tr><th>Name</th><th>Description</th></tr>
<tr>
- <td>mapred.local.dir</td>
- <td>Path to mapred local directories. Should be same as the value
+ <td>mapreduce.cluster.local.dir</td>
+ <td>Path to the mapreduce.cluster.local.dir directories. Should be the same as the value
which was provided for this key in mapred-site.xml. This is required to
validate paths passed to the setuid executable in order to prevent
arbitrary paths being passed to it.</td>
@@ -760,14 +1025,16 @@
permissions on the log files so that they can be written to by the user's
tasks and read by the TaskTracker for serving on the web UI.</td>
</tr>
+ <tr>
+ <td>mapreduce.tasktracker.group</td>
+ <td>Group to which the TaskTracker belongs. The group owner of the
+ taskcontroller binary should be this group. Should be same as
+ the value with which the TaskTracker is configured. This
+ configuration is required for validating the secure access of the
+ task-controller binary.</td>
+ </tr>
</table>
-
- <p>
- The LinuxTaskController requires that paths including and leading up to
- the directories specified in
- <em>mapred.local.dir</em> and <em>hadoop.log.dir</em> to be set 755
- permissions.
- </p>
+ </section>
</section>
</section>
@@ -800,7 +1067,7 @@
monitoring script in <em>mapred-site.xml</em>.</p>
<table>
<tr><th>Name</th><th>Description</th></tr>
- <tr><td><code>mapred.healthChecker.script.path</code></td>
+ <tr><td><code>mapreduce.tasktracker.healthchecker.script.path</code></td>
<td>Absolute path to the script which is periodically run by the
TaskTracker to determine if the node is
healthy or not. The file should be executable by the TaskTracker.
@@ -809,18 +1076,18 @@
is not started.</td>
</tr>
<tr>
- <td><code>mapred.healthChecker.interval</code></td>
+ <td><code>mapreduce.tasktracker.healthchecker.interval</code></td>
<td>Frequency at which the node health script is run,
in milliseconds</td>
</tr>
<tr>
- <td><code>mapred.healthChecker.script.timeout</code></td>
+ <td><code>mapreduce.tasktracker.healthchecker.script.timeout</code></td>
<td>Time after which the node health script will be killed by
the TaskTracker if unresponsive.
The node is marked unhealthy if the node health script times out.</td>
</tr>
<tr>
- <td><code>mapred.healthChecker.script.args</code></td>
+ <td><code>mapreduce.tasktracker.healthchecker.script.args</code></td>
<td>Extra arguments that can be passed to the node health script
when launched.
These should be a comma-separated list of arguments.</td>
@@ -857,17 +1124,17 @@
<title>History Logging</title>
<p> The job history files are stored in a central location
- <code> hadoop.job.history.location </code> which can be on DFS also,
+ <code> mapreduce.jobtracker.jobhistory.location </code> which can be on DFS also,
whose default value is <code>${HADOOP_LOG_DIR}/history</code>.
The history web UI is accessible from job tracker web UI.</p>
<p> The history files are also logged to user specified directory
- <code>hadoop.job.history.user.location</code>
+ <code>mapreduce.job.userhistorylocation</code>
which defaults to job output directory. The files are stored in
"_logs/history/" in the specified directory. Hence, by default
- they will be in "mapred.output.dir/_logs/history/". User can stop
+ they will be in "mapreduce.output.fileoutputformat.outputdir/_logs/history/". User can stop
logging by giving the value <code>none</code> for
- <code>hadoop.job.history.user.location</code> </p>
+ <code>mapreduce.job.userhistorylocation</code> </p>
<p> The user can view the history logs summary in a specified directory
using the following command <br/>
@@ -880,7 +1147,6 @@
<code>$ bin/hadoop job -history all output-dir</code><br/></p>
</section>
</section>
- </section>
<p>Once all the necessary configuration is complete, distribute the files
to the <code>HADOOP_CONF_DIR</code> directory on all the machines,
@@ -891,9 +1157,9 @@
<section>
<title>Map/Reduce</title>
<p>The job tracker restart can recover running jobs if
- <code>mapred.jobtracker.restart.recover</code> is set true and
+ <code>mapreduce.jobtracker.restart.recover</code> is set to true and
<a href="#Logging">JobHistory logging</a> is enabled. Also
- <code>mapred.jobtracker.job.history.block.size</code> value should be
+ <code>mapreduce.jobtracker.jobhistory.block.size</code> value should be
set to an optimal value so that job history is dumped to disk as soon as
possible; a typical value is 3145728 (3 MB).</p>
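As a hedged sketch, the recovery settings described above would appear in mapred-site.xml like this (the block-size value is the typical one suggested in the text):

```xml
<!-- Illustrative snippet enabling job recovery on JobTracker restart. -->
<property>
  <name>mapreduce.jobtracker.restart.recover</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.jobtracker.jobhistory.block.size</name>
  <value>3145728</value> <!-- 3 MB, to dump job history to disk promptly -->
</property>
```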
</section>
@@ -951,7 +1217,7 @@
and starts the <code>TaskTracker</code> daemon on all the listed slaves.
</p>
</section>
-
+
<section>
<title>Hadoop Shutdown</title>
Modified: hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml?rev=951482&r1=951481&r2=951482&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml (original)
+++ hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml Fri Jun 4 16:35:47 2010
@@ -97,7 +97,7 @@
</section>
- <section>
+ <section id="Download">
<title>Download</title>
<p>
Modified: hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml?rev=951482&r1=951481&r2=951482&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml Fri Jun 4 16:35:47 2010
@@ -39,9 +39,11 @@ See http://forrest.apache.org/docs/linki
</docs>
<docs label="Guides">
+ <commands_manual label="Hadoop Commands" href="commands_manual.html" />
<fsshell label="File System Shell" href="file_system_shell.html" />
<SLA label="Service Level Authorization" href="service_level_auth.html"/>
<native_lib label="Native Libraries" href="native_libraries.html" />
+ <hod_scheduler label="Hadoop On Demand" href="hod_scheduler.html"/>
</docs>
<docs label="Miscellaneous">
@@ -68,6 +70,15 @@ See http://forrest.apache.org/docs/linki
<hdfs-default href="http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html" />
<mapred-default href="http://hadoop.apache.org/mapreduce/docs/current/mapred-default.html" />
+ <mapred-queues href="http://hadoop.apache.org/mapreduce/docs/current/mapred_queues.xml" />
+ <capacity-scheduler href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html" />
+ <mapred-tutorial href="http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html" >
+ <JobAuthorization href="#Job+Authorization" />
+ </mapred-tutorial>
+ <streaming href="http://hadoop.apache.org/mapreduce/docs/current/streaming.html" />
+ <distcp href="http://hadoop.apache.org/mapreduce/docs/current/distcp.html" />
+ <hadoop-archives href="http://hadoop.apache.org/mapreduce/docs/current/hadoop_archives.html" />
+
<zlib href="http://www.zlib.net/" />
<gzip href="http://www.gzip.org/" />
<bzip href="http://www.bzip.org/" />