Posted to common-commits@hadoop.apache.org by zj...@apache.org on 2015/03/03 20:31:45 UTC

[07/43] hadoop git commit: YARN-3168. Convert site documentation from apt to markdown (Gururaj Shetty via aw)

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md
new file mode 100644
index 0000000..1812a44
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md
@@ -0,0 +1,233 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Hadoop: Fair Scheduler
+======================
+
+* [Purpose](#Purpose)
+* [Introduction](#Introduction)
+* [Hierarchical queues with pluggable policies](#Hierarchical_queues_with_pluggable_policies)
+* [Automatically placing applications in queues](#Automatically_placing_applications_in_queues)
+* [Installation](#Installation)
+* [Configuration](#Configuration)
+    * [Properties that can be placed in yarn-site.xml](#Properties_that_can_be_placed_in_yarn-site.xml)
+    * [Allocation file format](#Allocation_file_format)
+    * [Queue Access Control Lists](#Queue_Access_Control_Lists)
+* [Administration](#Administration)
+    * [Modifying configuration at runtime](#Modifying_configuration_at_runtime)
+    * [Monitoring through web UI](#Monitoring_through_web_UI)
+    * [Moving applications between queues](#Moving_applications_between_queues)
+
+##Purpose
+
+This document describes the `FairScheduler`, a pluggable scheduler for Hadoop that allows YARN applications to share resources in large clusters fairly.
+
+##Introduction
+
+Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time. Hadoop NextGen is capable of scheduling multiple resource types. By default, the Fair Scheduler bases scheduling fairness decisions only on memory. It can be configured to schedule with both memory and CPU, using the notion of Dominant Resource Fairness developed by Ghodsi et al. When there is a single app running, that app uses the entire cluster. When other apps are submitted, resources that free up are assigned to the new apps, so that each app eventually gets roughly the same amount of resources. Unlike the default Hadoop scheduler, which forms a queue of apps, this lets short apps finish in reasonable time while not starving long-lived apps. It is also a reasonable way to share a cluster between a number of users. Finally, fair sharing can also work with app priorities - the priorities are used as weights to determine the fraction of total resources that each app should get.
+
+The scheduler organizes apps further into "queues", and shares resources fairly between these queues. By default, all users share a single queue, named "default". If an app specifically lists a queue in a container resource request, the request is submitted to that queue. It is also possible to assign queues based on the user name included with the request through configuration. Within each queue, a scheduling policy is used to share resources between the running apps. The default is memory-based fair sharing, but FIFO and multi-resource with Dominant Resource Fairness can also be configured. Queues can be arranged in a hierarchy to divide resources and configured with weights to share the cluster in specific proportions.
+
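+For example, assuming the job's main class uses ToolRunner so that generic options are parsed, a MapReduce job could request a specific queue at submission time (the jar, class, and queue names below are hypothetical):
+
+    hadoop jar my-app.jar MyApp -Dmapreduce.job.queuename=root.queue1 <input> <output>
+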
+In addition to providing fair sharing, the Fair Scheduler allows assigning guaranteed minimum shares to queues, which is useful for ensuring that certain users, groups or production applications always get sufficient resources. When a queue contains apps, it gets at least its minimum share, but when the queue does not need its full guaranteed share, the excess is split between other running apps. This lets the scheduler guarantee capacity for queues while utilizing resources efficiently when these queues don't contain applications.
+
+The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler's queue until some of the user's earlier apps finish.
+
+##Hierarchical queues with pluggable policies
+
+The fair scheduler supports hierarchical queues. All queues descend from a queue named "root". Available resources are distributed among the children of the root queue in the typical fair scheduling fashion. Then, the children distribute the resources assigned to them to their children in the same fashion. Applications may only be scheduled on leaf queues. Queues can be specified as children of other queues by placing them as sub-elements of their parents in the fair scheduler allocation file.
+
+A queue's name starts with the names of its parents, with periods as separators. So a queue named "queue1" under the root queue, would be referred to as "root.queue1", and a queue named "queue2" under a queue named "parent1" would be referred to as "root.parent1.queue2". When referring to queues, the root part of the name is optional, so queue1 could be referred to as just "queue1", and a queue2 could be referred to as just "parent1.queue2".
+
+Additionally, the fair scheduler allows setting a different custom policy for each queue to allow sharing the queue's resources in whatever way the user wants. A custom policy can be built by extending `org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.SchedulingPolicy`. FifoPolicy, FairSharePolicy (default), and DominantResourceFairnessPolicy are built-in and can be readily used.
+
+Certain add-ons that existed in the original (MR1) Fair Scheduler are not yet supported. Among them is the use of custom policies governing priority "boosting" over certain apps.
+
+##Automatically placing applications in queues
+
+The Fair Scheduler allows administrators to configure policies that automatically place submitted applications into appropriate queues. Placement can depend on the user and groups of the submitter and the requested queue passed by the application. A policy consists of a set of rules that are applied sequentially to classify an incoming application. Each rule either places the app into a queue, rejects it, or continues on to the next rule. Refer to the allocation file format below for how to configure these policies.
+
+##Installation
+
+To use the Fair Scheduler first assign the appropriate scheduler class in yarn-site.xml:
+
+    <property>
+      <name>yarn.resourcemanager.scheduler.class</name>
+      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
+    </property>
+
+##Configuration
+
+Customizing the Fair Scheduler typically involves altering two files. First, scheduler-wide options can be set by adding configuration properties in the yarn-site.xml file in your existing configuration directory. Second, in most cases users will want to create an allocation file listing which queues exist and their respective weights and capacities. The allocation file is reloaded every 10 seconds, allowing changes to be made on the fly.
+
+###Properties that can be placed in yarn-site.xml
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.scheduler.fair.allocation.file` | Path to allocation file. An allocation file is an XML manifest describing queues and their properties, in addition to certain policy defaults. This file must be in the XML format described in the next section. If a relative path is given, the file is searched for on the classpath (which typically includes the Hadoop conf directory). Defaults to fair-scheduler.xml. |
+| `yarn.scheduler.fair.user-as-default-queue` | Whether to use the username associated with the allocation as the default queue name, in the event that a queue name is not specified. If this is set to "false" or unset, all jobs have a shared default queue, named "default". Defaults to true. If a queue placement policy is given in the allocations file, this property is ignored. |
+| `yarn.scheduler.fair.preemption` | Whether to use preemption. Defaults to false. |
+| `yarn.scheduler.fair.preemption.cluster-utilization-threshold` | The utilization threshold after which preemption kicks in. The utilization is computed as the maximum ratio of usage to capacity among all resources. Defaults to 0.8f. |
+| `yarn.scheduler.fair.sizebasedweight` | Whether to assign shares to individual apps based on their size, rather than providing an equal share to all apps regardless of size. When set to true, apps are weighted by the natural logarithm of one plus the app's total requested memory, divided by the natural logarithm of 2. Defaults to false. |
+| `yarn.scheduler.fair.assignmultiple` | Whether to allow multiple container assignments in one heartbeat. Defaults to false. |
+| `yarn.scheduler.fair.max.assign` | If assignmultiple is true, the maximum number of containers that can be assigned in one heartbeat. Defaults to -1, which sets no limit. |
+| `yarn.scheduler.fair.locality.threshold.node` | For applications that request containers on particular nodes, the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another node. Expressed as a float between 0 and 1, which, as a fraction of the cluster size, is the number of scheduling opportunities to pass up. The default value of -1.0 means don't pass up any scheduling opportunities. |
+| `yarn.scheduler.fair.locality.threshold.rack` | For applications that request containers on particular racks, the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another rack. Expressed as a float between 0 and 1, which, as a fraction of the cluster size, is the number of scheduling opportunities to pass up. The default value of -1.0 means don't pass up any scheduling opportunities. |
+| `yarn.scheduler.fair.allow-undeclared-pools` | If this is true, new queues can be created at application submission time, whether because they are specified as the application's queue by the submitter or because they are placed there by the user-as-default-queue property. If this is false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the "default" queue instead. Defaults to true. If a queue placement policy is given in the allocations file, this property is ignored. |
+| `yarn.scheduler.fair.update-interval-ms` | The interval at which to lock the scheduler and recalculate fair shares, recalculate demand, and check whether anything is due for preemption. Defaults to 500 ms. |
+
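+For illustration, the following yarn-site.xml snippet (a sketch only; the values are arbitrary, not recommendations) combines several of the properties above to enable preemption and multiple container assignments per heartbeat:
+
+    <property>
+      <name>yarn.scheduler.fair.preemption</name>
+      <value>true</value>
+    </property>
+    <property>
+      <name>yarn.scheduler.fair.assignmultiple</name>
+      <value>true</value>
+    </property>
+    <property>
+      <name>yarn.scheduler.fair.max.assign</name>
+      <value>3</value>
+    </property>
+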
+###Allocation file format
+
+The allocation file must be in XML format. The format contains five types of elements:
+
+* **Queue elements**: which represent queues. Queue elements can take an optional attribute 'type', which when set to 'parent' makes it a parent queue. This is useful when we want to create a parent queue without configuring any leaf queues. Each queue element may contain the following properties:
+
+    * minResources: minimum resources the queue is entitled to, in the form "X mb, Y vcores". For the single-resource fairness policy, the vcores value is ignored. If a queue's minimum share is not satisfied, it will be offered available resources before any other queue under the same parent. Under the single-resource fairness policy, a queue is considered unsatisfied if its memory usage is below its minimum memory share. Under dominant resource fairness, a queue is considered unsatisfied if its usage for its dominant resource with respect to the cluster capacity is below its minimum share for that resource. If multiple queues are unsatisfied in this situation, resources go to the queue with the smallest ratio between relevant resource usage and minimum. Note that it is possible that a queue that is below its minimum may not immediately get up to its minimum when it submits an application, because already-running jobs may be using those resources.
+
+    * maxResources: maximum resources a queue is allowed, in the form "X mb, Y vcores". For the single-resource fairness policy, the vcores value is ignored. A queue will never be assigned a container that would put its aggregate usage over this limit.
+
+    * maxRunningApps: limit the number of apps from the queue to run at once
+
+    * maxAMShare: limit the fraction of the queue's fair share that can be used to run application masters. This property can only be used for leaf queues. For example, if set to 1.0f, then AMs in the leaf queue can take up to 100% of both the memory and CPU fair share. The value of -1.0f will disable this feature and the amShare will not be checked. The default value is 0.5f.
+
+    * weight: to share the cluster non-proportionally with other queues. Weights default to 1, and a queue with weight 2 should receive approximately twice as many resources as a queue with the default weight.
+
+    * schedulingPolicy: to set the scheduling policy of any queue. The allowed values are "fifo"/"fair"/"drf" or any class that extends `org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.SchedulingPolicy`. Defaults to "fair". If "fifo", apps with earlier submit times are given preference for containers, but apps submitted later may run concurrently if there is leftover space on the cluster after satisfying the earlier app's requests.
+
+    * aclSubmitApps: a list of users and/or groups that can submit apps to the queue. Refer to the ACLs section below for more info on the format of this list and how queue ACLs work.
+
+    * aclAdministerApps: a list of users and/or groups that can administer a queue. Currently the only administrative action is killing an application. Refer to the ACLs section below for more info on the format of this list and how queue ACLs work.
+
+    * minSharePreemptionTimeout: number of seconds the queue is under its minimum share before it will try to preempt containers to take resources from other queues. If not set, the queue will inherit the value from its parent queue.
+
+    * fairSharePreemptionTimeout: number of seconds the queue is under its fair share threshold before it will try to preempt containers to take resources from other queues. If not set, the queue will inherit the value from its parent queue.
+
+    * fairSharePreemptionThreshold: the fair share preemption threshold for the queue. If the queue waits fairSharePreemptionTimeout without receiving fairSharePreemptionThreshold\*fairShare resources, it is allowed to preempt containers to take resources from other queues. If not set, the queue will inherit the value from its parent queue.
+
+* **User elements**: which represent settings governing the behavior of individual users. They can contain a single property: maxRunningApps, a limit on the number of running apps for a particular user.
+
+* **A userMaxAppsDefault element**: which sets the default running app limit for any users whose limit is not otherwise specified.
+
+* **A defaultFairSharePreemptionTimeout element**: which sets the fair share preemption timeout for the root queue; overridden by fairSharePreemptionTimeout element in root queue.
+
+* **A defaultMinSharePreemptionTimeout element**: which sets the min share preemption timeout for the root queue; overridden by minSharePreemptionTimeout element in root queue.
+
+* **A defaultFairSharePreemptionThreshold element**: which sets the fair share preemption threshold for the root queue; overridden by fairSharePreemptionThreshold element in root queue.
+
+* **A queueMaxAppsDefault element**: which sets the default running app limit for queues; overridden by the maxRunningApps element in each queue.
+
+* **A queueMaxAMShareDefault element**: which sets the default AM resource limit for queues; overridden by the maxAMShare element in each queue.
+
+* **A defaultQueueSchedulingPolicy element**: which sets the default scheduling policy for queues; overridden by the schedulingPolicy element in each queue if specified. Defaults to "fair".
+
+* **A queuePlacementPolicy element**: which contains a list of rule elements that tell the scheduler how to place incoming apps into queues. Rules are applied in the order that they are listed. Rules may take arguments. All rules accept the "create" argument, which indicates whether the rule can create a new queue. "Create" defaults to true; if set to false and the rule would place the app in a queue that is not configured in the allocations file, we continue on to the next rule. The last rule must be one that can never issue a continue. Valid rules are:
+
+    * specified: the app is placed into the queue it requested. If the app requested no queue, i.e. it specified "default", we continue. Queue names starting or ending with a period, e.g. ".q1" or "q1.", will be rejected.
+
+    * user: the app is placed into a queue with the name of the user who submitted it. Periods in the username will be replaced with "\_dot\_", i.e. the queue name for user "first.last" is "first\_dot\_last".
+
+    * primaryGroup: the app is placed into a queue with the name of the primary group of the user who submitted it. Periods in the group name will be replaced with "\_dot\_", i.e. the queue name for group "one.two" is "one\_dot\_two".
+
+    * secondaryGroupExistingQueue: the app is placed into a queue with a name that matches a secondary group of the user who submitted it. The first secondary group that matches a configured queue will be selected. Periods in group names will be replaced with "\_dot\_", i.e. a user with "one.two" as one of their secondary groups would be placed into the "one\_dot\_two" queue, if such a queue exists.
+
+    * nestedUserQueue: the app is placed into a queue with the name of the user under the queue suggested by the nested rule. This is similar to the 'user' rule, the difference being that with the 'nestedUserQueue' rule, user queues can be created under any parent queue, while the 'user' rule creates user queues only under the root queue. Note that the nestedUserQueue rule is applied only if the nested rule returns a parent queue. One can configure a parent queue either by setting the 'type' attribute of the queue to 'parent' or by configuring at least one leaf under that queue, which makes it a parent. See the example allocation file for a sample use case.
+
+    * default: the app is placed into the queue specified in the 'queue' attribute of the default rule. If 'queue' attribute is not specified, the app is placed into 'root.default' queue.
+
+    * reject: the app is rejected.
+
+    An example allocation file is given here:
+
+```xml
+<?xml version="1.0"?>
+<allocations>
+  <queue name="sample_queue">
+    <minResources>10000 mb,0vcores</minResources>
+    <maxResources>90000 mb,0vcores</maxResources>
+    <maxRunningApps>50</maxRunningApps>
+    <maxAMShare>0.1</maxAMShare>
+    <weight>2.0</weight>
+    <schedulingPolicy>fair</schedulingPolicy>
+    <queue name="sample_sub_queue">
+      <aclSubmitApps>charlie</aclSubmitApps>
+      <minResources>5000 mb,0vcores</minResources>
+    </queue>
+  </queue>
+
+  <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
+
+  <!-- Queue 'secondary_group_queue' is a parent queue and may have
+       user queues under it -->
+  <queue name="secondary_group_queue" type="parent">
+  <weight>3.0</weight>
+  </queue>
+  
+  <user name="sample_user">
+    <maxRunningApps>30</maxRunningApps>
+  </user>
+  <userMaxAppsDefault>5</userMaxAppsDefault>
+  
+  <queuePlacementPolicy>
+    <rule name="specified" />
+    <rule name="primaryGroup" create="false" />
+    <rule name="nestedUserQueue">
+        <rule name="secondaryGroupExistingQueue" create="false" />
+    </rule>
+    <rule name="default" queue="sample_queue"/>
+  </queuePlacementPolicy>
+</allocations>
+```
+
+  Note that for backwards compatibility with the original FairScheduler, "queue" elements can instead be named "pool" elements.
+
+###Queue Access Control Lists
+
+Queue Access Control Lists (ACLs) allow administrators to control who may take actions on particular queues. They are configured with the aclSubmitApps and aclAdministerApps properties, which can be set per queue. Currently the only supported administrative action is killing an application. Anybody who may administer a queue may also submit applications to it. These properties take values in a format like "user1,user2 group1,group2" or " group1,group2". An action on a queue will be permitted if its user or group is in the ACL of that queue or in the ACL of any of that queue's ancestors. So if queue2 is inside queue1, and user1 is in queue1's ACL, and user2 is in queue2's ACL, then both users may submit to queue2.
+
+**Note:** The delimiter is a space character. To specify only ACL groups, begin the value with a space character.
+
+The root queue's ACLs are "\*" by default which, because ACLs are passed down, means that everybody may submit to and kill applications from every queue. To start restricting access, change the root queue's ACLs to something other than "\*".
+
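+For example, the allocation-file snippet below (user and group names are hypothetical) restricts submission on a queue to two users plus one group, and grants administration to a group only - note the leading space in the groups-only value:
+
+```xml
+<queue name="restricted_queue">
+  <aclSubmitApps>user1,user2 admins</aclSubmitApps>
+  <aclAdministerApps> admins</aclAdministerApps>
+</queue>
+```
+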
+##Administration
+
+The fair scheduler provides support for administration at runtime through a few mechanisms:
+
+###Modifying configuration at runtime
+
+It is possible to modify minimum shares, limits, weights, preemption timeouts and queue scheduling policies at runtime by editing the allocation file. The scheduler will reload this file 10-15 seconds after it sees that it was modified.
+
+###Monitoring through web UI
+
+Current applications, queues, and fair shares can be examined through the ResourceManager's web interface, at `http://*ResourceManager URL*/cluster/scheduler`.
+
+The following fields can be seen for each queue on the web interface:
+
+* Used Resources - The sum of resources allocated to containers within the queue.
+
+* Num Active Applications - The number of applications in the queue that have received at least one container.
+
+* Num Pending Applications - The number of applications in the queue that have not yet received any containers.
+
+* Min Resources - The configured minimum resources that are guaranteed to the queue.
+
+* Max Resources - The configured maximum resources that are allowed to the queue.
+
+* Instantaneous Fair Share - The queue's instantaneous fair share of resources. These shares consider only actives queues (those with running applications), and are used for scheduling decisions. Queues may be allocated resources beyond their shares when other queues aren't using them. A queue whose resource consumption lies at or below its instantaneous fair share will never have its containers preempted.
+
+* Steady Fair Share - The queue's steady fair share of resources. These shares consider all the queues irrespective of whether they are active (have running applications) or not. These are computed less frequently and change only when the configuration or capacity changes. They are meant to provide visibility into resources the user can expect, and hence are displayed in the Web UI.
+
+###Moving applications between queues
+
+The Fair Scheduler supports moving a running application to a different queue. This can be useful for moving an important application to a higher priority queue, or for moving an unimportant application to a lower priority queue. Apps can be moved by running `yarn application -movetoqueue appID -queue targetQueueName`.
+
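+For example, to move a running application to a queue named root.lowpriority (the application ID below is hypothetical):
+
+    yarn application -movetoqueue application_1429011146145_0001 -queue root.lowpriority
+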
+When an application is moved to a queue, its existing allocations are counted with the new queue's allocations instead of the old for purposes of determining fairness. An attempt to move an application to a queue will fail if the addition of the app's resources to that queue would violate its maxRunningApps or maxResources constraints.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
new file mode 100644
index 0000000..6341c60
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
@@ -0,0 +1,57 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+NodeManager Overview
+=====================
+
+* [Overview](#Overview)
+* [Health Checker Service](#Health_checker_service)
+    * [Disk Checker](#Disk_Checker)
+    * [External Health Script](#External_Health_Script)
+
+Overview
+--------
+
+The NodeManager is responsible for launching and managing containers on a node. Containers execute tasks as specified by the AppMaster.
+
+Health Checker Service
+----------------------
+
+The NodeManager runs services to determine the health of the node it is executing on. The services perform checks on the disk as well as any user-specified tests. If any health check fails, the NodeManager marks the node as unhealthy and communicates this to the ResourceManager, which then stops assigning containers to the node. Communication of the node status is done as part of the heartbeat between the NodeManager and the ResourceManager. The intervals at which the disk checker and health monitor (described below) run don't affect the heartbeat intervals. When the heartbeat takes place, the status of both checks is used to determine the health of the node.
+
+###Disk Checker
+
+  The disk checker checks the state of the disks that the NodeManager is configured to use (local-dirs and log-dirs, configured using yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs respectively). The checks include permissions and free disk space. It also checks that the filesystem isn't in a read-only state. The checks are run at 2 minute intervals by default but can be configured to run as often as the user desires. If a disk fails the check, the NodeManager stops using that particular disk but still reports the node status as healthy. However, if a number of disks fail the check (the number can be configured, as explained below), then the node is reported as unhealthy to the ResourceManager and new containers will not be assigned to the node. In addition, once a disk is marked as unhealthy, the NodeManager stops checking it to see if it has recovered (e.g. disk became full and was then cleaned up). The only way for the NodeManager to use that disk again is to restart the software on the node. The following configuration parameters can be used to modify the disk checks:
+
+| Configuration Name | Allowed Values | Description |
+|:---- |:---- |:---- |
+| `yarn.nodemanager.disk-health-checker.enable` | true, false | Enable or disable the disk health checker service |
+| `yarn.nodemanager.disk-health-checker.interval-ms` | Positive integer | The interval, in milliseconds, at which the disk checker should run; the default value is 2 minutes |
+| `yarn.nodemanager.disk-health-checker.min-healthy-disks` | Float between 0-1 | The minimum fraction of disks that must pass the check for the NodeManager to mark the node as healthy; the default is 0.25 |
+| `yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage` | Float between 0-100 | The maximum percentage of disk space that may be utilized before a disk is marked as unhealthy by the disk checker service. This check is run for every disk used by the NodeManager. The default value is 100, i.e. the entire disk can be used. |
+| `yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb` | Integer | The minimum amount of free space that must be available on the disk for the disk checker service to mark the disk as healthy. This check is run for every disk used by the NodeManager. The default value is 0, i.e. the entire disk can be used. |
+
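+For example, the yarn-site.xml snippet below (a sketch; the thresholds are arbitrary) marks the node unhealthy once fewer than half of the configured disks pass the check, or once a disk is more than 90% full:
+
+    <property>
+      <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
+      <value>0.5</value>
+    </property>
+    <property>
+      <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
+      <value>90.0</value>
+    </property>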
+
+###External Health Script
+
+  Users may specify their own health checker script that will be invoked by the health checker service. Users may specify a timeout as well as options to be passed to the script. If the script exits with a non-zero exit code, times out or results in an exception being thrown, the node is marked as unhealthy. Please note that if the script cannot be executed due to permissions or an incorrect path, etc., then it counts as a failure and the node will be reported as unhealthy. Please note that specifying a health check script is not mandatory. If no script is specified, only the disk checker status will be used to determine the health of the node. The following configuration parameters can be used to set the health script:
+
+| Configuration Name | Allowed Values | Description |
+|:---- |:---- |:---- |
+| `yarn.nodemanager.health-checker.interval-ms` | Positive integer | The interval, in milliseconds, at which the health checker service runs; the default value is 10 minutes. |
+| `yarn.nodemanager.health-checker.script.timeout-ms` | Positive integer | The timeout for the health script that's executed; the default value is 20 minutes. |
+| `yarn.nodemanager.health-checker.script.path` | String | Absolute path to the health check script to be run. |
+| `yarn.nodemanager.health-checker.script.opts` | String | Arguments to be passed to the script when the script is executed. |
+
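+For example, the yarn-site.xml snippet below (a sketch; the script path and arguments are hypothetical) wires a custom health script into the health checker service:
+
+    <property>
+      <name>yarn.nodemanager.health-checker.script.path</name>
+      <value>/usr/local/bin/nm-health-check.sh</value>
+    </property>
+    <property>
+      <name>yarn.nodemanager.health-checker.script.opts</name>
+      <value>--verbose</value>
+    </property>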
+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCgroups.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCgroups.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCgroups.md
new file mode 100644
index 0000000..79a428d
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCgroups.md
@@ -0,0 +1,57 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Using CGroups with YARN
+=======================
+
+* [CGroups Configuration](#CGroups_configuration)
+* [CGroups and Security](#CGroups_and_security)
+
+CGroups is a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour. CGroups is a Linux kernel feature and was merged into kernel version 2.6.24. From a YARN perspective, this allows containers to be limited in their resource usage. A good example of this is CPU usage. Without CGroups, it becomes hard to limit container CPU usage. Currently, CGroups is only used for limiting CPU usage.
+
+CGroups Configuration
+---------------------
+
+This section describes the configuration variables for using CGroups.
+
+The following settings are related to setting up CGroups. These need to be set in *yarn-site.xml*.
+
+|Configuration Name | Description |
+|:---- |:---- |
+| `yarn.nodemanager.container-executor.class` | This should be set to "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor". CGroups is a Linux kernel feature and is exposed via the LinuxContainerExecutor. |
+| `yarn.nodemanager.linux-container-executor.resources-handler.class` | This should be set to "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler". Using the LinuxContainerExecutor doesn't force you to use CGroups. If you wish to use CGroups, the resource-handler-class must be set to CGroupsLCEResourceHandler. |
+| `yarn.nodemanager.linux-container-executor.cgroups.hierarchy` | The cgroups hierarchy under which to place YARN processes (cannot contain commas). If yarn.nodemanager.linux-container-executor.cgroups.mount is false (that is, if cgroups have been pre-configured), then this cgroups hierarchy must already exist. |
+| `yarn.nodemanager.linux-container-executor.cgroups.mount` | Whether the LCE should attempt to mount cgroups if not found - can be true or false. |
+| `yarn.nodemanager.linux-container-executor.cgroups.mount-path` | Where the LCE should attempt to mount cgroups if not found. Common locations include /sys/fs/cgroup and /cgroup; the default location can vary depending on the Linux distribution in use. This path must exist before the NodeManager is launched. Only used when the LCE resources handler is set to the CgroupsLCEResourcesHandler, and yarn.nodemanager.linux-container-executor.cgroups.mount is true. A point to note here is that the container-executor binary will try to mount the path specified + "/" + the subsystem. In our case, since we are trying to limit CPU the binary tries to mount the path specified + "/cpu" and that's the path it expects to exist. |
+| `yarn.nodemanager.linux-container-executor.group` | The Unix group of the NodeManager. It should match the setting in "container-executor.cfg". This configuration is required for validating the secure access of the container-executor binary. |
+
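+Putting these together, a minimal yarn-site.xml sketch for enabling CGroups (assuming pre-configured cgroups so no mounting is attempted; the group name is an example) might look like:
+
+    <property>
+      <name>yarn.nodemanager.container-executor.class</name>
+      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
+    </property>
+    <property>
+      <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
+      <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
+    </property>
+    <property>
+      <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
+      <value>false</value>
+    </property>
+    <property>
+      <name>yarn.nodemanager.linux-container-executor.group</name>
+      <value>hadoop</value>
+    </property>
+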
+The following settings are related to limiting resource usage of YARN containers:
+
+|Configuration Name | Description |
+|:---- |:---- |
+| `yarn.nodemanager.resource.percentage-physical-cpu-limit` | This setting lets you limit the cpu usage of all YARN containers. It sets a hard upper limit on the cumulative CPU usage of the containers. For example, if set to 60, the combined CPU usage of all YARN containers will not exceed 60%. |
+| `yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage` | CGroups allows cpu usage limits to be hard or soft. When this setting is true, containers cannot use more CPU usage than allocated even if spare CPU is available. This ensures that containers can only use CPU that they were allocated. When set to false, containers can use spare CPU if available. It should be noted that irrespective of whether set to true or false, at no time can the combined CPU usage of all containers exceed the value specified in "yarn.nodemanager.resource.percentage-physical-cpu-limit". |
+
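+For example, to cap the combined CPU usage of all containers at 80% of the node and enforce per-container allocations strictly (the values are illustrative):
+
+    <property>
+      <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
+      <value>80</value>
+    </property>
+    <property>
+      <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
+      <value>true</value>
+    </property>
+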
+CGroups and security
+--------------------
+
+CGroups itself has no requirements related to security. However, the LinuxContainerExecutor does have some requirements. If running in non-secure mode, by default, the LCE runs all jobs as user "nobody". This user can be changed by setting "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" to the desired user. However, it can also be configured to run jobs as the user submitting the job. In that case "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users" should be set to false.
+
+| yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user | yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users | User running jobs |
+|:---- |:---- |:---- |
+| (default) | (default) | nobody |
+| yarn | (default) | yarn |
+| yarn | false | (User submitting the job) |
+
+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md
new file mode 100644
index 0000000..acafd28
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md
@@ -0,0 +1,543 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+NodeManager REST APIs
+=======================
+
+* [Overview](#Overview)
+* [NodeManager Information API](#NodeManager_Information_API)
+* [Applications API](#Applications_API)
+* [Application API](#Application_API)
+* [Containers API](#Containers_API)
+* [Container API](#Container_API)
+
+Overview
+--------
+
+The NodeManager REST APIs allow the user to get status on the node and information about applications and containers running on that node.
+
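+For example, the node information described below could be fetched with curl (the host is a placeholder; 8042 is the default NodeManager web port):
+
+      curl -H "Accept: application/json" http://<nm http address>:8042/ws/v1/node/info
+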
+NodeManager Information API
+---------------------------
+
+The node information resource provides overall information about that particular node.
+
+### URI
+
+Both of the following URIs give you the node information.
+
+      * http://<nm http address:port>/ws/v1/node
+      * http://<nm http address:port>/ws/v1/node/info
+
+### HTTP Operations Supported
+
+      * GET
+
+### Query Parameters Supported
+
+      None
+
+### Elements of the *nodeInfo* object
+
+| Properties | Data Type | Description |
+|:---- |:---- |:---- |
+| id | long | The NodeManager id |
+| nodeHostName | string | The host name of the NodeManager |
+| totalPmemAllocatedContainersMB | long | The amount of physical memory allocated for use by containers in MB |
+| totalVmemAllocatedContainersMB | long | The amount of virtual memory allocated for use by containers in MB |
+| totalVCoresAllocatedContainers | long | The number of virtual cores allocated for use by containers |
+| lastNodeUpdateTime | long | The last timestamp at which the health report was received (in ms since epoch) |
+| healthReport | string | The diagnostic health report of the node |
+| nodeHealthy | boolean | true/false indicator of if the node is healthy |
+| nodeManagerVersion | string | Version of the NodeManager |
+| nodeManagerBuildVersion | string | NodeManager build string with build version, user, and checksum |
+| nodeManagerVersionBuiltOn | string | Timestamp when NodeManager was built |
+| hadoopVersion | string | Version of hadoop common |
+| hadoopBuildVersion | string | Hadoop common build string with build version, user, and checksum |
+| hadoopVersionBuiltOn | string | Timestamp when hadoop common was built |
+
+### Response Examples
+
+**JSON response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/info
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/json
+      Transfer-Encoding: chunked
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```json
+{
+   "nodeInfo" : {
+      "hadoopVersionBuiltOn" : "Mon Jan  9 14:58:42 UTC 2012",
+      "nodeManagerBuildVersion" : "0.23.1-SNAPSHOT from 1228355 by user1 source checksum 20647f76c36430e888cc7204826a445c",
+      "lastNodeUpdateTime" : 1326222266126,
+      "totalVmemAllocatedContainersMB" : 17203,
+      "totalVCoresAllocatedContainers" : 8,
+      "nodeHealthy" : true,
+      "healthReport" : "",
+      "totalPmemAllocatedContainersMB" : 8192,
+      "nodeManagerVersionBuiltOn" : "Mon Jan  9 15:01:59 UTC 2012",
+      "nodeManagerVersion" : "0.23.1-SNAPSHOT",
+      "id" : "host.domain.com:8041",
+      "hadoopBuildVersion" : "0.23.1-SNAPSHOT from 1228292 by user1 source checksum 3eba233f2248a089e9b28841a784dd00",
+      "nodeHostName" : "host.domain.com",
+      "hadoopVersion" : "0.23.1-SNAPSHOT"
+   }
+}
+```
+
+**XML response**
+
+HTTP Request:
+
+      Accept: application/xml
+      GET http://<nm http address:port>/ws/v1/node/info
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/xml
+      Content-Length: 983
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```xml
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<nodeInfo>
+  <healthReport/>
+  <totalVmemAllocatedContainersMB>17203</totalVmemAllocatedContainersMB>
+  <totalPmemAllocatedContainersMB>8192</totalPmemAllocatedContainersMB>
+  <totalVCoresAllocatedContainers>8</totalVCoresAllocatedContainers>
+  <lastNodeUpdateTime>1326222386134</lastNodeUpdateTime>
+  <nodeHealthy>true</nodeHealthy>
+  <nodeManagerVersion>0.23.1-SNAPSHOT</nodeManagerVersion>
+  <nodeManagerBuildVersion>0.23.1-SNAPSHOT from 1228355 by user1 source checksum 20647f76c36430e888cc7204826a445c</nodeManagerBuildVersion>
+  <nodeManagerVersionBuiltOn>Mon Jan  9 15:01:59 UTC 2012</nodeManagerVersionBuiltOn>
+  <hadoopVersion>0.23.1-SNAPSHOT</hadoopVersion>
+  <hadoopBuildVersion>0.23.1-SNAPSHOT from 1228292 by user1 source checksum 3eba233f2248a089e9b28841a784dd00</hadoopBuildVersion>
+  <hadoopVersionBuiltOn>Mon Jan  9 14:58:42 UTC 2012</hadoopVersionBuiltOn>
+  <id>host.domain.com:8041</id>
+  <nodeHostName>host.domain.com</nodeHostName>
+</nodeInfo>
+```
+
+Applications API
+----------------
+
+With the Applications API, you can obtain a collection of resources, each of which represents an application. When you run a GET operation on this resource, you obtain a collection of Application Objects. See also [Application API](#Application_API) for syntax of the application object.
+
+### URI
+
+      * http://<nm http address:port>/ws/v1/node/apps
+
+### HTTP Operations Supported
+
+      * GET
+
+### Query Parameters Supported
+
+Multiple parameters can be specified.
+
+      * state - application state 
+      * user - user name
+
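+For example, to list only the running applications of a given user (the parameter values are illustrative):
+
+      GET http://<nm http address:port>/ws/v1/node/apps?state=RUNNING&user=user1
+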
+### Elements of the *apps* (Applications) object
+
+When you make a request for the list of applications, the information will be returned as a collection of app objects. See also [Application API](#Application_API) for syntax of the app object.
+
+| Properties | Data Type | Description |
+|:---- |:---- |:---- |
+| app | array of app objects(JSON)/zero or more app objects(XML) | A collection of application objects |
+
+### Response Examples
+
+**JSON response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/apps
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/json
+      Transfer-Encoding: chunked
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```json
+{
+   "apps" : {
+      "app" : [
+         {
+            "containerids" : [
+               "container_1326121700862_0003_01_000001",
+               "container_1326121700862_0003_01_000002"
+            ],
+            "user" : "user1",
+            "id" : "application_1326121700862_0003",
+            "state" : "RUNNING"
+         },
+         {
+            "user" : "user1",
+            "id" : "application_1326121700862_0002",
+            "state" : "FINISHED"
+         }
+      ]
+   }
+}
+```
+
+**XML response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/apps
+      Accept: application/xml
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/xml
+      Content-Length: 400
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```xml
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<apps>
+  <app>
+    <id>application_1326121700862_0002</id>
+    <state>FINISHED</state>
+    <user>user1</user>
+  </app>
+  <app>
+    <id>application_1326121700862_0003</id>
+    <state>RUNNING</state>
+    <user>user1</user>
+    <containerids>container_1326121700862_0003_01_000002</containerids>
+    <containerids>container_1326121700862_0003_01_000001</containerids>
+  </app>
+</apps>
+```
+
+Application API
+---------------
+
+An application resource contains information about a particular application that was run or is running on this NodeManager.
+
+### URI
+
+Use the following URI to obtain an app object for an application identified by the appid value.
+
+      * http://<nm http address:port>/ws/v1/node/apps/{appid}
+
+### HTTP Operations Supported
+
+      * GET
+
+### Query Parameters Supported
+
+      None
+
+### Elements of the *app* (Application) object
+
+| Properties | Data Type | Description |
+|:---- |:---- |:---- |
+| id | string | The application id |
+| user | string | The user who started the application |
+| state | string | The state of the application - valid states are: NEW, INITING, RUNNING, FINISHING\_CONTAINERS\_WAIT, APPLICATION\_RESOURCES\_CLEANINGUP, FINISHED |
+| containerids | array of containerids(JSON)/zero or more containerids(XML) | The list of containerids currently being used by the application on this node. If not present then no containers are currently running for this application. |
+
+### Response Examples
+
+**JSON response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/apps/application_1326121700862_0005
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/json
+      Transfer-Encoding: chunked
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```json
+{
+   "app" : {
+      "containerids" : [
+         "container_1326121700862_0005_01_000003",
+         "container_1326121700862_0005_01_000001"
+      ],
+      "user" : "user1",
+      "id" : "application_1326121700862_0005",
+      "state" : "RUNNING"
+   }
+}
+```
+
+**XML response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/apps/application_1326121700862_0005
+      Accept: application/xml
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/xml
+      Content-Length: 281 
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```xml
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<app>
+  <id>application_1326121700862_0005</id>
+  <state>RUNNING</state>
+  <user>user1</user>
+  <containerids>container_1326121700862_0005_01_000003</containerids>
+  <containerids>container_1326121700862_0005_01_000001</containerids>
+</app>
+```
+
+Containers API
+--------------
+
+With the containers API, you can obtain a collection of resources, each of which represents a container. When you run a GET operation on this resource, you obtain a collection of Container Objects. See also [Container API](#Container_API) for syntax of the container object.
+
+### URI
+
+      * http://<nm http address:port>/ws/v1/node/containers
+
+### HTTP Operations Supported
+
+      * GET
+
+### Query Parameters Supported
+
+      None
+
+### Elements of the *containers* object
+
+When you make a request for the list of containers, the information will be returned as a collection of container objects. See also [Container API](#Container_API) for syntax of the container object.
+
+| Properties | Data Type | Description |
+|:---- |:---- |:---- |
+| containers | array of container objects(JSON)/zero or more container objects(XML) | A collection of container objects |
+
+### Response Examples
+
+**JSON response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/containers
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/json
+      Transfer-Encoding: chunked
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```json
+{
+   "containers" : {
+      "container" : [
+         {
+            "nodeId" : "host.domain.com:8041",
+            "totalMemoryNeededMB" : 2048,
+            "totalVCoresNeeded" : 1,
+            "state" : "RUNNING",
+            "diagnostics" : "",
+            "containerLogsLink" : "http://host.domain.com:8042/node/containerlogs/container_1326121700862_0006_01_000001/user1",
+            "user" : "user1",
+            "id" : "container_1326121700862_0006_01_000001",
+            "exitCode" : -1000
+         },
+         {
+            "nodeId" : "host.domain.com:8041",
+            "totalMemoryNeededMB" : 2048,
+            "totalVCoresNeeded" : 2,
+            "state" : "RUNNING",
+            "diagnostics" : "",
+            "containerLogsLink" : "http://host.domain.com:8042/node/containerlogs/container_1326121700862_0006_01_000003/user1",
+            "user" : "user1",
+            "id" : "container_1326121700862_0006_01_000003",
+            "exitCode" : -1000
+         }
+      ]
+   }
+}
+```
+
+**XML response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/containers
+      Accept: application/xml
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/xml
+      Content-Length: 988
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```xml
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<containers>
+  <container>
+    <id>container_1326121700862_0006_01_000001</id>
+    <state>RUNNING</state>
+    <exitCode>-1000</exitCode>
+    <diagnostics/>
+    <user>user1</user>
+    <totalMemoryNeededMB>2048</totalMemoryNeededMB>
+    <totalVCoresNeeded>1</totalVCoresNeeded>
+    <containerLogsLink>http://host.domain.com:8042/node/containerlogs/container_1326121700862_0006_01_000001/user1</containerLogsLink>
+    <nodeId>host.domain.com:8041</nodeId>
+  </container>
+  <container>
+    <id>container_1326121700862_0006_01_000003</id>
+    <state>DONE</state>
+    <exitCode>0</exitCode>
+    <diagnostics>Container killed by the ApplicationMaster.</diagnostics>
+    <user>user1</user>
+    <totalMemoryNeededMB>2048</totalMemoryNeededMB>
+    <totalVCoresNeeded>2</totalVCoresNeeded>
+    <containerLogsLink>http://host.domain.com:8042/node/containerlogs/container_1326121700862_0006_01_000003/user1</containerLogsLink>
+    <nodeId>host.domain.com:8041</nodeId>
+  </container>
+</containers>
+```
+
+Container API
+-------------
+
+A container resource contains information about a particular container that is running on this NodeManager.
+
+### URI
+
+Use the following URI to obtain a Container Object for a container identified by the containerid value.
+
+      * http://<nm http address:port>/ws/v1/node/containers/{containerid}
+
+### HTTP Operations Supported
+
+      * GET
+
+### Query Parameters Supported
+
+      None
+
+### Elements of the *container* object
+
+| Properties | Data Type | Description |
+|:---- |:---- |:---- |
+| id | string | The container id |
+| state | string | State of the container - valid states are: NEW, LOCALIZING, LOCALIZATION\_FAILED, LOCALIZED, RUNNING, EXITED\_WITH\_SUCCESS, EXITED\_WITH\_FAILURE, KILLING, CONTAINER\_CLEANEDUP\_AFTER\_KILL, CONTAINER\_RESOURCES\_CLEANINGUP, DONE |
+| nodeId | string | The id of the node the container is on |
+| containerLogsLink | string | The http link to the container logs |
+| user | string | The user name of the user who started the container |
+| exitCode | int | Exit code of the container |
+| diagnostics | string | A diagnostic message for failed containers |
+| totalMemoryNeededMB | long | Total amount of memory needed by the container (in MB) |
+| totalVCoresNeeded | long | Total number of virtual cores needed by the container |
+
+### Response Examples
+
+**JSON response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/containers/container_1326121700862_0007_01_000001
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/json
+      Transfer-Encoding: chunked
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```json
+{
+   "container" : {
+      "nodeId" : "host.domain.com:8041",
+      "totalMemoryNeededMB" : 2048,
+      "totalVCoresNeeded" : 1,
+      "state" : "RUNNING",
+      "diagnostics" : "",
+      "containerLogsLink" : "http://host.domain.com:8042/node/containerlogs/container_1326121700862_0007_01_000001/user1",
+      "user" : "user1",
+      "id" : "container_1326121700862_0007_01_000001",
+      "exitCode" : -1000
+   }
+}
+```
+
+**XML response**
+
+HTTP Request:
+
+      GET http://<nm http address:port>/ws/v1/node/containers/container_1326121700862_0007_01_000001
+      Accept: application/xml
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/xml
+      Content-Length: 491 
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```xml
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<container>
+  <id>container_1326121700862_0007_01_000001</id>
+  <state>RUNNING</state>
+  <exitCode>-1000</exitCode>
+  <diagnostics/>
+  <user>user1</user>
+  <totalMemoryNeededMB>2048</totalMemoryNeededMB>
+  <totalVCoresNeeded>1</totalVCoresNeeded>
+  <containerLogsLink>http://host.domain.com:8042/node/containerlogs/container_1326121700862_0007_01_000001/user1</containerLogsLink>
+  <nodeId>host.domain.com:8041</nodeId>
+</container>
+```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md
new file mode 100644
index 0000000..be7d75b
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md
@@ -0,0 +1,53 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+NodeManager Restart
+===================
+
+* [Introduction](#Introduction)
+* [Enabling NM Restart](#Enabling_NM_Restart)
+
+Introduction
+------------
+
+This document gives an overview of NodeManager (NM) restart, a feature that enables the NodeManager to be restarted without losing the active containers running on the node. At a high level, the NM stores any necessary state to a local state-store as it processes container-management requests. When the NM restarts, it recovers by first loading state for various subsystems and then letting those subsystems perform recovery using the loaded state.
+
+Enabling NM Restart
+-------------------
+
+Step 1. To enable NM Restart functionality, set the following property in **conf/yarn-site.xml** to *true*.
+
+| Property | Value |
+|:---- |:---- |
+| `yarn.nodemanager.recovery.enabled` | `true`, (default value is set to false) |
+
+Step 2.  Configure a path to the local file-system directory where the NodeManager can save its run state.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.nodemanager.recovery.dir` | The local filesystem directory in which the node manager will store state when recovery is enabled. The default value is set to `$hadoop.tmp.dir/yarn-nm-recovery`. |
+
+Step 3.  Configure a valid RPC address for the NodeManager.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.nodemanager.address` | Ephemeral ports (port 0, which is the default) cannot be used for the NodeManager's RPC server specified via yarn.nodemanager.address, as this can make the NM use different ports before and after a restart. This will break any previously running clients that were communicating with the NM before restart. Explicitly setting yarn.nodemanager.address to an address with a specific port number (e.g. 0.0.0.0:45454) is a precondition for enabling NM restart. |
+
+Step 4.  Auxiliary services.
+
+  * NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely functional NM restart, YARN relies on any configured auxiliary service to also support recovery. This usually includes (1) avoiding the use of ephemeral ports so that previously running clients (in this case, usually containers) are not disrupted after restart, and (2) having the auxiliary service itself support recoverability by reloading any previous state when the NodeManager restarts and reinitializes it.
+
+  * A simple example of the above is the auxiliary service 'ShuffleHandler' for MapReduce (MR). ShuffleHandler already satisfies the two requirements above, so users/admins don't have to do anything for it to support NM restart: (1) the configuration property **mapreduce.shuffle.port** controls which port the ShuffleHandler on a NodeManager host binds to, and it defaults to a non-ephemeral port; (2) the ShuffleHandler service also already supports recovery of previous state after NM restart.
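+
+For illustration only, an admin who wanted to pin the shuffle port explicitly (it already defaults to a non-ephemeral port) could add something like the following to `mapred-site.xml`; the port number shown is just an example:
+
+```xml
+<!-- Illustrative only: mapreduce.shuffle.port already defaults to a
+     non-ephemeral port; pin it explicitly only if desired. -->
+<property>
+  <name>mapreduce.shuffle.port</name>
+  <value>13562</value>
+</property>
+```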
+
+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerHA.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerHA.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerHA.md
new file mode 100644
index 0000000..491b885
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerHA.md
@@ -0,0 +1,140 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+ResourceManager High Availability
+=================================
+
+* [Introduction](#Introduction)
+* [Architecture](#Architecture)
+    * [RM Failover](#RM_Failover)
+    * [Recovering previous active-RM's state](#Recovering_previous_active-RMs_state)
+* [Deployment](#Deployment)
+    * [Configurations](#Configurations)
+    * [Admin commands](#Admin_commands)
+    * [ResourceManager Web UI services](#ResourceManager_Web_UI_services)
+    * [Web Services](#Web_Services)
+
+Introduction
+------------
+
+This guide provides an overview of High Availability of YARN's ResourceManager, and details how to configure and use this feature. The ResourceManager (RM) is responsible for tracking the resources in a cluster and scheduling applications (e.g., MapReduce jobs). Prior to Hadoop 2.4, the ResourceManager was the single point of failure in a YARN cluster. The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this otherwise single point of failure.
+
+Architecture
+------------
+
+![Overview of ResourceManager High Availability](images/rm-ha-overview.png)
+
+### RM Failover
+
+ResourceManager HA is realized through an Active/Standby architecture - at any point in time, one of the RMs is Active, and one or more RMs are in Standby mode waiting to take over should anything happen to the Active. The trigger to transition to Active comes either from the admin (through the CLI) or from the integrated failover controller when automatic failover is enabled.
+
+#### Manual transitions and failover
+
+When automatic failover is not enabled, admins have to manually transition one of the RMs to Active. To fail over from one RM to the other, they are expected to first transition the Active RM to Standby and then transition a Standby RM to Active. All of this can be done using the "`yarn rmadmin`" CLI.
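+
+For example, to manually fail over from `rm1` to `rm2` (using the RM ids from the sample configuration later in this document), one might run:
+
+     $ yarn rmadmin -transitionToStandby rm1
+     $ yarn rmadmin -transitionToActive rm2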
+
+#### Automatic failover
+
+The RMs have an option to embed the ZooKeeper-based ActiveStandbyElector to decide which RM should be the Active. When the Active goes down or becomes unresponsive, another RM is automatically elected to be the Active and takes over. Note that there is no need to run a separate ZKFC daemon, as is the case for HDFS, because the ActiveStandbyElector embedded in the RMs acts as both a failure detector and a leader elector.
+
+#### Client, ApplicationMaster and NodeManager on RM failover
+
+When there are multiple RMs, the configuration (yarn-site.xml) used by clients and nodes is expected to list all the RMs. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in a round-robin fashion until they hit the Active RM. If the Active goes down, they resume the round-robin polling until they hit the "new" Active. This default retry logic is implemented as `org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider`. You can override the logic by implementing `org.apache.hadoop.yarn.client.RMFailoverProxyProvider` and setting the value of `yarn.client.failover-proxy-provider` to the class name.
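+
+As a sketch, pointing clients at a custom provider would look like the following; `com.example.MyRMFailoverProxyProvider` is a hypothetical class name used only for illustration:
+
+```xml
+<!-- Hypothetical custom provider class, for illustration only. -->
+<property>
+  <name>yarn.client.failover-proxy-provider</name>
+  <value>com.example.MyRMFailoverProxyProvider</value>
+</property>
+```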
+
+### Recovering previous active-RM's state
+
+With [ResourceManager Restart](./ResourceManagerRestart.html) enabled, the RM being promoted to the active state loads the RM internal state and continues to operate from where the previous active left off, as far as the RM restart feature allows. A new attempt is spawned for each managed application previously submitted to the RM. Applications can checkpoint periodically to avoid losing any work. The state-store must be visible to both the Active and Standby RMs. Currently, there are two RMStateStore implementations for persistence - FileSystemRMStateStore and ZKRMStateStore. The `ZKRMStateStore` implicitly allows write access to a single RM at any point in time, and hence is the recommended store to use in an HA cluster. When using the ZKRMStateStore, there is no need for a separate fencing mechanism to address a potential split-brain situation where multiple RMs can potentially assume the Active role.
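+
+For reference, enabling RM recovery with the ZK-based store typically involves settings along these lines in `yarn-site.xml`; see the ResourceManager Restart document for the authoritative details:
+
+```xml
+<property>
+  <name>yarn.resourcemanager.recovery.enabled</name>
+  <value>true</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.store.class</name>
+  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
+</property>
+```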
+
+Deployment
+----------
+
+### Configurations
+
+Most of the failover functionality is tunable using various configuration properties. The following is a list of the required/important ones; yarn-default.xml carries the full list of knobs. See [yarn-default.xml](../hadoop-yarn-common/yarn-default.xml) for more information, including default values. Also see the [ResourceManager Restart](./ResourceManagerRestart.html) document for instructions on setting up the state-store.
+
+| Configuration Properties | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.zk-address` | Address of the ZK-quorum. Used both for the state-store and embedded leader-election. |
+| `yarn.resourcemanager.ha.enabled` | Enable RM HA. |
+| `yarn.resourcemanager.ha.rm-ids` | List of logical IDs for the RMs. e.g., "rm1,rm2". |
+| `yarn.resourcemanager.hostname.*rm-id*` | For each *rm-id*, specify the hostname the RM corresponds to. Alternatively, one could set each of the RM's service addresses. |
+| `yarn.resourcemanager.ha.id` | Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config. |
+| `yarn.resourcemanager.ha.automatic-failover.enabled` | Enable automatic failover; by default, it is enabled only when HA is enabled. |
+| `yarn.resourcemanager.ha.automatic-failover.embedded` | Use the embedded leader-elector to pick the Active RM when automatic failover is enabled. By default, it is enabled only when HA is enabled. |
+| `yarn.resourcemanager.cluster-id` | Identifies the cluster. Used by the elector to ensure an RM doesn't take over as Active for another cluster. |
+| `yarn.client.failover-proxy-provider` | The class to be used by Clients, AMs and NMs to failover to the Active RM. |
+| `yarn.client.failover-max-attempts` | The max number of times FailoverProxyProvider should attempt failover. |
+| `yarn.client.failover-sleep-base-ms` | The sleep base (in milliseconds) to be used for calculating the exponential delay between failovers. |
+| `yarn.client.failover-sleep-max-ms` | The maximum sleep time (in milliseconds) between failovers. |
+| `yarn.client.failover-retries` | The number of retries per attempt to connect to a ResourceManager. |
+| `yarn.client.failover-retries-on-socket-timeouts` | The number of retries per attempt to connect to a ResourceManager on socket timeouts. |
+
+#### Sample configurations
+
+Here is a sample of a minimal setup for RM failover.
+
+```xml
+<property>
+  <name>yarn.resourcemanager.ha.enabled</name>
+  <value>true</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.cluster-id</name>
+  <value>cluster1</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.ha.rm-ids</name>
+  <value>rm1,rm2</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.hostname.rm1</name>
+  <value>master1</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.hostname.rm2</name>
+  <value>master2</value>
+</property>
+<property>
+  <name>yarn.resourcemanager.zk-address</name>
+  <value>zk1:2181,zk2:2181,zk3:2181</value>
+</property>
+```
+
+### Admin commands
+
+`yarn rmadmin` has a few HA-specific command options to check the health/state of an RM and to transition it to Active or Standby. HA commands take the RM service id set by `yarn.resourcemanager.ha.rm-ids` as an argument.
+
+     $ yarn rmadmin -getServiceState rm1
+     active
+     
+     $ yarn rmadmin -getServiceState rm2
+     standby
+
+If automatic failover is enabled, you cannot use the manual transition commands. Though you can override this with the `--forcemanual` flag, you should exercise caution.
+
+     $ yarn rmadmin -transitionToStandby rm1
+     Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@1d8299fd
+     Refusing to manually manage HA state, since it may cause
+     a split-brain scenario or other incorrect state.
+     If you are very sure you know what you are doing, please
+     specify the forcemanual flag.
+
+See [YarnCommands](./YarnCommands.html) for more details.
+
+### ResourceManager Web UI services
+
+Assuming a standby RM is up and running, the Standby automatically redirects all web requests to the Active, except for the "About" page.
+
+### Web Services
+
+Assuming a standby RM is up and running, the RM web services described at [ResourceManager REST APIs](./ResourceManagerRest.html) are automatically redirected to the Active RM when invoked on the standby RM.