Posted to common-commits@hadoop.apache.org by zh...@apache.org on 2015/03/02 18:15:52 UTC
[36/50] [abbrv] hadoop git commit: YARN-3168. Convert site
documentation from apt to markdown (Gururaj Shetty via aw)
http://git-wip-us.apache.org/repos/asf/hadoop/blob/06aca7c6/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRest.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRest.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRest.apt.vm
deleted file mode 100644
index 36b8621..0000000
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRest.apt.vm
+++ /dev/null
@@ -1,645 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
- ---
- NodeManager REST APIs
- ---
- ---
- ${maven.build.timestamp}
-
-NodeManager REST APIs
-
-%{toc|section=1|fromDepth=0|toDepth=2}
-
-* Overview
-
- The NodeManager REST APIs allow the user to get status on the node and information about applications and containers running on that node.
-
-* NodeManager Information API
-
- The node information resource provides overall information about that particular node.
-
-** URI
-
- Both of the following URIs give you the node information.
-
-------
- * http://<nm http address:port>/ws/v1/node
- * http://<nm http address:port>/ws/v1/node/info
-------
-
-** HTTP Operations Supported
-
-------
- * GET
-------
-
-** Query Parameters Supported
-
-------
- None
-------
-
-** Elements of the <nodeInfo> object
-
-*---------------+--------------+-------------------------------+
-|| Item || Data Type || Description |
-*---------------+--------------+-------------------------------+
-| id | long | The NodeManager id |
-*---------------+--------------+-------------------------------+
-| nodeHostName | string | The host name of the NodeManager |
-*---------------+--------------+-------------------------------+
-| totalPmemAllocatedContainersMB | long | The amount of physical memory allocated for use by containers in MB |
-*---------------+--------------+-------------------------------+
-| totalVmemAllocatedContainersMB | long | The amount of virtual memory allocated for use by containers in MB |
-*---------------+--------------+-------------------------------+
-| totalVCoresAllocatedContainers | long | The number of virtual cores allocated for use by containers |
-*---------------+--------------+-------------------------------+
-| lastNodeUpdateTime | long | The last timestamp at which the health report was received (in ms since epoch)|
-*---------------+--------------+-------------------------------+
-| healthReport | string | The diagnostic health report of the node |
-*---------------+--------------+-------------------------------+
-| nodeHealthy | boolean | true/false indicator of whether the node is healthy |
-*---------------+--------------+-------------------------------+
-| nodeManagerVersion | string | Version of the NodeManager |
-*---------------+--------------+-------------------------------+
-| nodeManagerBuildVersion | string | NodeManager build string with build version, user, and checksum |
-*---------------+--------------+-------------------------------+
-| nodeManagerVersionBuiltOn | string | Timestamp when NodeManager was built (in ms since epoch) |
-*---------------+--------------+-------------------------------+
-| hadoopVersion | string | Version of Hadoop Common |
-*---------------+--------------+-------------------------------+
-| hadoopBuildVersion | string | Hadoop Common build string with build version, user, and checksum |
-*---------------+--------------+-------------------------------+
-| hadoopVersionBuiltOn | string | Timestamp when Hadoop Common was built (in ms since epoch) |
-*---------------+--------------+-------------------------------+
-
-** Response Examples
-
- <<JSON response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/info
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-{
- "nodeInfo" : {
- "hadoopVersionBuiltOn" : "Mon Jan 9 14:58:42 UTC 2012",
- "nodeManagerBuildVersion" : "0.23.1-SNAPSHOT from 1228355 by user1 source checksum 20647f76c36430e888cc7204826a445c",
- "lastNodeUpdateTime" : 1326222266126,
- "totalVmemAllocatedContainersMB" : 17203,
- "totalVCoresAllocatedContainers" : 8,
- "nodeHealthy" : true,
- "healthReport" : "",
- "totalPmemAllocatedContainersMB" : 8192,
- "nodeManagerVersionBuiltOn" : "Mon Jan 9 15:01:59 UTC 2012",
- "nodeManagerVersion" : "0.23.1-SNAPSHOT",
- "id" : "host.domain.com:8041",
- "hadoopBuildVersion" : "0.23.1-SNAPSHOT from 1228292 by user1 source checksum 3eba233f2248a089e9b28841a784dd00",
- "nodeHostName" : "host.domain.com",
- "hadoopVersion" : "0.23.1-SNAPSHOT"
- }
-}
-+---+
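For illustration only, the JSON body above can be consumed in a few lines of Python. The field names come straight from the <nodeInfo> table; the helper name and the abbreviated payload are ours:

```python
import json

# Abbreviated nodeInfo payload as returned by GET /ws/v1/node/info
# (values taken from the example above).
payload = """
{"nodeInfo": {"nodeHealthy": true,
              "healthReport": "",
              "totalPmemAllocatedContainersMB": 8192,
              "totalVmemAllocatedContainersMB": 17203,
              "totalVCoresAllocatedContainers": 8,
              "id": "host.domain.com:8041"}}
"""

def summarize_node(raw):
    """Extract (healthy, pmem_mb, vcores) from a nodeInfo JSON body."""
    info = json.loads(raw)["nodeInfo"]
    return (info["nodeHealthy"],
            info["totalPmemAllocatedContainersMB"],
            info["totalVCoresAllocatedContainers"])

print(summarize_node(payload))  # -> (True, 8192, 8)
```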
-
- <<XML response>>
-
- HTTP Request:
-
------
- Accept: application/xml
- GET http://<nm http address:port>/ws/v1/node/info
------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/xml
- Content-Length: 983
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<nodeInfo>
- <healthReport/>
- <totalVmemAllocatedContainersMB>17203</totalVmemAllocatedContainersMB>
- <totalPmemAllocatedContainersMB>8192</totalPmemAllocatedContainersMB>
- <totalVCoresAllocatedContainers>8</totalVCoresAllocatedContainers>
- <lastNodeUpdateTime>1326222386134</lastNodeUpdateTime>
- <nodeHealthy>true</nodeHealthy>
- <nodeManagerVersion>0.23.1-SNAPSHOT</nodeManagerVersion>
- <nodeManagerBuildVersion>0.23.1-SNAPSHOT from 1228355 by user1 source checksum 20647f76c36430e888cc7204826a445c</nodeManagerBuildVersion>
- <nodeManagerVersionBuiltOn>Mon Jan 9 15:01:59 UTC 2012</nodeManagerVersionBuiltOn>
- <hadoopVersion>0.23.1-SNAPSHOT</hadoopVersion>
- <hadoopBuildVersion>0.23.1-SNAPSHOT from 1228292 by user1 source checksum 3eba233f2248a089e9b28841a784dd00</hadoopBuildVersion>
- <hadoopVersionBuiltOn>Mon Jan 9 14:58:42 UTC 2012</hadoopVersionBuiltOn>
- <id>host.domain.com:8041</id>
- <nodeHostName>host.domain.com</nodeHostName>
-</nodeInfo>
-+---+
-
-* Applications API
-
- With the Applications API, you can obtain a collection of resources, each of which represents an application. When you run a GET operation on this resource, you obtain a collection of Application Objects. See also {{Application API}} for syntax of the application object.
-
-** URI
-
-------
- * http://<nm http address:port>/ws/v1/node/apps
-------
-
-** HTTP Operations Supported
-
-------
- * GET
-------
-
-** Query Parameters Supported
-
- Multiple parameters can be specified.
-
-------
- * state - application state
- * user - user name
-------
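As a sketch, a client might assemble the filtered URI like this. The hostname and port are placeholders, and only state and user are real parameters per the list above; the helper name is ours:

```python
from urllib.parse import urlencode

def node_apps_url(nm_addr, state=None, user=None):
    """Build the NM apps URI, appending only the filters that are set."""
    params = [(k, v) for k, v in (("state", state), ("user", user))
              if v is not None]
    base = f"http://{nm_addr}/ws/v1/node/apps"
    return f"{base}?{urlencode(params)}" if params else base

print(node_apps_url("host.domain.com:8042", state="RUNNING", user="user1"))
# -> http://host.domain.com:8042/ws/v1/node/apps?state=RUNNING&user=user1
```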
-
-** Elements of the <apps> (Applications) object
-
- When you make a request for the list of applications, the information will be returned as a collection of app objects.
- See also {{Application API}} for syntax of the app object.
-
-*---------------+--------------+-------------------------------+
-|| Item || Data Type || Description |
-*---------------+--------------+-------------------------------+
-| app | array of app objects (JSON) / zero or more app objects (XML) | A collection of application objects |
-*---------------+--------------+--------------------------------+
-
-** Response Examples
-
- <<JSON response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/apps
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-{
- "apps" : {
- "app" : [
- {
- "containerids" : [
- "container_1326121700862_0003_01_000001",
- "container_1326121700862_0003_01_000002"
- ],
- "user" : "user1",
- "id" : "application_1326121700862_0003",
- "state" : "RUNNING"
- },
- {
- "user" : "user1",
- "id" : "application_1326121700862_0002",
- "state" : "FINISHED"
- }
- ]
- }
-}
-+---+
-
- <<XML response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/apps
- Accept: application/xml
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/xml
- Content-Length: 400
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<apps>
- <app>
- <id>application_1326121700862_0002</id>
- <state>FINISHED</state>
- <user>user1</user>
- </app>
- <app>
- <id>application_1326121700862_0003</id>
- <state>RUNNING</state>
- <user>user1</user>
- <containerids>container_1326121700862_0003_01_000002</containerids>
- <containerids>container_1326121700862_0003_01_000001</containerids>
- </app>
-</apps>
-
-+---+
-
-* {Application API}
-
- An application resource contains information about a particular application that was run or is running on this NodeManager.
-
-** URI
-
- Use the following URI to obtain an app object for an application identified by the {appid} value.
-
-------
- * http://<nm http address:port>/ws/v1/node/apps/{appid}
-------
-
-** HTTP Operations Supported
-
-------
- * GET
-------
-
-** Query Parameters Supported
-
-------
- None
-------
-
-** Elements of the <app> (Application) object
-
-*---------------+--------------+-------------------------------+
-|| Item || Data Type || Description |
-*---------------+--------------+-------------------------------+
-| id | string | The application id |
-*---------------+--------------+--------------------------------+
-| user | string | The user who started the application |
-*---------------+--------------+--------------------------------+
-| state | string | The state of the application - valid states are: NEW, INITING, RUNNING, FINISHING_CONTAINERS_WAIT, APPLICATION_RESOURCES_CLEANINGUP, FINISHED |
-*---------------+--------------+--------------------------------+
-| containerids | array of containerids (JSON) / zero or more containerids (XML) | The list of container ids currently being used by the application on this node. If not present, no containers are currently running for this application. |
-*---------------+--------------+--------------------------------+
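Because containerids is optional, clients should treat a missing field as "no containers running". A minimal Python sketch, using entries modeled on the /ws/v1/node/apps example:

```python
import json

# Two app entries modeled on the /ws/v1/node/apps example; the
# FINISHED app has no running containers, so "containerids" is absent.
apps = json.loads("""
[{"id": "application_1326121700862_0003", "state": "RUNNING",
  "user": "user1",
  "containerids": ["container_1326121700862_0003_01_000001",
                   "container_1326121700862_0003_01_000002"]},
 {"id": "application_1326121700862_0002", "state": "FINISHED",
  "user": "user1"}]
""")

def container_count(app):
    """containerids is optional: a missing key means no containers."""
    return len(app.get("containerids", []))

counts = {a["id"]: container_count(a) for a in apps}
print(counts)
```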
-
-** Response Examples
-
- <<JSON response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/apps/application_1326121700862_0005
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-{
- "app" : {
- "containerids" : [
- "container_1326121700862_0005_01_000003",
- "container_1326121700862_0005_01_000001"
- ],
- "user" : "user1",
- "id" : "application_1326121700862_0005",
- "state" : "RUNNING"
- }
-}
-+---+
-
- <<XML response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/apps/application_1326121700862_0005
- Accept: application/xml
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/xml
- Content-Length: 281
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<app>
- <id>application_1326121700862_0005</id>
- <state>RUNNING</state>
- <user>user1</user>
- <containerids>container_1326121700862_0005_01_000003</containerids>
- <containerids>container_1326121700862_0005_01_000001</containerids>
-</app>
-+---+
-
-
-* Containers API
-
- With the containers API, you can obtain a collection of resources, each of which represents a container. When you run a GET operation on this resource, you obtain a collection of Container Objects. See also {{Container API}} for syntax of the container object.
-
-** URI
-
-------
- * http://<nm http address:port>/ws/v1/node/containers
-------
-
-** HTTP Operations Supported
-
-------
- * GET
-------
-
-** Query Parameters Supported
-
-------
- None
-------
-
-** Elements of the <containers> object
-
- When you make a request for the list of containers, the information will be returned as a collection of container objects.
- See also {{Container API}} for syntax of the container object.
-
-*---------------+--------------+-------------------------------+
-|| Item || Data Type || Description |
-*---------------+--------------+-------------------------------+
-| containers | array of container objects (JSON) / zero or more container objects (XML) | A collection of container objects |
-*---------------+--------------+-------------------------------+
-
-** Response Examples
-
- <<JSON response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/containers
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-{
- "containers" : {
- "container" : [
- {
- "nodeId" : "host.domain.com:8041",
- "totalMemoryNeededMB" : 2048,
- "totalVCoresNeeded" : 1,
- "state" : "RUNNING",
- "diagnostics" : "",
- "containerLogsLink" : "http://host.domain.com:8042/node/containerlogs/container_1326121700862_0006_01_000001/user1",
- "user" : "user1",
- "id" : "container_1326121700862_0006_01_000001",
- "exitCode" : -1000
- },
- {
- "nodeId" : "host.domain.com:8041",
- "totalMemoryNeededMB" : 2048,
- "totalVCoresNeeded" : 2,
- "state" : "RUNNING",
- "diagnostics" : "",
- "containerLogsLink" : "http://host.domain.com:8042/node/containerlogs/container_1326121700862_0006_01_000003/user1",
- "user" : "user1",
- "id" : "container_1326121700862_0006_01_000003",
- "exitCode" : -1000
- }
- ]
- }
-}
-+---+
-
- <<XML response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/containers
- Accept: application/xml
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/xml
- Content-Length: 988
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<containers>
- <container>
- <id>container_1326121700862_0006_01_000001</id>
- <state>RUNNING</state>
- <exitCode>-1000</exitCode>
- <diagnostics/>
- <user>user1</user>
- <totalMemoryNeededMB>2048</totalMemoryNeededMB>
- <totalVCoresNeeded>1</totalVCoresNeeded>
- <containerLogsLink>http://host.domain.com:8042/node/containerlogs/container_1326121700862_0006_01_000001/user1</containerLogsLink>
- <nodeId>host.domain.com:8041</nodeId>
- </container>
- <container>
- <id>container_1326121700862_0006_01_000003</id>
- <state>DONE</state>
- <exitCode>0</exitCode>
- <diagnostics>Container killed by the ApplicationMaster.</diagnostics>
- <user>user1</user>
- <totalMemoryNeededMB>2048</totalMemoryNeededMB>
- <totalVCoresNeeded>2</totalVCoresNeeded>
- <containerLogsLink>http://host.domain.com:8042/node/containerlogs/container_1326121700862_0006_01_000003/user1</containerLogsLink>
- <nodeId>host.domain.com:8041</nodeId>
- </container>
-</containers>
-+---+
-
-
-* {Container API}
-
- A container resource contains information about a particular container that is running on this NodeManager.
-
-** URI
-
- Use the following URI to obtain a container object for a container identified by the {containerid} value.
-
-------
- * http://<nm http address:port>/ws/v1/node/containers/{containerid}
-------
-
-** HTTP Operations Supported
-
-------
- * GET
-------
-
-** Query Parameters Supported
-
-------
- None
-------
-
-** Elements of the <container> object
-
-*---------------+--------------+-------------------------------+
-|| Item || Data Type || Description |
-*---------------+--------------+-------------------------------+
-| id | string | The container id |
-*---------------+--------------+-------------------------------+
-| state | string | State of the container - valid states are: NEW, LOCALIZING, LOCALIZATION_FAILED, LOCALIZED, RUNNING, EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE, KILLING, CONTAINER_CLEANEDUP_AFTER_KILL, CONTAINER_RESOURCES_CLEANINGUP, DONE|
-*---------------+--------------+-------------------------------+
-| nodeId | string | The id of the node the container is on|
-*---------------+--------------+-------------------------------+
-| containerLogsLink | string | The http link to the container logs |
-*---------------+--------------+-------------------------------+
-| user | string | The user name of the user who started the container |
-*---------------+--------------+-------------------------------+
-| exitCode | int | Exit code of the container |
-*---------------+--------------+-------------------------------+
-| diagnostics | string | A diagnostic message for failed containers |
-*---------------+--------------+-------------------------------+
-| totalMemoryNeededMB | long | Total amount of memory needed by the container (in MB) |
-*---------------+--------------+-------------------------------+
-| totalVCoresNeeded | long | Total number of virtual cores needed by the container |
-*---------------+--------------+-------------------------------+
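For illustration, the fields in the table above can be rendered into a one-line summary; the entry below is modeled on the /ws/v1/node/containers example and the helper name is ours:

```python
import json

# A container entry modeled on the /ws/v1/node/containers example.
container = json.loads("""
{"id": "container_1326121700862_0006_01_000001",
 "state": "RUNNING", "exitCode": -1000, "diagnostics": "",
 "user": "user1", "totalMemoryNeededMB": 2048, "totalVCoresNeeded": 1,
 "nodeId": "host.domain.com:8041"}
""")

def describe(c):
    """Render a one-line summary of a container object."""
    return (f'{c["id"]}: {c["state"]} '
            f'({c["totalMemoryNeededMB"]} MB, {c["totalVCoresNeeded"]} vcores)')

print(describe(container))
```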
-
-** Response Examples
-
- <<JSON response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/containers/container_1326121700862_0007_01_000001
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-{
- "container" : {
- "nodeId" : "host.domain.com:8041",
- "totalMemoryNeededMB" : 2048,
- "totalVCoresNeeded" : 1,
- "state" : "RUNNING",
- "diagnostics" : "",
- "containerLogsLink" : "http://host.domain.com:8042/node/containerlogs/container_1326121700862_0007_01_000001/user1",
- "user" : "user1",
- "id" : "container_1326121700862_0007_01_000001",
- "exitCode" : -1000
- }
-}
-+---+
-
- <<XML response>>
-
- HTTP Request:
-
-------
- GET http://<nm http address:port>/ws/v1/node/containers/container_1326121700862_0007_01_000001
- Accept: application/xml
-------
-
- Response Header:
-
-+---+
- HTTP/1.1 200 OK
- Content-Type: application/xml
- Content-Length: 491
- Server: Jetty(6.1.26)
-+---+
-
- Response Body:
-
-+---+
-<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<container>
- <id>container_1326121700862_0007_01_000001</id>
- <state>RUNNING</state>
- <exitCode>-1000</exitCode>
- <diagnostics/>
- <user>user1</user>
- <totalMemoryNeededMB>2048</totalMemoryNeededMB>
- <totalVCoresNeeded>1</totalVCoresNeeded>
- <containerLogsLink>http://host.domain.com:8042/node/containerlogs/container_1326121700862_0007_01_000001/user1</containerLogsLink>
- <nodeId>host.domain.com:8041</nodeId>
-</container>
-+---+
-
http://git-wip-us.apache.org/repos/asf/hadoop/blob/06aca7c6/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
deleted file mode 100644
index ba03f4e..0000000
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
+++ /dev/null
@@ -1,86 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
- ---
- NodeManager Restart
- ---
- ---
- ${maven.build.timestamp}
-
-NodeManager Restart
-
-* Introduction
-
- This document gives an overview of NodeManager (NM) restart, a feature that
- enables the NodeManager to be restarted without losing
- the active containers running on the node. At a high level, the NM stores any
- necessary state to a local state-store as it processes container-management
- requests. When the NM restarts, it recovers by first loading state for
- various subsystems and then letting those subsystems perform recovery using
- the loaded state.
-
-* Enabling NM Restart
-
- [[1]] To enable NM Restart functionality, set the following property in <<conf/yarn-site.xml>> to true:
-
-*--------------------------------------+--------------------------------------+
-|| Property || Value |
-*--------------------------------------+--------------------------------------+
-| <<<yarn.nodemanager.recovery.enabled>>> | |
-| | <<<true>>> (the default is <<<false>>>) |
-*--------------------------------------+--------------------------------------+
-
- [[2]] Configure a path to the local file-system directory where the
- NodeManager can save its run state.
-
-*--------------------------------------+--------------------------------------+
-|| Property || Description |
-*--------------------------------------+--------------------------------------+
-| <<<yarn.nodemanager.recovery.dir>>> | |
-| | The local filesystem directory in which the node manager will store state |
-| | when recovery is enabled. |
-| | The default value is set to |
-| | <<<${hadoop.tmp.dir}/yarn-nm-recovery>>>. |
-*--------------------------------------+--------------------------------------+
-
- [[3]] Configure a valid RPC address for the NodeManager
-
-*--------------------------------------+--------------------------------------+
-|| Property || Description |
-*--------------------------------------+--------------------------------------+
-| <<<yarn.nodemanager.address>>> | |
-| | Ephemeral ports (port 0, which is default) cannot be used for the |
-| | NodeManager's RPC server specified via yarn.nodemanager.address as it can |
-| | cause the NM to use different ports before and after a restart. This will break any |
-| | previously running clients that were communicating with the NM before |
-| | restart. Explicitly setting yarn.nodemanager.address to an address with a |
-| | specific port number (e.g., 0.0.0.0:45454) is a precondition for enabling |
-| | NM restart. |
-*--------------------------------------+--------------------------------------+
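Taken together, steps 1 through 3 amount to a <<conf/yarn-site.xml>> fragment along these lines (the directory and port values are illustrative, not mandated):

```xml
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>${hadoop.tmp.dir}/yarn-nm-recovery</value>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```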
-
- [[4]] Auxiliary services
-
- NodeManagers in a YARN cluster can be configured to run auxiliary services.
- For a completely functional NM restart, YARN relies on any auxiliary service
- configured to also support recovery. This usually includes (1) avoiding usage
- of ephemeral ports so that previously running clients (in this case, usually
- containers) are not disrupted after restart and (2) having the auxiliary
- service itself support recoverability by reloading any previous state when
- NodeManager restarts and reinitializes the auxiliary service.
-
- A simple example for the above is the auxiliary service 'ShuffleHandler' for
- MapReduce (MR). ShuffleHandler respects the above two requirements already,
- so users/admins don't have to do anything for it to support NM restart: (1) The
- configuration property <<mapreduce.shuffle.port>> controls which port the
- ShuffleHandler on a NodeManager host binds to, and it defaults to a
- non-ephemeral port. (2) The ShuffleHandler service also already supports
- recovery of previous state after NM restarts.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/hadoop/blob/06aca7c6/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
deleted file mode 100644
index 0346cda..0000000
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
+++ /dev/null
@@ -1,233 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
- ---
- ResourceManager High Availability
- ---
- ---
- ${maven.build.timestamp}
-
-ResourceManager High Availability
-
-%{toc|section=1|fromDepth=0}
-
-* Introduction
-
- This guide provides an overview of High Availability of YARN's ResourceManager,
- and details how to configure and use this feature. The ResourceManager (RM)
- is responsible for tracking the resources in a cluster, and scheduling
- applications (e.g., MapReduce jobs). Prior to Hadoop 2.4, the ResourceManager
- was the single point of failure in a YARN cluster. The High Availability
- feature adds redundancy in the form of an Active/Standby ResourceManager pair
- to remove this otherwise single point of failure.
-
-* Architecture
-
-[images/rm-ha-overview.png] Overview of ResourceManager High Availability
-
-** RM Failover
-
- ResourceManager HA is realized through an Active/Standby architecture - at
- any point of time, one of the RMs is Active, and one or more RMs are in
- Standby mode waiting to take over should anything happen to the Active.
- The trigger to transition-to-active comes from either the admin (through CLI)
- or through the integrated failover-controller when automatic-failover is
- enabled.
-
-*** Manual transitions and failover
-
- When automatic failover is not enabled, admins have to manually transition
- one of the RMs to Active. To failover from one RM to the other, they are
- expected to first transition the Active-RM to Standby and transition a
- Standby-RM to Active. All this can be done using the "<<<yarn rmadmin>>>"
- CLI.
-
-*** Automatic failover
-
- The RMs have an option to embed the Zookeeper-based ActiveStandbyElector to
- decide which RM should be the Active. When the Active goes down or becomes
- unresponsive, another RM is automatically elected to be the Active which
- then takes over. Note that, unlike HDFS, there is no need to run a separate
- ZKFC daemon, because the ActiveStandbyElector embedded in the RMs acts as
- both a failure detector and a leader elector.
-
-*** Client, ApplicationMaster and NodeManager on RM failover
-
- When there are multiple RMs, the configuration (yarn-site.xml) used by
- clients and nodes is expected to list all the RMs. Clients,
- ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in
- a round-robin fashion until they hit the Active RM. If the Active goes down,
- they resume the round-robin polling until they hit the "new" Active.
- This default retry logic is implemented as
- <<<org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider>>>.
- You can override the logic by
- implementing <<<org.apache.hadoop.yarn.client.RMFailoverProxyProvider>>> and
- setting the value of <<<yarn.client.failover-proxy-provider>>> to
- the class name.
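The round-robin behavior described above can be sketched as follows. This is a toy stand-in: is_active would really be an RPC/REST probe against each RM, and ConfiguredRMFailoverProxyProvider adds retry and backoff behavior that this omits:

```python
from itertools import cycle

def find_active_rm(rm_ids, is_active, max_polls=100):
    """Poll RMs in round-robin order until one reports itself Active.

    is_active is a probe callable; bounded by max_polls so this sketch
    cannot spin forever when no RM is Active.
    """
    for polls, rm in enumerate(cycle(rm_ids)):
        if polls >= max_polls:
            raise RuntimeError("no Active RM found")
        if is_active(rm):
            return rm

print(find_active_rm(["rm1", "rm2"], lambda rm: rm == "rm2"))  # -> rm2
```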
-
-** Recovering previous active-RM's state
-
- With the {{{./ResourceManagerRestart.html}ResourceManager Restart}} feature
- enabled, the RM being promoted to Active loads the RM internal state and
- continues to operate from where the previous Active left off, as far as the
- RM restart feature allows. A new attempt is spawned for
- each managed application previously submitted to the RM. Applications can
- checkpoint periodically to avoid losing any work. The state-store must be
- visible to both the Active and Standby RMs. Currently, there are two
- RMStateStore implementations for persistence - FileSystemRMStateStore
- and ZKRMStateStore. The <<<ZKRMStateStore>>> implicitly allows write access
- to a single RM at any point in time, and hence is the recommended store to
- use in an HA cluster. When using the ZKRMStateStore, there is no need for a
- separate fencing mechanism to address a potential split-brain situation
- where multiple RMs can potentially assume the Active role.
-
-
-* Deployment
-
-** Configurations
-
- Most of the failover functionality is tunable using various configuration
- properties. Following is a list of required/important ones. yarn-default.xml
- carries a full list of knobs. See
- {{{../hadoop-yarn-common/yarn-default.xml}yarn-default.xml}}
- for more information including default values.
- Also see {{{./ResourceManagerRestart.html}the document for ResourceManager
- Restart}} for instructions on setting up the state-store.
-
-*-------------------------+----------------------------------------------+
-|| Configuration Property || Description |
-*-------------------------+----------------------------------------------+
-| yarn.resourcemanager.zk-address | |
-| | Address of the ZK-quorum.
-| | Used both for the state-store and embedded leader-election.
-*-------------------------+----------------------------------------------+
-| yarn.resourcemanager.ha.enabled | |
-| | Enable RM HA
-*-------------------------+----------------------------------------------+
-| yarn.resourcemanager.ha.rm-ids | |
-| | List of logical IDs for the RMs. |
-| | e.g., "rm1,rm2" |
-*-------------------------+----------------------------------------------+
-| yarn.resourcemanager.hostname.<rm-id> | |
-| | For each <rm-id>, specify the hostname the |
-| | RM corresponds to. Alternately, one could set each of the RM's service |
-| | addresses. |
-*-------------------------+----------------------------------------------+
-| yarn.resourcemanager.ha.id | |
-| | Identifies the RM in the ensemble. This is optional; |
-| | however, if set, admins have to ensure that all the RMs have their own |
-| | IDs in the config |
-*-------------------------+----------------------------------------------+
-| yarn.resourcemanager.ha.automatic-failover.enabled | |
-| | Enable automatic failover; |
-| | By default, it is enabled only when HA is enabled. |
-*-------------------------+----------------------------------------------+
-| yarn.resourcemanager.ha.automatic-failover.embedded | |
-| | Use embedded leader-elector |
-| | to pick the Active RM, when automatic failover is enabled. By default, |
-| | it is enabled only when HA is enabled. |
-*-------------------------+----------------------------------------------+
-| yarn.resourcemanager.cluster-id | |
-| | Identifies the cluster. Used by the elector to |
-| | ensure an RM doesn't take over as Active for another cluster. |
-*-------------------------+----------------------------------------------+
-| yarn.client.failover-proxy-provider | |
-| | The class to be used by Clients, AMs and NMs to failover to the Active RM. |
-*-------------------------+----------------------------------------------+
-| yarn.client.failover-max-attempts | |
-| | The max number of times FailoverProxyProvider should attempt failover. |
-*-------------------------+----------------------------------------------+
-| yarn.client.failover-sleep-base-ms | |
-| | The sleep base (in milliseconds) to be used for calculating |
-| | the exponential delay between failovers. |
-*-------------------------+----------------------------------------------+
-| yarn.client.failover-sleep-max-ms | |
-| | The maximum sleep time (in milliseconds) between failovers |
-*-------------------------+----------------------------------------------+
-| yarn.client.failover-retries | |
-| | The number of retries per attempt to connect to a ResourceManager. |
-*-------------------------+----------------------------------------------+
-| yarn.client.failover-retries-on-socket-timeouts | |
-| | The number of retries per attempt to connect to a ResourceManager on socket timeouts. |
-*-------------------------+----------------------------------------------+
-
-*** Sample configurations
-
- Here is a sample of a minimal setup for RM failover.
-
-+---+
- <property>
- <name>yarn.resourcemanager.ha.enabled</name>
- <value>true</value>
- </property>
- <property>
- <name>yarn.resourcemanager.cluster-id</name>
- <value>cluster1</value>
- </property>
- <property>
- <name>yarn.resourcemanager.ha.rm-ids</name>
- <value>rm1,rm2</value>
- </property>
- <property>
- <name>yarn.resourcemanager.hostname.rm1</name>
- <value>master1</value>
- </property>
- <property>
- <name>yarn.resourcemanager.hostname.rm2</name>
- <value>master2</value>
- </property>
- <property>
- <name>yarn.resourcemanager.zk-address</name>
- <value>zk1:2181,zk2:2181,zk3:2181</value>
- </property>
-+---+
-
-** Admin commands
-
- <<<yarn rmadmin>>> has a few HA-specific command options to check the health/state of an
- RM, and transition to Active/Standby.
- HA commands take the service ID of an RM, as set by
- <<<yarn.resourcemanager.ha.rm-ids>>>, as an argument.
-
-+---+
- $ yarn rmadmin -getServiceState rm1
- active
-
- $ yarn rmadmin -getServiceState rm2
- standby
-+---+
-
- If automatic failover is enabled, you cannot use the manual transition
- commands. You can override this with the --forcemanual flag, but use it
- with caution.
-
-+---+
- $ yarn rmadmin -transitionToStandby rm1
- Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@1d8299fd
- Refusing to manually manage HA state, since it may cause
- a split-brain scenario or other incorrect state.
- If you are very sure you know what you are doing, please
- specify the forcemanual flag.
-+---+
-
- See {{{./YarnCommands.html}YarnCommands}} for more details.
-
-** ResourceManager Web UI services
-
- Assuming a standby RM is up and running, the Standby automatically redirects
- all web requests to the Active, except for the "About" page.
-
-** Web Services
-
- Assuming a standby RM is up and running, the RM web services described at
- {{{./ResourceManagerRest.html}ResourceManager REST APIs}}, when invoked on
- a standby RM, are automatically redirected to the Active RM.