Posted to common-commits@hadoop.apache.org by jl...@apache.org on 2014/11/08 00:41:17 UTC
hadoop git commit: YARN-2632. Document NM Restart feature.
Contributed by Junping Du and Vinod Kumar Vavilapalli
Repository: hadoop
Updated Branches:
refs/heads/trunk c3d475070 -> 1e215e8ba
YARN-2632. Document NM Restart feature. Contributed by Junping Du and Vinod Kumar Vavilapalli
Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/1e215e8b
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/1e215e8b
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/1e215e8b
Branch: refs/heads/trunk
Commit: 1e215e8ba2e801eb26f16c307daee756d6b2ca66
Parents: c3d4750
Author: Jason Lowe <jl...@apache.org>
Authored: Fri Nov 7 23:40:22 2014 +0000
Committer: Jason Lowe <jl...@apache.org>
Committed: Fri Nov 7 23:40:22 2014 +0000
----------------------------------------------------------------------
hadoop-project/src/site/site.xml | 1 +
hadoop-yarn-project/CHANGES.txt | 3 +
.../src/site/apt/NodeManagerRestart.apt.vm | 86 ++++++++++++++++++++
3 files changed, 90 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-project/src/site/site.xml
----------------------------------------------------------------------
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index e1d4c92..6a61a83 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -123,6 +123,7 @@
<item name="Writing YARN Applications" href="hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html"/>
<item name="YARN Commands" href="hadoop-yarn/hadoop-yarn-site/YarnCommands.html"/>
<item name="Scheduler Load Simulator" href="hadoop-sls/SchedulerLoadSimulator.html"/>
+ <item name="NodeManager Restart" href="hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html"/>
</menu>
<menu name="YARN REST APIs" inherit="top">
http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-yarn-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt
index e4b116d..df7e3ea 100644
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@@ -195,6 +195,9 @@ Release 2.6.0 - UNRELEASED
YARN-2647. Added a queue CLI for getting queue information. (Sunil Govind via
vinodkv)
+ YARN-2632. Document NM Restart feature. (Junping Du and Vinod Kumar
+ Vavilapalli via jlowe)
+
IMPROVEMENTS
YARN-2197. Add a link to YARN CHANGES.txt in the left side of doc
http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
new file mode 100644
index 0000000..ba03f4e
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
@@ -0,0 +1,86 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~ http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+ ---
+ NodeManager Restart
+ ---
+ ---
+ ${maven.build.timestamp}
+
+NodeManager Restart
+
+* Introduction
+
+ This document gives an overview of NodeManager (NM) restart, a feature that
+ enables the NodeManager to be restarted without losing
+ the active containers running on the node. At a high level, the NM stores any
+ necessary state to a local state-store as it processes container-management
+ requests. When the NM restarts, it recovers by first loading state for
+ various subsystems and then letting those subsystems perform recovery using
+ the loaded state.
+
+* Enabling NM Restart
+
+ [[1]] To enable NM Restart functionality, set the following property in <<conf/yarn-site.xml>> to true:
+
+*--------------------------------------+--------------------------------------+
+|| Property || Value |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.nodemanager.recovery.enabled>>> | |
+| | <<<true>>> (the default value is <<<false>>>) |
+*--------------------------------------+--------------------------------------+
+
+ [[2]] Configure a path to the local file-system directory where the
+ NodeManager can save its run state:
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.nodemanager.recovery.dir>>> | |
+| | The local filesystem directory in which the node manager will store state |
+| | when recovery is enabled. |
+| | The default value is set to |
+| | <<<${hadoop.tmp.dir}/yarn-nm-recovery>>>. |
+*--------------------------------------+--------------------------------------+
+
+ [[3]] Configure a valid RPC address for the NodeManager:
+
+*--------------------------------------+--------------------------------------+
+|| Property || Description |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.nodemanager.address>>> | |
+| | Ephemeral ports (port 0, which is the default) cannot be used for the |
+| | NodeManager's RPC server specified via yarn.nodemanager.address, because |
+| | the NM could then bind to a different port after a restart. That would |
+| | break any previously running clients that were communicating with the NM |
+| | before the restart. Explicitly setting yarn.nodemanager.address to an |
+| | address with a specific port number (e.g. 0.0.0.0:45454) is a precondition |
+| | for enabling NM restart. |
+*--------------------------------------+--------------------------------------+
+
+ [[4]] Auxiliary services
+
+ NodeManagers in a YARN cluster can be configured to run auxiliary services.
+ For a completely functional NM restart, YARN relies on any auxiliary service
+ configured to also support recovery. This usually includes (1) avoiding usage
+ of ephemeral ports so that previously running clients (in this case, usually
+ containers) are not disrupted after restart and (2) having the auxiliary
+ service itself support recoverability by reloading any previous state when
+ NodeManager restarts and reinitializes the auxiliary service.
+
+ A simple example for the above is the auxiliary service 'ShuffleHandler' for
+ MapReduce (MR). ShuffleHandler respects the above two requirements already,
+ so users/admins don't have to do anything for it to support NM restart: (1) The
+ configuration property <<mapreduce.shuffle.port>> controls which port the
+ ShuffleHandler on a NodeManager host binds to, and it defaults to a
+ non-ephemeral port. (2) The ShuffleHandler service also already supports
+ recovery of previous state after NM restarts.
\ No newline at end of file
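
For reference, the three settings documented in the new page could be combined in conf/yarn-site.xml roughly as follows. This is only a sketch under stated assumptions: the recovery directory path is a hypothetical example that does not appear in the patch, and the port is the illustrative one used in the documentation itself. Property names and defaults are taken from the text above.

```xml
<!-- Sketch of yarn-site.xml entries for NM restart; values are illustrative. -->
<configuration>
  <!-- Step 1: enable recovery (the default is false). -->
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Step 2: hypothetical local directory for NM state.
       If unset, the default is ${hadoop.tmp.dir}/yarn-nm-recovery. -->
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/lib/hadoop-yarn/nm-recovery</value>
  </property>
  <!-- Step 3: fixed, non-ephemeral RPC port so clients can reconnect
       to the same address after the NM restarts. -->
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
</configuration>
```

Any auxiliary services configured on the node would still need to meet the two requirements described in step 4; this fragment covers only the NodeManager's own settings.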