Posted to common-commits@hadoop.apache.org by jl...@apache.org on 2014/11/08 00:41:17 UTC

hadoop git commit: YARN-2632. Document NM Restart feature. Contributed by Junping Du and Vinod Kumar Vavilapalli

Repository: hadoop
Updated Branches:
  refs/heads/trunk c3d475070 -> 1e215e8ba


YARN-2632. Document NM Restart feature. Contributed by Junping Du and Vinod Kumar Vavilapalli


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/1e215e8b
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/1e215e8b
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/1e215e8b

Branch: refs/heads/trunk
Commit: 1e215e8ba2e801eb26f16c307daee756d6b2ca66
Parents: c3d4750
Author: Jason Lowe <jl...@apache.org>
Authored: Fri Nov 7 23:40:22 2014 +0000
Committer: Jason Lowe <jl...@apache.org>
Committed: Fri Nov 7 23:40:22 2014 +0000

----------------------------------------------------------------------
 hadoop-project/src/site/site.xml                |  1 +
 hadoop-yarn-project/CHANGES.txt                 |  3 +
 .../src/site/apt/NodeManagerRestart.apt.vm      | 86 ++++++++++++++++++++
 3 files changed, 90 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-project/src/site/site.xml
----------------------------------------------------------------------
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index e1d4c92..6a61a83 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -123,6 +123,7 @@
       <item name="Writing YARN Applications" href="hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html"/>
       <item name="YARN Commands" href="hadoop-yarn/hadoop-yarn-site/YarnCommands.html"/>
       <item name="Scheduler Load Simulator" href="hadoop-sls/SchedulerLoadSimulator.html"/>
+      <item name="NodeManager Restart" href="hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html"/>
     </menu>
 
     <menu name="YARN REST APIs" inherit="top">

http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-yarn-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt
index e4b116d..df7e3ea 100644
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@@ -195,6 +195,9 @@ Release 2.6.0 - UNRELEASED
     YARN-2647. Added a queue CLI for getting queue information. (Sunil Govind via
     vinodkv)
 
+    YARN-2632. Document NM Restart feature. (Junping Du and Vinod Kumar
+    Vavilapalli via jlowe)
+
   IMPROVEMENTS
 
     YARN-2197. Add a link to YARN CHANGES.txt in the left side of doc

http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
new file mode 100644
index 0000000..ba03f4e
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm
@@ -0,0 +1,86 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  NodeManager Restart
+  ---
+  ---
+  ${maven.build.timestamp}
+
+NodeManager Restart
+
+* Introduction
+
+  This document gives an overview of NodeManager (NM) restart, a feature that
+  enables the NodeManager to be restarted without losing 
+  the active containers running on the node. At a high level, the NM stores any 
+  necessary state to a local state-store as it processes container-management
+  requests. When the NM restarts, it recovers by first loading state for
+  various subsystems and then letting those subsystems perform recovery using
+  the loaded state.
+
+* Enabling NM Restart
+
+  [[1]] To enable NM Restart functionality, set the following property in <<conf/yarn-site.xml>> to true:
+
+*--------------------------------------+--------------------------------------+
+|| Property                            || Value                                |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.nodemanager.recovery.enabled>>> | |
+| | <<<true>>> (the default value is <<<false>>>) |
+*--------------------------------------+--------------------------------------+ 
+
+  [[2]] Configure a path to the local file-system directory where the
+  NodeManager can save its run state:
+
+*--------------------------------------+--------------------------------------+
+|| Property                            || Description                        |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.nodemanager.recovery.dir>>> | |
+| | The local filesystem directory in which the NodeManager will store state |
+| | when recovery is enabled.  |
+| | The default value is set to |
+| | <<<${hadoop.tmp.dir}/yarn-nm-recovery>>>. |
+*--------------------------------------+--------------------------------------+ 
+
+  [[3]] Configure a valid RPC address for the NodeManager:
+  
+*--------------------------------------+--------------------------------------+
+|| Property                            || Description                        |
+*--------------------------------------+--------------------------------------+
+| <<<yarn.nodemanager.address>>> | |
+| |   Ephemeral ports (port 0, which is the default) cannot be used for the |
+| | NodeManager's RPC server specified via yarn.nodemanager.address, because |
+| | the NM could then bind to different ports before and after a restart. That |
+| | would break any clients that were communicating with the NM before the |
+| | restart. Explicitly setting yarn.nodemanager.address to an address with a |
+| | specific port number (e.g. 0.0.0.0:45454) is a precondition for enabling |
+| | NM restart. |
+*--------------------------------------+--------------------------------------+
+
+  [[4]] Auxiliary services
+  
+  NodeManagers in a YARN cluster can be configured to run auxiliary services.
+  For a completely functional NM restart, YARN relies on any auxiliary service
+  configured to also support recovery. This usually includes (1) avoiding usage
+  of ephemeral ports so that previously running clients (in this case, usually
+  containers) are not disrupted after restart and (2) having the auxiliary
+  service itself support recoverability by reloading any previous state when
+  NodeManager restarts and reinitializes the auxiliary service.
+  
+  A simple example for the above is the auxiliary service 'ShuffleHandler' for
+  MapReduce (MR). ShuffleHandler respects the above two requirements already,
+  so users/admins don't have to do anything for it to support NM restart: (1) The
+  configuration property <<mapreduce.shuffle.port>> controls which port the
+  ShuffleHandler on a NodeManager host binds to, and it defaults to a
+  non-ephemeral port. (2) The ShuffleHandler service also already supports
+  recovery of previous state after NM restarts.
\ No newline at end of file
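
Steps [[1]] through [[3]] of the added document could be combined into a single yarn-site.xml fragment along the following lines. This is a minimal sketch and not part of the commit: the property names and the 0.0.0.0:45454 address are taken from the document above, while the recovery directory path is a hypothetical value chosen only for illustration.

```xml
<configuration>
  <!-- Step 1: turn on NM restart; the documented default is false. -->
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>

  <!-- Step 2: local directory for NM state. The path below is a
       hypothetical example; if the property is omitted, the documented
       default is ${hadoop.tmp.dir}/yarn-nm-recovery. -->
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/lib/hadoop-yarn/yarn-nm-recovery</value>
  </property>

  <!-- Step 3: a fixed, non-ephemeral RPC port so that the NM binds to
       the same port before and after a restart. -->
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
</configuration>
```

The sketch covers only the NodeManager's own settings. Any auxiliary services configured on the node would still need to meet the requirements described in step [[4]].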