Posted to common-commits@hadoop.apache.org by jh...@apache.org on 2019/03/26 18:27:22 UTC

[hadoop] branch YARN-8200 updated (dcb7d7a -> 9c6dbd8)

This is an automated email from the ASF dual-hosted git repository.

jhung pushed a change to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git.


 discard dcb7d7a  YARN-9271. Backport YARN-6927 for resource type support in MapReduce
 discard a06fc3f  YARN-8183. Fix ConcurrentModificationException inside RMAppAttemptMetrics#convertAtomicLongMaptoLongMap. (Suma Shivaprasad via wangda)
 discard 26e1c49  YARN-7383. Node resource is not parsed correctly for resource names containing dot. Contributed by Gergely Novák.
 discard 5ea878d  YARN-7143. FileNotFound handling in ResourceUtils is inconsistent
 discard a9c58d3  YARN-7345. GPU Isolation: Incorrect minor device numbers written to devices.deny file. (Jonathan Hung via wangda)
 discard ff91a67  YARN-9291. Backport YARN-7637 to branch-2
 discard 7d1832d  YARN-9397. Fix empty NMResourceInfo object test failures in branch-2
 discard e985a39  YARN-7223. Document GPU isolation feature. Contributed by Wangda Tan.
 discard 7cc7a63  YARN-7594. TestNMWebServices#testGetNMResourceInfo fails on trunk. Contributed by Gergely Novák.
 discard f493e0d  YARN-7573. Gpu Information page could be empty for nodes without GPU. (Sunil G via wangda)
 discard a8defcd  YARN-9289. Backport YARN-7330 for GPU in UI to branch-2
 discard 16127ea  YARN-7396. NPE when accessing container logs due to null dirsHandler. Contributed by Jonathan Hung
 discard 010845d  YARN-9174. Backport YARN-7224 for refactoring of GpuDevice class
 discard b4ae7ab  YARN-9280. Backport YARN-6620 to YARN-8200/branch-2 for NodeManager-side GPU isolation
 discard 83ab6b3  YARN-9180. Port YARN-7033 NM recovery of assigned resources to branch-2
 discard 574d7a2  YARN-9187. Backport YARN-6852 for GPU-specific native changes to branch-2
 discard 9d689f7  YARN-9175. Null resources check in ResourceInfo for branch-3.0
 discard 5a60e94  YARN-7137. [YARN-3926] Move newly added APIs to unstable in YARN-3926 branch. Contributed by Wangda Tan.
 discard a77aeb5  YARN-7270 addendum: Reapplied changes after YARN-3926 backports
 discard 1b09071  YARN-9188. Port YARN-7136 to branch-2
     new 0d54ad7  YARN-9188. Port YARN-7136 to branch-2
     new 93fe781  YARN-7270 addendum: Reapplied changes after YARN-3926 backports
     new cbcbfc2  YARN-7137. [YARN-3926] Move newly added APIs to unstable in YARN-3926 branch. Contributed by Wangda Tan.
     new b32e2a7  YARN-9175. Null resources check in ResourceInfo for branch-3.0
     new f0dcb31  YARN-9187. Backport YARN-6852 for GPU-specific native changes to branch-2
     new 4a1c7e6  YARN-9180. Port YARN-7033 NM recovery of assigned resources to branch-2
     new 25167b5  YARN-9280. Backport YARN-6620 to YARN-8200/branch-2 for NodeManager-side GPU isolation
     new faf0b36  YARN-9174. Backport YARN-7224 for refactoring of GpuDevice class
     new 3d5a652  YARN-7396. NPE when accessing container logs due to null dirsHandler. Contributed by Jonathan Hung
     new 9a61778  YARN-9289. Backport YARN-7330 for GPU in UI to branch-2
     new 2116edd  YARN-7573. Gpu Information page could be empty for nodes without GPU. (Sunil G via wangda)
     new 618d015  YARN-7594. TestNMWebServices#testGetNMResourceInfo fails on trunk. Contributed by Gergely Novák.
     new df6a7b0  YARN-7223. Document GPU isolation feature. Contributed by Wangda Tan.
     new f279f92  YARN-9397. Fix empty NMResourceInfo object test failures in branch-2
     new ea259c4  YARN-9291. Backport YARN-7637 to branch-2
     new 05292fe  YARN-7345. GPU Isolation: Incorrect minor device numbers written to devices.deny file. (Jonathan Hung via wangda)
     new f350c42  YARN-7143. FileNotFound handling in ResourceUtils is inconsistent
     new 6239faf  YARN-7383. Node resource is not parsed correctly for resource names containing dot. Contributed by Gergely Novák.
     new 4d9f4e7  YARN-8183. Fix ConcurrentModificationException inside RMAppAttemptMetrics#convertAtomicLongMaptoLongMap. (Suma Shivaprasad via wangda)
     new 9c6dbd8  YARN-9271. Backport YARN-6927 for resource type support in MapReduce

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (dcb7d7a)
            \
             N -- N -- N   refs/heads/YARN-8200 (9c6dbd8)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 20 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java     | 2 --
 1 file changed, 2 deletions(-)


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org


[hadoop] 13/20: YARN-7223. Document GPU isolation feature. Contributed by Wangda Tan.

Posted by jh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit df6a7b0c1052b8e236d9c462e5cfe02099b56cbf
Author: Sunil G <su...@apache.org>
AuthorDate: Wed Feb 21 14:16:45 2018 +0530

    YARN-7223. Document GPU isolation feature. Contributed by Wangda Tan.
---
 .../src/site/markdown/UsingGpus.md                 | 230 +++++++++++++++++++++
 1 file changed, 230 insertions(+)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/UsingGpus.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/UsingGpus.md
new file mode 100644
index 0000000..f6000e7
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/UsingGpus.md
@@ -0,0 +1,230 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+# Using GPU On YARN
+# Prerequisites
+
+- As of now, only Nvidia GPUs are supported by YARN.
+- YARN NodeManagers must have Nvidia drivers pre-installed.
+- When Docker is used as the container runtime, nvidia-docker 1.0 must be installed (the nvidia-docker version currently supported by YARN).
+
+# Configs
+
+## GPU scheduling
+
+In `resource-types.xml`
+
+Add the following properties:
+
+```
+<configuration>
+  <property>
+     <name>yarn.resource-types</name>
+     <value>yarn.io/gpu</value>
+  </property>
+</configuration>
+```
+
+In `yarn-site.xml`
+
+`DominantResourceCalculator` MUST be configured to enable GPU scheduling/isolation.
+
+For `Capacity Scheduler`, use the following property to configure `DominantResourceCalculator` (in `capacity-scheduler.xml`):
+
+| Property | Value |
+| --- | --- |
+| yarn.scheduler.capacity.resource-calculator | org.apache.hadoop.yarn.util.resource.DominantResourceCalculator |
+
+
+## GPU Isolation
+
+### In `yarn-site.xml`
+
+```
+  <property>
+    <name>yarn.nodemanager.resource-plugins</name>
+    <value>yarn.io/gpu</value>
+  </property>
+```
+
+This enables the GPU isolation module on the NodeManager side.
+
+By default, YARN automatically detects and configures GPUs when the above config is set. The following configs need to be set in `yarn-site.xml` only if the admin has specialized requirements.
+
+**1) Allowed GPU Devices**
+
+| Property | Default value |
+| --- | --- |
+| yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices | auto |
+
+  Specifies the GPU devices that the YARN NodeManager can manage, as a
+  comma-separated list. The number of GPU devices is reported to the RM to
+  make scheduling decisions. Set to `auto` (the default) to let YARN
+  discover GPU resources from the system automatically.
+
+  Specify GPU devices manually if automatic detection fails or if the admin
+  wants only a subset of the GPU devices managed by YARN. A GPU device is
+  identified by its minor device number and index. A common way to get the
+  minor device numbers of GPUs is to run `nvidia-smi -q` and search the
+  output for `Minor Number`.
+
+  When minor numbers are specified manually, the admin must include the
+  indices of the GPUs as well. The format is
+  `index:minor_number[,index:minor_number...]`. For example,
+  `0:0,1:1,2:2,3:4` allows the YARN NodeManager to manage the GPU devices
+  with indices `0/1/2/3` and minor numbers `0/1/2/4`.
+
+**2) Executable to discover GPUs**
+
+| Property | Value |
+| --- | --- |
+| yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables | /absolute/path/to/nvidia-smi |
+
+When `yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices=auto` is
+specified, the YARN NodeManager needs to run a GPU discovery binary (currently
+only `nvidia-smi` is supported) to get GPU-related information.
+When the value is empty (the default), the YARN NodeManager tries to locate
+the discovery executable itself.
+An example of the config value is: `/usr/local/bin/nvidia-smi`
+
+**3) Docker Plugin Related Configs**
+
+The following configs can be customized when the user needs to run GPU applications inside a Docker container. They are not required if the admin follows the default installation/configuration of `nvidia-docker`.
+
+| Property | Default value |
+| --- | --- |
+| yarn.nodemanager.resource-plugins.gpu.docker-plugin | nvidia-docker-v1 |
+
+Specifies the Docker command plugin for GPU. By default, Nvidia Docker V1.0 is used.
+
+| Property | Default value |
+| --- | --- |
+| yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidia-docker-v1.endpoint | http://localhost:3476/v1.0/docker/cli |
+
+Specifies the endpoint of `nvidia-docker-plugin`. See the documentation at https://github.com/NVIDIA/nvidia-docker/wiki for more details.
+
+**4) CGroups mount**
+
+GPU isolation uses the CGroup [devices controller](https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt) to do per-GPU device isolation. The following config should be added to `yarn-site.xml` to automatically mount the CGroup devices controller; otherwise the admin has to create the devices subfolder manually in order to use this feature.
+
+| Property | Value |
+| --- | --- |
+| yarn.nodemanager.linux-container-executor.cgroups.mount | true |
+
+
+### In `container-executor.cfg`
+
+In general, the following config needs to be added to `container-executor.cfg`:
+
+```
+[gpu]
+module.enabled=true
+```
+
+When the user needs to run GPU applications in a non-Docker environment:
+
+```
+[cgroups]
+# This should be same as yarn.nodemanager.linux-container-executor.cgroups.mount-path inside yarn-site.xml
+root=/sys/fs/cgroup
+# This should be same as yarn.nodemanager.linux-container-executor.cgroups.hierarchy inside yarn-site.xml
+yarn-hierarchy=yarn
+```
+
+When the user needs to run GPU applications in a Docker environment:
+
+**1) Add GPU related devices to docker section:**
+
+Values are separated by commas. You can get the device list by running `ls /dev/nvidia*`.
+
+```
+[docker]
+docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0
+```
+
+**2) Add `nvidia-docker` to volume-driver whitelist.**
+
+```
+[docker]
+...
+docker.allowed.volume-drivers=nvidia-docker
+```
+
+**3) Add `nvidia_driver_<version>` to readonly mounts whitelist.**
+
+```
+[docker]
+...
+docker.allowed.ro-mounts=nvidia_driver_375.66
+```
+
+# Use it
+
+## Distributed-shell + GPU
+
+Distributed shell currently supports specifying additional resource types other than memory and vcores.
+
+### Distributed-shell + GPU without Docker
+
+Run distributed shell without a Docker container (this requests 2 tasks, each with 3 GB of memory, 1 vcore, and 2 GPU devices):
+
+```
+yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
+  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
+  -shell_command /usr/local/nvidia/bin/nvidia-smi \
+  -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
+  -num_containers 2
+```
+
+You should see output like the following:
+
+```
+Tue Dec  5 22:21:47 2017
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|===============================+======================+======================|
+|   0  Tesla P100-PCIE...  Off  | 0000:04:00.0     Off |                    0 |
+| N/A   30C    P0    24W / 250W |      0MiB / 12193MiB |      0%      Default |
++-------------------------------+----------------------+----------------------+
+|   1  Tesla P100-PCIE...  Off  | 0000:82:00.0     Off |                    0 |
+| N/A   34C    P0    25W / 250W |      0MiB / 12193MiB |      0%      Default |
++-------------------------------+----------------------+----------------------+
+
++-----------------------------------------------------------------------------+
+| Processes:                                                       GPU Memory |
+|  GPU       PID  Type  Process name                               Usage      |
+|=============================================================================|
+|  No running processes found                                                 |
++-----------------------------------------------------------------------------+
+```
+
+This output is printed for each launched container task.
+
+### Distributed-shell + GPU with Docker
+
+You can also run distributed shell with a Docker container. Both `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` must be specified to use a Docker container.
+
+```
+yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
+       -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
+       -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
+       -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker-image-name> \
+       -shell_command nvidia-smi \
+       -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
+       -num_containers 2
+```
\ No newline at end of file
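
The `index:minor_number[,index:minor_number...]` format that UsingGpus.md describes for `yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices` can be illustrated with a small parser. This is a minimal sketch, not the NodeManager's actual parsing code: the class `AllowedGpuDevices`, its nested `Gpu` type, and the choice to return an empty list for `auto` are all hypothetical names and behavior chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class AllowedGpuDevices {

  /** One GPU entry: its index and its minor device number. */
  static final class Gpu {
    final int index;
    final int minorNumber;

    Gpu(int index, int minorNumber) {
      this.index = index;
      this.minorNumber = minorNumber;
    }
  }

  /**
   * Parses "index:minor_number[,index:minor_number...]".
   * Assumption: "auto" yields an empty list, meaning discovery is left to YARN.
   */
  static List<Gpu> parse(String value) {
    List<Gpu> gpus = new ArrayList<>();
    String trimmed = value.trim();
    if (trimmed.equals("auto")) {
      return gpus;
    }
    for (String entry : trimmed.split(",")) {
      String[] parts = entry.trim().split(":");
      if (parts.length != 2) {
        throw new IllegalArgumentException(
            "Expected index:minor_number but got '" + entry + "'");
      }
      // Integer.parseInt rejects non-numeric parts with NumberFormatException.
      gpus.add(new Gpu(Integer.parseInt(parts[0].trim()),
          Integer.parseInt(parts[1].trim())));
    }
    return gpus;
  }

  public static void main(String[] args) {
    // The example value from the documentation: indices 0-3, minors 0, 1, 2, 4.
    for (Gpu gpu : parse("0:0,1:1,2:2,3:4")) {
      System.out.println("index=" + gpu.index + " minor=" + gpu.minorNumber);
    }
  }
}
```

Keeping the index and the minor number as separate fields mirrors the documentation's point that the two can differ, as in the `3:4` entry.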




[hadoop] 03/20: YARN-7137. [YARN-3926] Move newly added APIs to unstable in YARN-3926 branch. Contributed by Wangda Tan.

Posted by jh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit cbcbfc247913987cd02c80f857ef3d0944d0f187
Author: Sunil G <su...@apache.org>
AuthorDate: Tue Sep 12 20:31:47 2017 +0530

    YARN-7137. [YARN-3926] Move newly added APIs to unstable in YARN-3926 branch. Contributed by Wangda Tan.
    
    (cherry picked from commit da0b6a354bf6f6bf37ca5a05a4a8eece09aa4893)
    (cherry picked from commit 74030d808cd95e26a0c48500c08d269fcb4150ee)
---
 .../apache/hadoop/yarn/api/records/Resource.java   | 24 +++++++++++-----------
 .../hadoop/yarn/api/records/ResourceRequest.java   |  1 +
 .../hadoop/yarn/util/resource/ResourceUtils.java   | 19 -----------------
 .../hadoop/yarn/util/resource/package-info.java    |  6 +-----
 .../server/resourcemanager/webapp/dao/AppInfo.java |  2 +-
 5 files changed, 15 insertions(+), 37 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
index be0ab58..7e8c01d 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
@@ -206,8 +206,8 @@ public abstract class Resource implements Comparable<Resource> {
    *
    * @return Map of resource name to ResourceInformation
    */
-  @Public
-  @Evolving
+  @InterfaceAudience.Private
+  @InterfaceStability.Unstable
   public ResourceInformation[] getResources() {
     return resources;
   }
@@ -220,7 +220,7 @@ public abstract class Resource implements Comparable<Resource> {
    * @throws ResourceNotFoundException if the resource can't be found
    */
   @Public
-  @Evolving
+  @InterfaceStability.Unstable
   public ResourceInformation getResourceInformation(String resource)
       throws ResourceNotFoundException {
     Integer index = ResourceUtils.getResourceTypeIndex().get(resource);
@@ -240,8 +240,8 @@ public abstract class Resource implements Comparable<Resource> {
    * @throws ResourceNotFoundException
    *           if the resource can't be found
    */
-  @Public
-  @Evolving
+  @InterfaceAudience.Private
+  @InterfaceStability.Unstable
   public ResourceInformation getResourceInformation(int index)
       throws ResourceNotFoundException {
     ResourceInformation ri = null;
@@ -262,7 +262,7 @@ public abstract class Resource implements Comparable<Resource> {
    * @throws ResourceNotFoundException if the resource can't be found
    */
   @Public
-  @Evolving
+  @InterfaceStability.Unstable
   public long getResourceValue(String resource)
       throws ResourceNotFoundException {
     return getResourceInformation(resource).getValue();
@@ -276,7 +276,7 @@ public abstract class Resource implements Comparable<Resource> {
    * @throws ResourceNotFoundException if the resource is not found
    */
   @Public
-  @Evolving
+  @InterfaceStability.Unstable
   public void setResourceInformation(String resource,
       ResourceInformation resourceInformation)
       throws ResourceNotFoundException {
@@ -302,8 +302,8 @@ public abstract class Resource implements Comparable<Resource> {
    * @throws ResourceNotFoundException
    *           if the resource is not found
    */
-  @Public
-  @Evolving
+  @InterfaceAudience.Private
+  @InterfaceStability.Unstable
   public void setResourceInformation(int index,
       ResourceInformation resourceInformation)
       throws ResourceNotFoundException {
@@ -323,7 +323,7 @@ public abstract class Resource implements Comparable<Resource> {
    * @throws ResourceNotFoundException if the resource is not found
    */
   @Public
-  @Evolving
+  @InterfaceStability.Unstable
   public void setResourceValue(String resource, long value)
       throws ResourceNotFoundException {
     if (resource.equals(ResourceInformation.MEMORY_URI)) {
@@ -350,8 +350,8 @@ public abstract class Resource implements Comparable<Resource> {
    * @throws ResourceNotFoundException
    *           if the resource is not found
    */
-  @Public
-  @Evolving
+  @InterfaceAudience.Private
+  @InterfaceStability.Unstable
   public void setResourceValue(int index, long value)
       throws ResourceNotFoundException {
     try {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
index 94eda7c..e1a98ae 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
@@ -21,6 +21,7 @@ package org.apache.hadoop.yarn.api.records;
 import java.io.Serializable;
 
 import org.apache.hadoop.classification.InterfaceAudience.Public;
+import org.apache.hadoop.classification.InterfaceStability;
 import org.apache.hadoop.classification.InterfaceStability.Evolving;
 import org.apache.hadoop.classification.InterfaceStability.Stable;
 import org.apache.hadoop.classification.InterfaceStability.Unstable;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
index 110453a..1da5d6a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
@@ -49,8 +49,6 @@ import java.util.concurrent.ConcurrentHashMap;
 /**
  * Helper class to read the resource-types to be supported by the system.
  */
-@InterfaceAudience.Public
-@InterfaceStability.Unstable
 public class ResourceUtils {
 
   public static final String UNITS = ".units";
@@ -65,7 +63,6 @@ public class ResourceUtils {
   private static final Map<String, Integer> RESOURCE_NAME_TO_INDEX =
       new ConcurrentHashMap<String, Integer>();
   private static volatile Map<String, ResourceInformation> resourceTypes;
-  private static volatile String[] resourceNamesArray;
   private static volatile ResourceInformation[] resourceTypesArray;
   private static volatile boolean initializedNodeResources = false;
   private static volatile Map<String, ResourceInformation> readOnlyNodeResources;
@@ -270,7 +267,6 @@ public class ResourceUtils {
 
   private static void updateKnownResources() {
     // Update resource names.
-    resourceNamesArray = new String[resourceTypes.size()];
     resourceTypesArray = new ResourceInformation[resourceTypes.size()];
 
     int index = 2;
@@ -278,14 +274,11 @@ public class ResourceUtils {
       if (resInfo.getName().equals(MEMORY)) {
         resourceTypesArray[0] = ResourceInformation
             .newInstance(resourceTypes.get(MEMORY));
-        resourceNamesArray[0] = MEMORY;
       } else if (resInfo.getName().equals(VCORES)) {
         resourceTypesArray[1] = ResourceInformation
             .newInstance(resourceTypes.get(VCORES));
-        resourceNamesArray[1] = VCORES;
       } else {
         resourceTypesArray[index] = ResourceInformation.newInstance(resInfo);
-        resourceNamesArray[index] = resInfo.getName();
         index++;
       }
     }
@@ -319,18 +312,6 @@ public class ResourceUtils {
         YarnConfiguration.RESOURCE_TYPES_CONFIGURATION_FILE);
   }
 
-  /**
-   * Get resource names array, this is mostly for performance perspective. Never
-   * modify returned array.
-   *
-   * @return resourceNamesArray
-   */
-  public static String[] getResourceNamesArray() {
-    initializeResourceTypesIfNeeded(null,
-        YarnConfiguration.RESOURCE_TYPES_CONFIGURATION_FILE);
-    return resourceNamesArray;
-  }
-
   public static ResourceInformation[] getResourceTypesArray() {
     initializeResourceTypesIfNeeded(null,
         YarnConfiguration.RESOURCE_TYPES_CONFIGURATION_FILE);
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/package-info.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/package-info.java
index 1e925d7..d7c799d 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/package-info.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/package-info.java
@@ -19,8 +19,4 @@
  * Package org.apache.hadoop.yarn.util.resource contains classes
  * which is used as utility class for resource profile computations.
  */
-@InterfaceAudience.Public
-@InterfaceStability.Unstable
-package org.apache.hadoop.yarn.util.resource;
-import org.apache.hadoop.classification.InterfaceAudience;
-import org.apache.hadoop.classification.InterfaceStability;
\ No newline at end of file
+package org.apache.hadoop.yarn.util.resource;
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
index 1c9df71..880d22f 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
@@ -503,7 +503,7 @@ public class AppInfo {
   public Map<String, Long> getPreemptedResourceSecondsMap() {
     return preemptedResourceSecondsMap;
   }
-  
+
   public List<ResourceRequestInfo> getResourceRequests() {
     return this.resourceRequests;
   }
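
The context lines of the `updateKnownResources` hunk above show how the resource array is laid out: memory at index 0, vcores at index 1, and every other resource type from index 2 onward. A minimal sketch of that ordering, using plain strings instead of `ResourceInformation` objects; the class `ResourceOrdering` is hypothetical, and the sketch assumes the input always contains both `memory-mb` and `vcores`.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical illustration of the fixed-slot ordering; not Hadoop code. */
final class ResourceOrdering {
  static final String MEMORY = "memory-mb";
  static final String VCORES = "vcores";

  /**
   * Places memory in slot 0, vcores in slot 1, and all other names
   * in encounter order starting at slot 2.
   * Assumption: names contains both MEMORY and VCORES.
   */
  static String[] order(Collection<String> names) {
    String[] ordered = new String[names.size()];
    int index = 2;
    for (String name : names) {
      if (name.equals(MEMORY)) {
        ordered[0] = name;
      } else if (name.equals(VCORES)) {
        ordered[1] = name;
      } else {
        ordered[index] = name;
        index++;
      }
    }
    return ordered;
  }

  public static void main(String[] args) {
    String[] ordered =
        order(Arrays.asList("yarn.io/gpu", "vcores", "memory-mb"));
    System.out.println(Arrays.toString(ordered));
  }
}
```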




[hadoop] 17/20: YARN-7143. FileNotFound handling in ResourceUtils is inconsistent

Posted by jh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit f350c42f6a1a6caecea3d78a603620f345e38bb6
Author: Daniel Templeton <te...@apache.org>
AuthorDate: Thu Nov 9 10:36:49 2017 -0800

    YARN-7143. FileNotFound handling in ResourceUtils is inconsistent
    
    Change-Id: Ib1bb487e14a15edd2b5a42cf5078c5a2b295f069
    (cherry picked from commit db82a41d94872cea4d0c1bb1336916cebc2faeec)
---
 .../hadoop/yarn/util/resource/ResourceUtils.java   | 52 +++++++++-------------
 1 file changed, 22 insertions(+), 30 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
index f3edc74..abf58a6 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
@@ -338,18 +338,14 @@ public class ResourceUtils {
     if (!initializedResources) {
       synchronized (ResourceUtils.class) {
         if (!initializedResources) {
-          if (conf == null) {
-            conf = new YarnConfiguration();
-          }
-          try {
-            addResourcesFileToConf(resourceFile, conf);
-            LOG.debug("Found " + resourceFile + ", adding to configuration");
-          } catch (FileNotFoundException fe) {
-            LOG.info("Unable to find '" + resourceFile
-                + "'. Falling back to memory and vcores as resources.");
+          Configuration resConf = conf;
+
+          if (resConf == null) {
+            resConf = new YarnConfiguration();
           }
-          initializeResourcesMap(conf);
 
+          addResourcesFileToConf(resourceFile, resConf);
+          initializeResourcesMap(resConf);
         }
       }
     }
@@ -386,21 +382,17 @@ public class ResourceUtils {
   }
 
   private static void addResourcesFileToConf(String resourceFile,
-      Configuration conf) throws FileNotFoundException {
+      Configuration conf) {
     try {
       InputStream ris = getConfInputStream(resourceFile, conf);
       LOG.debug("Found " + resourceFile + ", adding to configuration");
       conf.addResource(ris);
     } catch (FileNotFoundException fe) {
-      throw fe;
-    } catch (IOException ie) {
+      LOG.info("Unable to find '" + resourceFile + "'.");
+    } catch (IOException | YarnException ex) {
       LOG.fatal("Exception trying to read resource types configuration '"
-          + resourceFile + "'.", ie);
-      throw new YarnRuntimeException(ie);
-    } catch (YarnException ye) {
-      LOG.fatal("YARN Exception trying to read resource types configuration '"
-          + resourceFile + "'.", ye);
-      throw new YarnRuntimeException(ye);
+          + resourceFile + "'.", ex);
+      throw new YarnRuntimeException(ex);
     }
   }
 
@@ -462,19 +454,19 @@ public class ResourceUtils {
   private static Map<String, ResourceInformation> initializeNodeResourceInformation(
       Configuration conf) {
     Map<String, ResourceInformation> nodeResources = new HashMap<>();
-    try {
-      addResourcesFileToConf(
-          YarnConfiguration.NODE_RESOURCES_CONFIGURATION_FILE, conf);
-      for (Map.Entry<String, String> entry : conf) {
-        String key = entry.getKey();
-        String value = entry.getValue();
-        if (key.startsWith(YarnConfiguration.NM_RESOURCES_PREFIX)) {
-          addResourceInformation(key, value, nodeResources);
-        }
+
+    addResourcesFileToConf(YarnConfiguration.NODE_RESOURCES_CONFIGURATION_FILE,
+        conf);
+
+    for (Map.Entry<String, String> entry : conf) {
+      String key = entry.getKey();
+      String value = entry.getValue();
+
+      if (key.startsWith(YarnConfiguration.NM_RESOURCES_PREFIX)) {
+        addResourceInformation(key, value, nodeResources);
       }
-    } catch (FileNotFoundException fe) {
-      LOG.info("Couldn't find node resources file");
     }
+
     return nodeResources;
   }
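
The pattern this commit settles on in `addResourcesFileToConf` is: a missing file is logged and tolerated inside the helper, while any other read failure is rethrown as an unchecked exception, so callers no longer need their own `FileNotFoundException` handling. A minimal standalone sketch of the same idea, using `java.util.Properties` and `UncheckedIOException` as stand-ins for Hadoop's `Configuration` and `YarnRuntimeException`; the class `OptionalConfigLoader` and its boolean return value are hypothetical.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical illustration of the handling pattern; not Hadoop code. */
final class OptionalConfigLoader {

  /**
   * Loads the file into target if it exists.
   * Returns true if the file was found and read, false if it was missing.
   */
  static boolean addFileIfPresent(String file, Properties target) {
    try (InputStream in = new FileInputStream(file)) {
      target.load(in);
      return true;
    } catch (FileNotFoundException e) {
      // A missing optional file is not an error: report it and keep defaults.
      System.err.println("Unable to find '" + file + "'.");
      return false;
    } catch (IOException e) {
      // Any other read failure is fatal for the caller.
      throw new UncheckedIOException(
          "Exception trying to read '" + file + "'.", e);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    boolean found = addFileIfPresent("resource-types.properties", props);
    System.out.println("found=" + found + " entries=" + props.size());
  }
}
```

Handling the missing-file case in one place is what lets both call sites in the diff drop their `try`/`catch` blocks.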
 




[hadoop] 07/20: YARN-9280. Backport YARN-6620 to YARN-8200/branch-2 for NodeManager-side GPU isolation

Posted by jh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 25167b5e954550486f983da55d083f9f518621c3
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Wed Feb 6 14:41:03 2019 -0800

    YARN-9280. Backport YARN-6620 to YARN-8200/branch-2 for NodeManager-side GPU isolation
---
 .../yarn/api/records/ResourceInformation.java      |  10 +
 .../apache/hadoop/yarn/conf/YarnConfiguration.java |  33 ++
 .../hadoop/yarn/util/resource/ResourceUtils.java   |  51 +-
 .../src/main/resources/yarn-default.xml            |  39 ++
 .../yarn/util/resource/TestResourceUtils.java      |  17 +
 .../yarn/server/nodemanager/ContainerExecutor.java |   3 +-
 .../hadoop/yarn/server/nodemanager/Context.java    |   3 +
 .../nodemanager/DefaultContainerExecutor.java      |   2 +-
 .../nodemanager/DockerContainerExecutor.java       |   2 +-
 .../server/nodemanager/LinuxContainerExecutor.java |  10 +-
 .../yarn/server/nodemanager/NodeManager.java       |  92 ++--
 .../server/nodemanager/NodeStatusUpdaterImpl.java  |  38 +-
 .../linux/privileged/PrivilegedOperation.java      |   1 +
 .../linux/resources/ResourceHandlerChain.java      |   4 +-
 .../linux/resources/ResourceHandlerModule.java     |  42 +-
 .../linux/resources/gpu/GpuResourceAllocator.java  | 242 +++++++++
 .../resources/gpu/GpuResourceHandlerImpl.java      | 153 ++++++
 .../resourceplugin/NodeResourceUpdaterPlugin.java  |  52 ++
 .../resourceplugin/ResourcePlugin.java             |  83 ++++
 .../resourceplugin/ResourcePluginManager.java      | 106 ++++
 .../resourceplugin/gpu/GpuDiscoverer.java          | 254 ++++++++++
 .../gpu/GpuNodeResourceUpdateHandler.java          |  66 +++
 .../resourceplugin/gpu/GpuResourcePlugin.java      |  61 +++
 .../webapp/dao/gpu/GpuDeviceInformation.java       |  72 +++
 .../webapp/dao/gpu/GpuDeviceInformationParser.java |  87 ++++
 .../webapp/dao/gpu/PerGpuDeviceInformation.java    | 165 +++++++
 .../webapp/dao/gpu/PerGpuMemoryUsage.java          |  58 +++
 .../webapp/dao/gpu/PerGpuTemperature.java          |  80 +++
 .../webapp/dao/gpu/PerGpuUtilizations.java         |  50 ++
 .../server/nodemanager/NodeManagerTestBase.java    | 164 ++++++
 .../nodemanager/TestDefaultContainerExecutor.java  |   4 +-
 .../TestDockerContainerExecutorWithMocks.java      |   2 +-
 .../nodemanager/TestLinuxContainerExecutor.java    |   2 +-
 .../TestLinuxContainerExecutorWithMocks.java       |   2 +-
 .../yarn/server/nodemanager/TestNodeManager.java   |   2 +-
 .../server/nodemanager/TestNodeStatusUpdater.java  | 100 +---
 .../nodemanager/amrmproxy/BaseAMRMProxyTest.java   |  46 +-
 .../linux/resources/TestResourceHandlerModule.java |   8 +-
 .../resources/gpu/TestGpuResourceHandler.java      | 385 +++++++++++++++
 .../TestContainersMonitorResourceChange.java       |   2 +-
 .../resourceplugin/TestResourcePluginManager.java  | 261 ++++++++++
 .../resourceplugin/gpu/TestGpuDiscoverer.java      | 123 +++++
 .../dao/gpu/TestGpuDeviceInformationParser.java    |  50 ++
 .../test/resources/nvidia-smi-sample-xml-output    | 547 +++++++++++++++++++++
 44 files changed, 3368 insertions(+), 206 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceInformation.java
index 0cc1e9c..8917a84 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceInformation.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceInformation.java
@@ -18,10 +18,13 @@
 
 package org.apache.hadoop.yarn.api.records;
 
+import com.google.common.collect.ImmutableMap;
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.yarn.api.protocolrecords.ResourceTypes;
 import org.apache.hadoop.yarn.util.UnitsConversionUtil;
 
+import java.util.Map;
+
 /**
  * Class to encapsulate information about a Resource - the name of the resource,
  * the units(milli, micro, etc), the type(countable), and the value.
@@ -35,13 +38,20 @@ public class ResourceInformation implements Comparable<ResourceInformation> {
   private long minimumAllocation;
   private long maximumAllocation;
 
+  // Known resource types
   public static final String MEMORY_URI = "memory-mb";
   public static final String VCORES_URI = "vcores";
+  public static final String GPU_URI = "yarn.io/gpu";
 
   public static final ResourceInformation MEMORY_MB =
       ResourceInformation.newInstance(MEMORY_URI, "Mi");
   public static final ResourceInformation VCORES =
       ResourceInformation.newInstance(VCORES_URI);
+  public static final ResourceInformation GPUS =
+      ResourceInformation.newInstance(GPU_URI);
+
+  public static final Map<String, ResourceInformation> MANDATORY_RESOURCES =
+      ImmutableMap.of(MEMORY_URI, MEMORY_MB, VCORES_URI, VCORES, GPU_URI, GPUS);
 
   /**
    * Get the name for the resource.
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
index 4bf180a..ce1b893 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
@@ -1411,6 +1411,39 @@ public class YarnConfiguration extends Configuration {
   public static final String NM_NETWORK_RESOURCE_OUTBOUND_BANDWIDTH_YARN_MBIT =
       NM_NETWORK_RESOURCE_PREFIX + "outbound-bandwidth-yarn-mbit";
 
+  /**
+   * Prefix for computation resource plugins. Examples of computation
+   * resources include GPU, FPGA and TPU.
+   */
+  @Private
+  public static final String NM_RESOURCE_PLUGINS =
+      NM_PREFIX + "resource-plugins";
+
+  /**
+   * Prefix for gpu configurations. Work in progress: This configuration
+   * parameter may be changed/removed in the future.
+   */
+  @Private
+  public static final String NM_GPU_RESOURCE_PREFIX =
+      NM_RESOURCE_PLUGINS + ".gpu.";
+
+  @Private
+  public static final String NM_GPU_ALLOWED_DEVICES =
+      NM_GPU_RESOURCE_PREFIX + "allowed-gpu-devices";
+  @Private
+  public static final String AUTOMATICALLY_DISCOVER_GPU_DEVICES = "auto";
+
+  /**
+   * This setting controls where and how to invoke GPU binaries.
+   */
+  @Private
+  public static final String NM_GPU_PATH_TO_EXEC =
+      NM_GPU_RESOURCE_PREFIX + "path-to-discovery-executables";
+
+  @Private
+  public static final String DEFAULT_NM_GPU_PATH_TO_EXEC = "";
+
+
   /** NM Webapp address.**/
   public static final String NM_WEBAPP_ADDRESS = NM_PREFIX + "webapp.address";
   public static final int DEFAULT_NM_WEBAPP_PORT = 8042;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
index 1da5d6a..f3edc74 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
@@ -46,6 +46,8 @@ import java.util.List;
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 
+import static org.apache.hadoop.yarn.api.records.ResourceInformation.GPU_URI;
+
 /**
  * Helper class to read the resource-types to be supported by the system.
  */
@@ -82,33 +84,32 @@ public class ResourceUtils {
      */
     String key = "memory";
     if (resourceInformationMap.containsKey(key)) {
-      LOG.warn("Attempt to define resource '" + key +
-          "', but it is not allowed.");
-      throw new YarnRuntimeException("Attempt to re-define mandatory resource '"
-          + key + "'.");
+      LOG.warn(
+          "Attempt to define resource '" + key + "', but it is not allowed.");
+      throw new YarnRuntimeException(
+          "Attempt to re-define mandatory resource '" + key + "'.");
     }
 
-    if (resourceInformationMap.containsKey(MEMORY)) {
-      ResourceInformation memInfo = resourceInformationMap.get(MEMORY);
-      String memUnits = ResourceInformation.MEMORY_MB.getUnits();
-      ResourceTypes memType = ResourceInformation.MEMORY_MB.getResourceType();
-      if (!memInfo.getUnits().equals(memUnits) || !memInfo.getResourceType()
-          .equals(memType)) {
-        throw new YarnRuntimeException(
-            "Attempt to re-define mandatory resource 'memory-mb'. It can only"
-                + " be of type 'COUNTABLE' and have units 'Mi'.");
-      }
-    }
-
-    if (resourceInformationMap.containsKey(VCORES)) {
-      ResourceInformation vcoreInfo = resourceInformationMap.get(VCORES);
-      String vcoreUnits = ResourceInformation.VCORES.getUnits();
-      ResourceTypes vcoreType = ResourceInformation.VCORES.getResourceType();
-      if (!vcoreInfo.getUnits().equals(vcoreUnits) || !vcoreInfo
-          .getResourceType().equals(vcoreType)) {
-        throw new YarnRuntimeException(
-            "Attempt to re-define mandatory resource 'vcores'. It can only be"
-                + " of type 'COUNTABLE' and have units ''(no units).");
+    for (Map.Entry<String, ResourceInformation> mandatoryResourceEntry :
+        ResourceInformation.MANDATORY_RESOURCES.entrySet()) {
+      key = mandatoryResourceEntry.getKey();
+      ResourceInformation mandatoryRI = mandatoryResourceEntry.getValue();
+
+      ResourceInformation newDefinedRI = resourceInformationMap.get(key);
+      if (newDefinedRI != null) {
+        String expectedUnit = mandatoryRI.getUnits();
+        ResourceTypes expectedType = mandatoryRI.getResourceType();
+        String actualUnit = newDefinedRI.getUnits();
+        ResourceTypes actualType = newDefinedRI.getResourceType();
+
+        if (!expectedUnit.equals(actualUnit) || !expectedType.equals(
+            actualType)) {
+          throw new YarnRuntimeException("Defined mandatory resource type="
+              + key + " inside resource-types.xml, however its type or "
+              + "unit conflicts with mandatory resource types, expected type="
+              + expectedType + ", unit=" + expectedUnit + "; actual type="
+              + actualType + ", actual unit=" + actualUnit);
+        }
       }
     }
   }
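
Note on the hunk above: the per-resource checks for memory-mb and vcores are
replaced by a single loop over ResourceInformation.MANDATORY_RESOURCES. The
following is a minimal, self-contained sketch of that idea, not the actual
Hadoop code. ResourceSpec and MandatoryResourceCheck are hypothetical
stand-ins, and the COUNTABLE type for yarn.io/gpu is an assumption based on
the defaults the removed error messages describe for the other two resources.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for ResourceInformation: only units and type. */
final class ResourceSpec {
  final String units;
  final String type;

  ResourceSpec(String units, String type) {
    this.units = units;
    this.type = type;
  }
}

/** Sketch of a table-driven check for redefined mandatory resources. */
final class MandatoryResourceCheck {
  // Assumed contents, modelled on MANDATORY_RESOURCES in the diff.
  static final Map<String, ResourceSpec> MANDATORY = new LinkedHashMap<>();

  static {
    MANDATORY.put("memory-mb", new ResourceSpec("Mi", "COUNTABLE"));
    MANDATORY.put("vcores", new ResourceSpec("", "COUNTABLE"));
    MANDATORY.put("yarn.io/gpu", new ResourceSpec("", "COUNTABLE"));
  }

  /** Throws if a mandatory resource is redefined with other units or type. */
  static void validate(Map<String, ResourceSpec> defined) {
    for (Map.Entry<String, ResourceSpec> entry : MANDATORY.entrySet()) {
      ResourceSpec actual = defined.get(entry.getKey());
      if (actual == null) {
        continue; // The mandatory resource was not redefined.
      }
      ResourceSpec expected = entry.getValue();
      if (!expected.units.equals(actual.units)
          || !expected.type.equals(actual.type)) {
        throw new IllegalArgumentException("Mandatory resource "
            + entry.getKey() + " redefined with conflicting type or unit");
      }
    }
  }
}
```

With this shape, supporting another mandatory resource only needs one more
map entry rather than another copy of the comparison logic.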
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
index 768deb2..5392b39 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
@@ -3456,6 +3456,45 @@
   </property>
 
   <property>
+    <description>
+      When yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices=auto is
+      specified, the YARN NodeManager needs to run a GPU discovery binary
+      (currently only nvidia-smi is supported) to get GPU-related information.
+      When the value is empty (default), the YARN NodeManager will try to
+      locate the discovery executable itself.
+      An example of the config value is: /usr/local/bin/nvidia-smi
+    </description>
+    <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
+    <value></value>
+  </property>
+
+  <property>
+    <description>
+      Enable additional discovery/isolation of resources on the NodeManager,
+      separated by commas. By default, this is empty. Acceptable values: { "yarn.io/gpu" }.
+    </description>
+    <name>yarn.nodemanager.resource-plugins</name>
+    <value></value>
+  </property>
+
+  <property>
+    <description>
+      Specify the GPU devices that can be managed by the YARN NodeManager,
+      separated by commas. The number of GPU devices will be reported to the RM
+      to make scheduling decisions. Set to auto (default) to let YARN
+      automatically discover GPU resources from the system.
+      Manually specify GPU devices if automatic detection fails or the admin
+      only wants a subset of GPU devices managed by YARN. GPU devices are
+      identified by their minor device number. A common way to get the minor
+      device numbers of GPUs is to run "nvidia-smi -q" and search for "Minor
+      Number" in the output. An example of manual specification is "0,1,2,4",
+      which lets the YARN NodeManager manage GPU devices with minor numbers 0/1/2/4.
+    </description>
+    <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
+    <value>auto</value>
+  </property>
+
+  <property>
     <description>The http address of the timeline reader web application.</description>
     <name>yarn.timeline-service.reader.webapp.address</name>
     <value>${yarn.timeline-service.webapp.address}</value>
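
Note on the allowed-gpu-devices property above: its value is either "auto" or
a comma-separated list of minor device numbers such as "0,1,2,4". The sketch
below shows one way such a value could be interpreted. It is illustrative
only: AllowedGpuDevices and parse are hypothetical names, not part of this
commit, and treating a null or blank value as "auto" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper; not the discovery code added by this commit. */
final class AllowedGpuDevices {
  static final String AUTO = "auto";

  /**
   * Returns an empty Optional when devices should be auto-discovered,
   * otherwise the explicitly listed minor device numbers.
   */
  static Optional<List<Integer>> parse(String value) {
    // Assumption: a missing or blank value falls back to the default "auto".
    String trimmed = (value == null) ? AUTO : value.trim();
    if (trimmed.isEmpty() || trimmed.equals(AUTO)) {
      return Optional.empty();
    }
    List<Integer> minors = new ArrayList<>();
    for (String token : trimmed.split(",")) {
      // Integer.parseInt throws NumberFormatException on malformed tokens.
      int minor = Integer.parseInt(token.trim());
      if (minor < 0) {
        throw new IllegalArgumentException("Negative minor number: " + minor);
      }
      minors.add(minor);
    }
    return Optional.of(minors);
  }
}
```

Returning an empty Optional for "auto" keeps the "discover everything" case
distinct from an explicit list, so callers cannot confuse the two.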
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
index d6bab92..80555ca 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
@@ -52,6 +52,23 @@ public class TestResourceUtils {
     }
   }
 
+  public static void addNewTypesToResources(String... resourceTypes) {
+    // Initialize resource map
+    Map<String, ResourceInformation> riMap = new HashMap<>();
+
+    // Initialize mandatory resources
+    riMap.put(ResourceInformation.MEMORY_URI, ResourceInformation.MEMORY_MB);
+    riMap.put(ResourceInformation.VCORES_URI, ResourceInformation.VCORES);
+
+    for (String newResource : resourceTypes) {
+      riMap.put(newResource, ResourceInformation
+          .newInstance(newResource, "", 0, ResourceTypes.COUNTABLE, 0,
+              Integer.MAX_VALUE));
+    }
+
+    ResourceUtils.initializeResourcesFromResourceInformationMap(riMap);
+  }
+
   @Before
   public void setup() {
     ResourceUtils.resetResourceTypes();
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
index 1851a1d..5f13bb4 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
@@ -112,9 +112,10 @@ public abstract class ContainerExecutor implements Configurable {
    * Run the executor initialization steps.
    * Verify that the necessary configs and permissions are in place.
    *
+   * @param nmContext Context of NM
    * @throws IOException if initialization fails
    */
-  public abstract void init() throws IOException;
+  public abstract void init(Context nmContext) throws IOException;
 
   /**
    * This function localizes the JAR file on-demand.
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java
index 33cefea..7e16034 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java
@@ -34,6 +34,7 @@ import org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManag
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
 
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService;
 import org.apache.hadoop.yarn.server.scheduler.OpportunisticContainerAllocator;
 import org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager;
@@ -122,4 +123,6 @@ public interface Context {
   void setNMTimelinePublisher(NMTimelinePublisher nmMetricsPublisher);
 
   NMTimelinePublisher getNMTimelinePublisher();
+
+  ResourcePluginManager getResourcePluginManager();
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
index b54b7f5..e659c3e 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
@@ -134,7 +134,7 @@ public class DefaultContainerExecutor extends ContainerExecutor {
   }
 
   @Override
-  public void init() throws IOException {
+  public void init(Context nmContext) throws IOException {
     // nothing to do or verify here
   }
 
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
index a044cb6..6c2eb96 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
@@ -117,7 +117,7 @@ public class DockerContainerExecutor extends ContainerExecutor {
   }
 
   @Override
-  public void init() throws IOException {
+  public void init(Context nmContext) throws IOException {
     String auth =
       getConf().get(CommonConfigurationKeys.HADOOP_SECURITY_AUTHENTICATION);
     if (auth != null && !auth.equals("simple")) {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
index c33d4be..a1ec820 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
@@ -20,6 +20,7 @@ package org.apache.hadoop.yarn.server.nodemanager;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Optional;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.apache.hadoop.conf.Configuration;
@@ -281,7 +282,7 @@ public class LinuxContainerExecutor extends ContainerExecutor {
   }
 
   @Override
-  public void init() throws IOException {
+  public void init(Context nmContext) throws IOException {
     Configuration conf = super.getConf();
 
     // Send command to executor which will just start up,
@@ -305,7 +306,7 @@ public class LinuxContainerExecutor extends ContainerExecutor {
 
     try {
       resourceHandlerChain = ResourceHandlerModule
-          .getConfiguredResourceHandlerChain(conf);
+          .getConfiguredResourceHandlerChain(conf, nmContext);
       if (LOG.isDebugEnabled()) {
         LOG.debug("Resource handler chain enabled = " + (resourceHandlerChain
             != null));
@@ -845,4 +846,9 @@ public class LinuxContainerExecutor extends ContainerExecutor {
           e);
     }
   }
+
+  @VisibleForTesting
+  public ResourceHandler getResourceHandler() {
+    return resourceHandlerChain;
+  }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
index fcb5474..c74b54e 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
@@ -18,23 +18,7 @@
 
 package org.apache.hadoop.yarn.server.nodemanager;
 
-import java.io.IOException;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
-import java.util.concurrent.ConcurrentLinkedQueue;
-import java.util.concurrent.ConcurrentMap;
-import java.util.concurrent.ConcurrentSkipListMap;
-import java.util.concurrent.atomic.AtomicBoolean;
-
-import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent;
-import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl;
-import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerState;
-import org.apache.hadoop.yarn.state.MultiStateTransitionListener;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
+import com.google.common.annotations.VisibleForTesting;
 import org.apache.hadoop.classification.InterfaceAudience.Private;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
@@ -65,12 +49,16 @@ import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
 import org.apache.hadoop.yarn.server.api.protocolrecords.LogAggregationReport;
 import org.apache.hadoop.yarn.server.api.records.AppCollectorData;
 import org.apache.hadoop.yarn.server.api.records.NodeHealthStatus;
-import org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManager;
 import org.apache.hadoop.yarn.server.nodemanager.collectormanager.NMCollectorService;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManager;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerState;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager;
 import org.apache.hadoop.yarn.server.nodemanager.metrics.NodeManagerMetrics;
 import org.apache.hadoop.yarn.server.nodemanager.nodelabels.ConfigurationNodeLabelsProvider;
 import org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeLabelsProvider;
@@ -78,14 +66,25 @@ import org.apache.hadoop.yarn.server.nodemanager.nodelabels.ScriptBasedNodeLabel
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMNullStateStoreService;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService;
-import org.apache.hadoop.yarn.server.scheduler.OpportunisticContainerAllocator;
 import org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager;
 import org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM;
 import org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher;
 import org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer;
+import org.apache.hadoop.yarn.server.scheduler.OpportunisticContainerAllocator;
 import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;
+import org.apache.hadoop.yarn.state.MultiStateTransitionListener;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
-import com.google.common.annotations.VisibleForTesting;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ConcurrentSkipListMap;
+import java.util.concurrent.atomic.AtomicBoolean;
 
 public class NodeManager extends CompositeService 
     implements EventHandler<NodeManagerEvent> {
@@ -332,6 +331,18 @@ public class NodeManager extends CompositeService
         nmCheckintervalTime, scriptTimeout, scriptArgs);
   }
 
+  @VisibleForTesting
+  protected ResourcePluginManager createResourcePluginManager() {
+    return new ResourcePluginManager();
+  }
+
+  @VisibleForTesting
+  protected ContainerExecutor createContainerExecutor(Configuration conf) {
+    return ReflectionUtils.newInstance(
+        conf.getClass(YarnConfiguration.NM_CONTAINER_EXECUTOR,
+            DefaultContainerExecutor.class, ContainerExecutor.class), conf);
+  }
+
   @Override
   protected void serviceInit(Configuration conf) throws Exception {
 
@@ -360,11 +371,20 @@ public class NodeManager extends CompositeService
     
     this.aclsManager = new ApplicationACLsManager(conf);
 
-    ContainerExecutor exec = ReflectionUtils.newInstance(
-        conf.getClass(YarnConfiguration.NM_CONTAINER_EXECUTOR,
-          DefaultContainerExecutor.class, ContainerExecutor.class), conf);
+    boolean isDistSchedulingEnabled =
+        conf.getBoolean(YarnConfiguration.DIST_SCHEDULING_ENABLED,
+            YarnConfiguration.DEFAULT_DIST_SCHEDULING_ENABLED);
+
+    this.context = createNMContext(containerTokenSecretManager,
+        nmTokenSecretManager, nmStore, isDistSchedulingEnabled, conf);
+
+    ResourcePluginManager pluginManager = createResourcePluginManager();
+    pluginManager.initialize(context);
+    ((NMContext)context).setResourcePluginManager(pluginManager);
+
+    ContainerExecutor exec = createContainerExecutor(conf);
     try {
-      exec.init();
+      exec.init(context);
     } catch (IOException e) {
       throw new YarnRuntimeException("Failed to initialize container executor", e);
     }    
@@ -380,13 +400,6 @@ public class NodeManager extends CompositeService
             getNodeHealthScriptRunner(conf), dirsHandler);
     addService(nodeHealthChecker);
 
-    boolean isDistSchedulingEnabled =
-        conf.getBoolean(YarnConfiguration.DIST_SCHEDULING_ENABLED,
-            YarnConfiguration.DEFAULT_DIST_SCHEDULING_ENABLED);
-
-    this.context = createNMContext(containerTokenSecretManager,
-        nmTokenSecretManager, nmStore, isDistSchedulingEnabled, conf);
-
 
     ((NMContext)context).setContainerExecutor(exec);
 
@@ -460,6 +473,12 @@ public class NodeManager extends CompositeService
     try {
       super.serviceStop();
       DefaultMetricsSystem.shutdown();
+
+      // Cleanup ResourcePluginManager
+      ResourcePluginManager rpm = context.getResourcePluginManager();
+      if (rpm != null) {
+        rpm.cleanup();
+      }
     } finally {
       // YARN-3641: NM's services stop get failed shouldn't block the
       // release of NMLevelDBStore.
@@ -607,6 +626,8 @@ public class NodeManager extends CompositeService
 
     private NMTimelinePublisher nmTimelinePublisher;
 
+    private ResourcePluginManager resourcePluginManager;
+
     public NMContext(NMContainerTokenSecretManager containerTokenSecretManager,
         NMTokenSecretManagerInNM nmTokenSecretManager,
         LocalDirsHandlerService dirsHandler, ApplicationACLsManager aclsManager,
@@ -807,6 +828,15 @@ public class NodeManager extends CompositeService
     public NMTimelinePublisher getNMTimelinePublisher() {
       return nmTimelinePublisher;
     }
+
+    public ResourcePluginManager getResourcePluginManager() {
+      return resourcePluginManager;
+    }
+
+    public void setResourcePluginManager(
+        ResourcePluginManager resourcePluginManager) {
+      this.resourcePluginManager = resourcePluginManager;
+    }
   }
 
   /**
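
Note on the NodeManager changes above: serviceInit now creates the NM context
first, initializes the ResourcePluginManager against it, and only then calls
exec.init(context). Both collaborators come from protected factory methods
that tests can override. The sketch below illustrates that ordering and the
override hook with simplified, hypothetical types (SketchContext,
SketchPluginManager, SketchExecutor, SketchNode); none of them are real
Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the NM context. */
class SketchContext {
  final List<String> events = new ArrayList<>();
  SketchPluginManager pluginManager; // Must be set before executor init.
}

class SketchPluginManager {
  void initialize(SketchContext ctx) {
    ctx.events.add("plugins-initialized");
  }
}

class SketchExecutor {
  void init(SketchContext ctx) {
    // The executor relies on the plugin manager already being in the context.
    if (ctx.pluginManager == null) {
      throw new IllegalStateException("plugin manager not set");
    }
    ctx.events.add("executor-initialized");
  }
}

class SketchNode {
  // Factory hooks that a test subclass may override to inject stubs.
  protected SketchPluginManager createPluginManager() {
    return new SketchPluginManager();
  }

  protected SketchExecutor createExecutor() {
    return new SketchExecutor();
  }

  SketchContext serviceInit() {
    SketchContext ctx = new SketchContext();      // 1. context first
    SketchPluginManager pm = createPluginManager();
    pm.initialize(ctx);                           // 2. plugins see the context
    ctx.pluginManager = pm;
    createExecutor().init(ctx);                   // 3. executor sees both
    return ctx;
  }
}
```

A test can subclass SketchNode and override createExecutor to return a stub,
which is the same role the @VisibleForTesting factory methods play in the diff.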
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
index 888ee85..d776bdf 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
@@ -33,6 +33,9 @@ import java.util.Map.Entry;
 import java.util.Random;
 import java.util.Set;
 import java.util.concurrent.ConcurrentLinkedQueue;
+
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePlugin;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -178,14 +181,15 @@ public class NodeStatusUpdaterImpl extends AbstractService implements
     long memoryMb = totalResource.getMemorySize();
     float vMemToPMem =
         conf.getFloat(
-            YarnConfiguration.NM_VMEM_PMEM_RATIO, 
-            YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO); 
+            YarnConfiguration.NM_VMEM_PMEM_RATIO,
+            YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO);
     long virtualMemoryMb = (long)Math.ceil(memoryMb * vMemToPMem);
-    
     int virtualCores = totalResource.getVirtualCores();
-    LOG.info("Nodemanager resources: memory set to " + memoryMb + "MB.");
-    LOG.info("Nodemanager resources: vcores set to " + virtualCores + ".");
-    LOG.info("Nodemanager resources: " + totalResource);
+
+    // Update configured resources via plugins.
+    updateConfiguredResourcesViaPlugins(totalResource);
+
+    LOG.info("Nodemanager resources are set to: " + totalResource);
 
     metrics.addResource(totalResource);
 
@@ -342,12 +346,27 @@ public class NodeStatusUpdaterImpl extends AbstractService implements
     return ServerRMProxy.createRMProxy(conf, ResourceTracker.class);
   }
 
+  private void updateConfiguredResourcesViaPlugins(
+      Resource configuredResource) throws YarnException {
+    ResourcePluginManager pluginManager = context.getResourcePluginManager();
+    if (pluginManager != null && pluginManager.getNameToPlugins() != null) {
+      // Update configured resource
+      for (ResourcePlugin resourcePlugin : pluginManager.getNameToPlugins()
+          .values()) {
+        if (resourcePlugin.getNodeResourceHandlerInstance() != null) {
+          resourcePlugin.getNodeResourceHandlerInstance()
+              .updateConfiguredResource(configuredResource);
+        }
+      }
+    }
+  }
+
   @VisibleForTesting
   protected void registerWithRM()
       throws YarnException, IOException {
     RegisterNodeManagerResponse regNMResponse;
     Set<NodeLabel> nodeLabels = nodeLabelsHandler.getNodeLabelsForRegistration();
- 
+
     // Synchronize NM-RM registration with
     // ContainerManagerImpl#increaseContainersResource and
     // ContainerManagerImpl#startContainers to avoid race condition
@@ -358,6 +377,7 @@ public class NodeStatusUpdaterImpl extends AbstractService implements
           RegisterNodeManagerRequest.newInstance(nodeId, httpPort, totalResource,
               nodeManagerVersionId, containerReports, getRunningApplications(),
               nodeLabels, physicalResource);
+
       if (containerReports != null) {
         LOG.info("Registering with RM using containers :" + containerReports);
       }
@@ -406,7 +426,7 @@ public class NodeStatusUpdaterImpl extends AbstractService implements
     if (masterKey != null) {
       this.context.getContainerTokenSecretManager().setMasterKey(masterKey);
     }
-    
+
     masterKey = regNMResponse.getNMTokenMasterKey();
     if (masterKey != null) {
       this.context.getNMTokenSecretManager().setMasterKey(masterKey);
@@ -733,7 +753,7 @@ public class NodeStatusUpdaterImpl extends AbstractService implements
       }
     }
   }
-  
+
   @Override
   public long getRMIdentifier() {
     return this.rmIdentifier;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
index 8402a16..db0b225 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
@@ -51,6 +51,7 @@ public class PrivilegedOperation {
     TC_READ_STATS("--tc-read-stats"),
     ADD_PID_TO_CGROUP(""), //no CLI switch supported yet.
     RUN_DOCKER_CMD("--run-docker"),
+    GPU("--module-gpu"),
     LIST_AS_USER(""); //no CLI switch supported yet.
 
     private final String option;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerChain.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerChain.java
index 955d216..72bf30c 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerChain.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerChain.java
@@ -20,6 +20,7 @@
 
 package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
 
+import com.google.common.annotations.VisibleForTesting;
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability;
 import org.apache.hadoop.conf.Configuration;
@@ -135,7 +136,8 @@ public class ResourceHandlerChain implements ResourceHandler {
     return allOperations;
   }
 
-  List<ResourceHandler> getResourceHandlerList() {
+  @VisibleForTesting
+  public List<ResourceHandler> getResourceHandlerList() {
     return Collections.unmodifiableList(resourceHandlers);
   }
 
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java
index 3c61cd4..ce850ab 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java
@@ -21,25 +21,28 @@
 package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
 
 import com.google.common.annotations.VisibleForTesting;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePlugin;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager;
 import org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler;
 import org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import java.io.File;
 import java.io.IOException;
-import java.util.Set;
-import java.util.HashSet;
-import java.util.Map;
-import java.util.HashMap;
-import java.util.Arrays;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
+import java.util.Set;
 
 /**
  * Provides mechanisms to get various resource handlers - cpu, memory, network,
@@ -206,22 +209,41 @@ public class ResourceHandlerModule {
   }
 
   private static void initializeConfiguredResourceHandlerChain(
-      Configuration conf) throws ResourceHandlerException {
+      Configuration conf, Context nmContext)
+      throws ResourceHandlerException {
     ArrayList<ResourceHandler> handlerList = new ArrayList<>();
 
     addHandlerIfNotNull(handlerList, getOutboundBandwidthResourceHandler(conf));
     addHandlerIfNotNull(handlerList, getDiskResourceHandler(conf));
     addHandlerIfNotNull(handlerList, getMemoryResourceHandler(conf));
     addHandlerIfNotNull(handlerList, getCGroupsCpuResourceHandler(conf));
+    addHandlersFromConfiguredResourcePlugins(handlerList, conf, nmContext);
     resourceHandlerChain = new ResourceHandlerChain(handlerList);
   }
 
+  private static void addHandlersFromConfiguredResourcePlugins(
+      List<ResourceHandler> handlerList, Configuration conf,
+      Context nmContext) throws ResourceHandlerException {
+    ResourcePluginManager pluginManager = nmContext.getResourcePluginManager();
+    if (pluginManager != null) {
+      Map<String, ResourcePlugin> pluginMap = pluginManager.getNameToPlugins();
+      if (pluginMap != null) {
+        for (ResourcePlugin plugin : pluginMap.values()) {
+          addHandlerIfNotNull(handlerList, plugin
+              .createResourceHandler(nmContext,
+                  getInitializedCGroupsHandler(conf),
+                  PrivilegedOperationExecutor.getInstance(conf)));
+        }
+      }
+    }
+  }
+
   public static ResourceHandlerChain getConfiguredResourceHandlerChain(
-      Configuration conf) throws ResourceHandlerException {
+      Configuration conf, Context nmContext) throws ResourceHandlerException {
     if (resourceHandlerChain == null) {
       synchronized (ResourceHandlerModule.class) {
         if (resourceHandlerChain == null) {
-          initializeConfiguredResourceHandlerChain(conf);
+          initializeConfiguredResourceHandlerChain(conf, nmContext);
         }
       }
     }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
new file mode 100644
index 0000000..d6bae09
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
@@ -0,0 +1,242 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Sets;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
+import org.apache.hadoop.yarn.exceptions.ResourceNotFoundException;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeMap;
+import java.util.TreeSet;
+
+import static org.apache.hadoop.yarn.api.records.ResourceInformation.GPU_URI;
+
+/**
+ * Allocate GPU resources according to requirements
+ */
+public class GpuResourceAllocator {
+  final static Log LOG = LogFactory.getLog(GpuResourceAllocator.class);
+
+  private Set<Integer> allowedGpuDevices = new TreeSet<>();
+  private Map<Integer, ContainerId> usedDevices = new TreeMap<>();
+  private Context nmContext;
+
+  public GpuResourceAllocator(Context ctx) {
+    this.nmContext = ctx;
+  }
+
+  /**
+   * Contains allowed and denied devices, identified by minor number.
+   * The cgroups devices module uses the denied devices for blacklisting.
+   */
+  static class GpuAllocation {
+    private Set<Integer> allowed = Collections.emptySet();
+    private Set<Integer> denied = Collections.emptySet();
+
+    GpuAllocation(Set<Integer> allowed, Set<Integer> denied) {
+      if (allowed != null) {
+        this.allowed = ImmutableSet.copyOf(allowed);
+      }
+      if (denied != null) {
+        this.denied = ImmutableSet.copyOf(denied);
+      }
+    }
+
+    public Set<Integer> getAllowedGPUs() {
+      return allowed;
+    }
+
+    public Set<Integer> getDeniedGPUs() {
+      return denied;
+    }
+  }
+
+  /**
+   * Add GPU to allowed list
+   * @param minorNumber minor number of the GPU device.
+   */
+  public synchronized void addGpu(int minorNumber) {
+    allowedGpuDevices.add(minorNumber);
+  }
+
+  private String getResourceHandlerExceptionMessage(int numRequestedGpuDevices,
+      ContainerId containerId) {
+    return "Failed to find enough GPUs, requestor=" + containerId
+        + ", #RequestedGPUs=" + numRequestedGpuDevices + ", #availableGpus="
+        + getAvailableGpus();
+  }
+
+  @VisibleForTesting
+  public synchronized int getAvailableGpus() {
+    return allowedGpuDevices.size() - usedDevices.size();
+  }
+
+  public synchronized void recoverAssignedGpus(ContainerId containerId)
+      throws ResourceHandlerException {
+    Container c = nmContext.getContainers().get(containerId);
+    if (null == c) {
+      throw new ResourceHandlerException(
+          "This shouldn't happen, cannot find container with id="
+              + containerId);
+    }
+
+    for (Serializable deviceId : c.getResourceMappings().getAssignedResources(
+        GPU_URI)){
+      if (!(deviceId instanceof String)) {
+        throw new ResourceHandlerException(
+            "Trying to recover device id, however it"
+                + " is not String, this shouldn't happen");
+      }
+
+
+      int devId;
+      try {
+        devId = Integer.parseInt((String)deviceId);
+      } catch (NumberFormatException e) {
+        throw new ResourceHandlerException("Failed to recover device id because"
+            + " it is not a valid integer, devId:" + deviceId);
+      }
+
+      // Make sure it is in the allowed GPU device list.
+      if (!allowedGpuDevices.contains(devId)) {
+        throw new ResourceHandlerException("Try to recover device id = " + devId
+            + " however it is not in allowed device list:" + StringUtils
+            .join(",", allowedGpuDevices));
+      }
+
+      // Make sure it is not occupied by anybody else
+      if (usedDevices.containsKey(devId)) {
+        throw new ResourceHandlerException("Try to recover device id = " + devId
+            + " however it is already assigned to container=" + usedDevices
+            .get(devId) + ", please double check what happened.");
+      }
+
+      usedDevices.put(devId, containerId);
+    }
+  }
+
+  private int getRequestedGpus(Resource requestedResource) {
+    try {
+      return Long.valueOf(requestedResource.getResourceValue(
+          GPU_URI)).intValue();
+    } catch (ResourceNotFoundException e) {
+      return 0;
+    }
+  }
+
+  /**
+   * Assign GPUs to the requesting container.
+   * @param container container to allocate GPUs for
+   * @return allowed and denied GPUs, identified by minor number
+   * @throws ResourceHandlerException if the GPUs cannot be assigned
+   */
+  public synchronized GpuAllocation assignGpus(Container container)
+      throws ResourceHandlerException {
+    Resource requestedResource = container.getResource();
+    ContainerId containerId = container.getContainerId();
+    int numRequestedGpuDevices = getRequestedGpus(requestedResource);
+    // Assign GPUs to the container if it requested any.
+    if (numRequestedGpuDevices > 0) {
+      if (numRequestedGpuDevices > getAvailableGpus()) {
+        throw new ResourceHandlerException(
+            getResourceHandlerExceptionMessage(numRequestedGpuDevices,
+                containerId));
+      }
+
+      Set<Integer> assignedGpus = new HashSet<>();
+
+      for (int deviceNum : allowedGpuDevices) {
+        if (!usedDevices.containsKey(deviceNum)) {
+          usedDevices.put(deviceNum, containerId);
+          assignedGpus.add(deviceNum);
+          if (assignedGpus.size() == numRequestedGpuDevices) {
+            break;
+          }
+        }
+      }
+
+      // Record in state store if we allocated anything
+      if (!assignedGpus.isEmpty()) {
+        List<Serializable> allocatedDevices = new ArrayList<>();
+        for (int gpu : assignedGpus) {
+          allocatedDevices.add(String.valueOf(gpu));
+        }
+        try {
+          // Update Container#getResourceMapping.
+          ResourceMappings.AssignedResources assignedResources =
+              new ResourceMappings.AssignedResources();
+          assignedResources.updateAssignedResources(allocatedDevices);
+          container.getResourceMappings().addAssignedResources(GPU_URI,
+              assignedResources);
+
+          // Update state store.
+          nmContext.getNMStateStore().storeAssignedResources(containerId,
+              GPU_URI, allocatedDevices);
+        } catch (IOException e) {
+          cleanupAssignGpus(containerId);
+          throw new ResourceHandlerException(e);
+        }
+      }
+
+      return new GpuAllocation(assignedGpus,
+          Sets.difference(allowedGpuDevices, assignedGpus));
+    }
+    return new GpuAllocation(null, allowedGpuDevices);
+  }
+
+  /**
+   * Clean up all Gpus assigned to containerId
+   * @param containerId containerId
+   */
+  public synchronized void cleanupAssignGpus(ContainerId containerId) {
+    Iterator<Map.Entry<Integer, ContainerId>> iter =
+        usedDevices.entrySet().iterator();
+    while (iter.hasNext()) {
+      if (iter.next().getValue().equals(containerId)) {
+        iter.remove();
+      }
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized Map<Integer, ContainerId> getDeviceAllocationMapping() {
+    return new HashMap<>(usedDevices);
+  }
+}
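
Editor's note on GpuResourceAllocator: the first-fit bookkeeping in assignGpus can be modelled without any Hadoop dependencies. The sketch below is a simplification, not the committed class. The class name GpuAllocatorSketch, its method names, and the use of plain String container ids are assumptions made for illustration. State-store persistence and recovery are deliberately left out.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Simplified, hypothetical model of the allocator's bookkeeping. */
final class GpuAllocatorSketch {
  // Minor numbers YARN may hand out, kept sorted like the TreeSet in the patch.
  private final Set<Integer> allowed = new TreeSet<>();
  // Minor number -> owning container id (a String stands in for ContainerId).
  private final Map<Integer, String> used = new TreeMap<>();

  synchronized void addGpu(int minorNumber) {
    allowed.add(minorNumber);
  }

  synchronized int getAvailableGpus() {
    return allowed.size() - used.size();
  }

  /** Assigns the lowest-numbered free GPUs and returns their minor numbers. */
  synchronized Set<Integer> assign(String containerId, int requested) {
    if (requested > getAvailableGpus()) {
      throw new IllegalStateException("Failed to find enough GPUs, requestor="
          + containerId + ", #RequestedGPUs=" + requested
          + ", #availableGpus=" + getAvailableGpus());
    }
    Set<Integer> assigned = new TreeSet<>();
    for (int minor : allowed) {
      if (assigned.size() == requested) {
        break;
      }
      if (!used.containsKey(minor)) {
        used.put(minor, containerId);
        assigned.add(minor);
      }
    }
    return assigned;
  }

  /** Devices the container must not see: every allowed GPU it was not given. */
  synchronized Set<Integer> denied(Set<Integer> assigned) {
    Set<Integer> result = new TreeSet<>(allowed);
    result.removeAll(assigned);
    return result;
  }

  /** Releases every GPU held by the given container. */
  synchronized void cleanup(String containerId) {
    Iterator<Map.Entry<Integer, String>> it = used.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue().equals(containerId)) {
        it.remove();
      }
    }
  }

  public static void main(String[] args) {
    GpuAllocatorSketch allocator = new GpuAllocatorSketch();
    allocator.addGpu(0);
    allocator.addGpu(1);
    allocator.addGpu(3);
    Set<Integer> assigned = allocator.assign("container_a", 2);
    System.out.println("assigned=" + assigned
        + " denied=" + allocator.denied(assigned));
    allocator.cleanup("container_a");
    System.out.println("available=" + allocator.getAvailableGpus());
  }
}
```

As in the patch, a container that requests no GPUs ends up with every allowed device in its denied set, which is what lets the devices cgroup hide all GPUs from it.
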
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
new file mode 100644
index 0000000..7144bb2
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
+import org.apache.hadoop.yarn.exceptions.ResourceNotFoundException;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperation;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+public class GpuResourceHandlerImpl implements ResourceHandler {
+  final static Log LOG = LogFactory
+      .getLog(GpuResourceHandlerImpl.class);
+
+  // CLI options passed to container-executor to set up GPU isolation
+  public static final String EXCLUDED_GPUS_CLI_OPTION = "--excluded_gpus";
+  public static final String CONTAINER_ID_CLI_OPTION = "--container_id";
+
+  private GpuResourceAllocator gpuAllocator;
+  private CGroupsHandler cGroupsHandler;
+  private PrivilegedOperationExecutor privilegedOperationExecutor;
+
+  public GpuResourceHandlerImpl(Context nmContext,
+      CGroupsHandler cGroupsHandler,
+      PrivilegedOperationExecutor privilegedOperationExecutor) {
+    this.cGroupsHandler = cGroupsHandler;
+    this.privilegedOperationExecutor = privilegedOperationExecutor;
+    gpuAllocator = new GpuResourceAllocator(nmContext);
+  }
+
+  @Override
+  public List<PrivilegedOperation> bootstrap(Configuration configuration)
+      throws ResourceHandlerException {
+    List<Integer> minorNumbersOfUsableGpus;
+    try {
+      minorNumbersOfUsableGpus = GpuDiscoverer.getInstance()
+          .getMinorNumbersOfGpusUsableByYarn();
+    } catch (YarnException e) {
+      LOG.error("Exception when trying to get usable GPU device", e);
+      throw new ResourceHandlerException(e);
+    }
+
+    for (int minorNumber : minorNumbersOfUsableGpus) {
+      gpuAllocator.addGpu(minorNumber);
+    }
+
+    // And initialize cgroups
+    this.cGroupsHandler.initializeCGroupController(
+        CGroupsHandler.CGroupController.DEVICES);
+
+    return null;
+  }
+
+  @Override
+  public synchronized List<PrivilegedOperation> preStart(Container container)
+      throws ResourceHandlerException {
+    String containerIdStr = container.getContainerId().toString();
+
+    // Assign GPUs to the container if it requested any.
+    GpuResourceAllocator.GpuAllocation allocation = gpuAllocator.assignGpus(
+        container);
+
+    // Create device cgroups for the container
+    cGroupsHandler.createCGroup(CGroupsHandler.CGroupController.DEVICES,
+        containerIdStr);
+    try {
+      // Run container-executor to set up GPU isolation before launch
+      PrivilegedOperation privilegedOperation = new PrivilegedOperation(
+          PrivilegedOperation.OperationType.GPU, Arrays
+          .asList(CONTAINER_ID_CLI_OPTION, containerIdStr));
+      if (!allocation.getDeniedGPUs().isEmpty()) {
+        privilegedOperation.appendArgs(Arrays.asList(EXCLUDED_GPUS_CLI_OPTION,
+            StringUtils.join(",", allocation.getDeniedGPUs())));
+      }
+
+      privilegedOperationExecutor.executePrivilegedOperation(
+          privilegedOperation, true);
+    } catch (PrivilegedOperationException e) {
+      cGroupsHandler.deleteCGroup(CGroupsHandler.CGroupController.DEVICES,
+          containerIdStr);
+      LOG.warn("Could not update cgroup for container", e);
+      throw new ResourceHandlerException(e);
+    }
+
+    List<PrivilegedOperation> ret = new ArrayList<>();
+    ret.add(new PrivilegedOperation(
+        PrivilegedOperation.OperationType.ADD_PID_TO_CGROUP,
+        PrivilegedOperation.CGROUP_ARG_PREFIX
+            + cGroupsHandler.getPathForCGroupTasks(
+            CGroupsHandler.CGroupController.DEVICES, containerIdStr)));
+
+    return ret;
+  }
+
+  @VisibleForTesting
+  public GpuResourceAllocator getGpuAllocator() {
+    return gpuAllocator;
+  }
+
+  @Override
+  public List<PrivilegedOperation> reacquireContainer(ContainerId containerId)
+      throws ResourceHandlerException {
+    gpuAllocator.recoverAssignedGpus(containerId);
+    return null;
+  }
+
+  @Override
+  public synchronized List<PrivilegedOperation> postComplete(
+      ContainerId containerId) throws ResourceHandlerException {
+    gpuAllocator.cleanupAssignGpus(containerId);
+    cGroupsHandler.deleteCGroup(CGroupsHandler.CGroupController.DEVICES,
+        containerId.toString());
+    return null;
+  }
+
+  @Override
+  public List<PrivilegedOperation> teardown() throws ResourceHandlerException {
+    return null;
+  }
+}
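
Editor's note on GpuResourceHandlerImpl#preStart: the arguments assembled for the GPU privileged operation can be illustrated in isolation. This is a sketch under assumptions: GpuCliArgsSketch and buildArgs are hypothetical names, the container id in main is made up, and the final ordering that PrivilegedOperationExecutor uses on the real container-executor command line is not shown in this patch. Only the three option strings are taken from the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper showing how the GPU operation's arguments fit together. */
final class GpuCliArgsSketch {
  // Option strings copied from the patch.
  static final String MODULE_GPU_SWITCH = "--module-gpu";
  static final String CONTAINER_ID_CLI_OPTION = "--container_id";
  static final String EXCLUDED_GPUS_CLI_OPTION = "--excluded_gpus";

  /** Placing the module switch first is an assumption of this sketch. */
  static List<String> buildArgs(String containerId, Set<Integer> deniedGpus) {
    List<String> args = new ArrayList<>();
    args.add(MODULE_GPU_SWITCH);
    args.add(CONTAINER_ID_CLI_OPTION);
    args.add(containerId);
    // As in preStart, the exclusion option is only added when GPUs are denied.
    if (!deniedGpus.isEmpty()) {
      StringBuilder joined = new StringBuilder();
      for (int minor : deniedGpus) {
        if (joined.length() > 0) {
          joined.append(',');
        }
        joined.append(minor);
      }
      args.add(EXCLUDED_GPUS_CLI_OPTION);
      args.add(joined.toString());
    }
    return args;
  }

  public static void main(String[] unused) {
    Set<Integer> denied = new TreeSet<>(Arrays.asList(1, 3));
    System.out.println(buildArgs("container_example_0001", denied));
  }
}
```
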
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/NodeResourceUpdaterPlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/NodeResourceUpdaterPlugin.java
new file mode 100644
index 0000000..88f77ed
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/NodeResourceUpdaterPlugin.java
@@ -0,0 +1,52 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin;
+
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+
+/**
+ * Plugins to handle resources on a node. This will be used by
+ * {@link org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdater}
+ */
+public abstract class NodeResourceUpdaterPlugin {
+  /**
+   * Update configured resource for the given component.
+   * @param res resource passed in by an external module (such as
+   *            {@link org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdater})
+   * @throws YarnException when any issue happens.
+   */
+  public abstract void updateConfiguredResource(Resource res)
+      throws YarnException;
+
+  /**
+   * This method will be called when the node's resource is loaded from
+   * dynamic-resources.xml in ResourceManager.
+   *
+   * @param newResource newResource reported by RM
+   * @throws YarnException when any mismatch between NM/RM
+   */
+  public void handleUpdatedResourceFromRM(Resource newResource) throws
+      YarnException {
+    // Do nothing by default. Subclasses should override this method when
+    // special handling is required for a new resource reported by the RM.
+  }
+
+  // TODO: add implementation to update node attribute once YARN-3409 merged.
+}
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePlugin.java
new file mode 100644
index 0000000..6e134b3
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePlugin.java
@@ -0,0 +1,83 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin;
+
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain;
+
+/**
+ * {@link ResourcePlugin} is an interface that lets the node manager support
+ * discovery, management, and isolation of new resource types more easily.
+ *
+ * <p>
+ * It has two major parts: {@link ResourcePlugin#createResourceHandler(Context,
+ * CGroupsHandler, PrivilegedOperationExecutor)} and
+ * {@link ResourcePlugin#getNodeResourceHandlerInstance()}, see javadocs below
+ * for more details.
+ * </p>
+ */
+public interface ResourcePlugin {
+  /**
+   * Initialize the plugin, this will be invoked during NM startup.
+   * @param context NM Context
+   * @throws YarnException when any issue occurs
+   */
+  void initialize(Context context) throws YarnException;
+
+  /**
+   * Plugin needs to return a {@link ResourceHandler} when special isolation
+   * is required for the resource type. This will be added to
+   * {@link ResourceHandlerChain} during NodeManager startup. When no special
+   * isolation is needed, return null.
+   *
+   * @param nmContext NodeManager context.
+   * @param cGroupsHandler CGroupsHandler
+   * @param privilegedOperationExecutor Privileged Operation Executor.
+   * @return ResourceHandler
+   */
+  ResourceHandler createResourceHandler(Context nmContext,
+      CGroupsHandler cGroupsHandler,
+      PrivilegedOperationExecutor privilegedOperationExecutor);
+
+  /**
+   * Plugin needs to return a {@link NodeResourceUpdaterPlugin} when a discovery
+   * mechanism is required for the resource type. For example, to set the
+   * resource value during NM registration or send updates in NM-RM heartbeats,
+   * we can implement a {@link NodeResourceUpdaterPlugin} and update fields of
+   * {@link org.apache.hadoop.yarn.server.api.protocolrecords.NodeHeartbeatRequest}
+   * or {@link org.apache.hadoop.yarn.server.api.protocolrecords.RegisterNodeManagerRequest}.
+   *
+   * This will be invoked during every node status update or node registration,
+   * please avoid creating a new instance every time.
+   *
+   * @return NodeResourceUpdaterPlugin, could be null when no discovery needed.
+   */
+  NodeResourceUpdaterPlugin getNodeResourceHandlerInstance();
+
+  /**
+   * Do cleanup of the plugin, this will be invoked when
+   * {@link org.apache.hadoop.yarn.server.nodemanager.NodeManager} stops
+   * @throws YarnException if any issue occurs
+   */
+  void cleanup() throws YarnException;
+}
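
Editor's note on ResourcePlugin: the way NodeStatusUpdaterImpl#updateConfiguredResourcesViaPlugins walks the plugins can be sketched with stand-in types. Everything below is hypothetical and reduced for illustration: Plugin and Updater stand in for ResourcePlugin and NodeResourceUpdaterPlugin, a Map of name to value stands in for Resource, FixedGpuPlugin is invented, and the "yarn.io/gpu" key is only an illustrative resource name. The real interfaces also take a Context and declare YarnException.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in types illustrating null-safe iteration over resource plugins. */
final class ResourcePluginSketch {
  /** Stand-in for NodeResourceUpdaterPlugin. */
  interface Updater {
    void updateConfiguredResource(Map<String, Long> resource);
  }

  /** Stand-in for ResourcePlugin, reduced to the discovery hook. */
  interface Plugin {
    Updater getNodeResourceHandlerInstance(); // null when no discovery is needed
  }

  /** Hypothetical plugin that reports a fixed GPU count. */
  static final class FixedGpuPlugin implements Plugin {
    // Created once and reused, as the javadoc in the patch recommends.
    private final Updater updater;

    FixedGpuPlugin(long gpus) {
      this.updater = resource -> resource.put("yarn.io/gpu", gpus);
    }

    @Override
    public Updater getNodeResourceHandlerInstance() {
      return updater;
    }
  }

  /** Mirrors the null checks in updateConfiguredResourcesViaPlugins. */
  static void updateViaPlugins(Map<String, Plugin> plugins,
      Map<String, Long> resource) {
    if (plugins == null) {
      return;
    }
    for (Plugin plugin : plugins.values()) {
      Updater updater = plugin.getNodeResourceHandlerInstance();
      if (updater != null) {
        updater.updateConfiguredResource(resource);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Plugin> plugins = new LinkedHashMap<>();
    plugins.put("gpu", new FixedGpuPlugin(4));
    plugins.put("no-discovery", () -> null);
    Map<String, Long> resource = new HashMap<>();
    resource.put("memory-mb", 8192L);
    updateViaPlugins(plugins, resource);
    System.out.println(resource);
  }
}
```

Holding the updater in a final field means repeated calls from the status updater get the same instance back, which is the behaviour the getNodeResourceHandlerInstance javadoc asks for.
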
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePluginManager.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePluginManager.java
new file mode 100644
index 0000000..73d6038
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePluginManager.java
@@ -0,0 +1,106 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin;
+
+import com.google.common.collect.ImmutableSet;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuResourcePlugin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+import static org.apache.hadoop.yarn.api.records.ResourceInformation.GPU_URI;
+
+/**
+ * Manages {@link ResourcePlugin} configured on this NodeManager.
+ */
+public class ResourcePluginManager {
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ResourcePluginManager.class);
+  private static final Set<String> SUPPORTED_RESOURCE_PLUGINS = ImmutableSet.of(
+      GPU_URI);
+  private Map<String, ResourcePlugin> configuredPlugins =
+      Collections.emptyMap();
+
+  public synchronized void initialize(Context context)
+      throws YarnException {
+    Configuration conf = context.getConf();
+    String[] plugins = conf.getStrings(YarnConfiguration.NM_RESOURCE_PLUGINS);
+
+    if (plugins != null) {
+      Map<String, ResourcePlugin> pluginMap = new HashMap<>();
+
+      // Initialize each plugin
+      for (String resourceName : plugins) {
+        resourceName = resourceName.trim();
+        if (!SUPPORTED_RESOURCE_PLUGINS.contains(resourceName)) {
+          String msg =
+              "Trying to initialize resource plugin with name=" + resourceName
+                  + ", it is not supported, list of supported plugins:"
+                  + StringUtils.join(SUPPORTED_RESOURCE_PLUGINS,
+                  ",");
+          LOG.error(msg);
+          throw new YarnException(msg);
+        }
+
+        if (pluginMap.containsKey(resourceName)) {
+          // Duplicated items, ignore ...
+          continue;
+        }
+
+        ResourcePlugin plugin = null;
+        if (resourceName.equals(GPU_URI)) {
+          plugin = new GpuResourcePlugin();
+        }
+
+        if (plugin == null) {
+          throw new YarnException(
+              "This shouldn't happen, plugin=" + resourceName
+                  + " should be loaded and initialized");
+        }
+        plugin.initialize(context);
+        pluginMap.put(resourceName, plugin);
+      }
+
+      configuredPlugins = Collections.unmodifiableMap(pluginMap);
+    }
+  }
+
+  public synchronized void cleanup() throws YarnException {
+    for (ResourcePlugin plugin : configuredPlugins.values()) {
+      plugin.cleanup();
+    }
+  }
+
+  /**
+   * Get resource name (such as gpu/fpga) to plugin references.
+   * @return read-only map of resource name to plugins.
+   */
+  public synchronized Map<String, ResourcePlugin> getNameToPlugins() {
+    return configuredPlugins;
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java
new file mode 100644
index 0000000..61b8ce5
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java
@@ -0,0 +1,254 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.util.Shell;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.GpuDeviceInformation;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.GpuDeviceInformationParser;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.PerGpuDeviceInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+public class GpuDiscoverer {
+  public static final Logger LOG = LoggerFactory.getLogger(
+      GpuDiscoverer.class);
+  @VisibleForTesting
+  protected static final String DEFAULT_BINARY_NAME = "nvidia-smi";
+
+  // When executable path not set, try to search default dirs.
+  // By default search /usr/bin, /bin, and /usr/local/nvidia/bin (when
+  // launched by nvidia-docker).
+  private static final Set<String> DEFAULT_BINARY_SEARCH_DIRS = ImmutableSet.of(
+      "/usr/bin", "/bin", "/usr/local/nvidia/bin");
+
+  // command should not run more than 10 sec.
+  private static final int MAX_EXEC_TIMEOUT_MS = 10 * 1000;
+  private static final int MAX_REPEATED_ERROR_ALLOWED = 10;
+  private static GpuDiscoverer instance;
+
+  static {
+    instance = new GpuDiscoverer();
+  }
+
+  private Configuration conf = null;
+  private String pathOfGpuBinary = null;
+  private Map<String, String> environment = new HashMap<>();
+  private GpuDeviceInformationParser parser = new GpuDeviceInformationParser();
+
+  private int numOfErrorExecutionSinceLastSucceed = 0;
+  GpuDeviceInformation lastDiscoveredGpuInformation = null;
+
+  private void validateConfOrThrowException() throws YarnException {
+    if (conf == null) {
+      throw new YarnException("Please initialize (call initialize) before use "
+          + GpuDiscoverer.class.getSimpleName());
+    }
+  }
+
+  /**
+   * Get GPU device information from system.
+   * This needs to be called after initialize.
+   *
+   * Please note that this only works on *NIX platforms, so external callers
+   * need to ensure that.
+   *
+   * @return GpuDeviceInformation
+   * @throws YarnException when any error happens
+   */
+  public synchronized GpuDeviceInformation getGpuDeviceInformation()
+      throws YarnException {
+    validateConfOrThrowException();
+
+    if (null == pathOfGpuBinary) {
+      throw new YarnException(
+          "Failed to find GPU discovery executable, please double check "
+              + YarnConfiguration.NM_GPU_PATH_TO_EXEC + " setting.");
+    }
+
+    if (numOfErrorExecutionSinceLastSucceed == MAX_REPEATED_ERROR_ALLOWED) {
+      String msg =
+          "Failed to execute GPU device information detection script for "
+              + MAX_REPEATED_ERROR_ALLOWED
+              + " times, skip following executions.";
+      LOG.error(msg);
+      throw new YarnException(msg);
+    }
+
+    String output;
+    try {
+      output = Shell.execCommand(environment,
+          new String[] { pathOfGpuBinary, "-x", "-q" }, MAX_EXEC_TIMEOUT_MS);
+      GpuDeviceInformation info = parser.parseXml(output);
+      numOfErrorExecutionSinceLastSucceed = 0;
+      lastDiscoveredGpuInformation = info;
+      return info;
+    } catch (IOException e) {
+      numOfErrorExecutionSinceLastSucceed++;
+      String msg =
+          "Failed to execute " + pathOfGpuBinary + " exception message:" + e
+              .getMessage() + ", continue ...";
+      if (LOG.isDebugEnabled()) {
+        LOG.debug(msg);
+      }
+      throw new YarnException(e);
+    } catch (YarnException e) {
+      numOfErrorExecutionSinceLastSucceed++;
+      String msg = "Failed to parse xml output: " + e.getMessage();
+      if (LOG.isDebugEnabled()) {
+        LOG.debug(msg, e);
+      }
+      throw e;
+    }
+  }
+
+  /**
+   * Get list of minor device numbers of Gpu devices usable by YARN.
+   *
+   * @return List of minor device numbers of Gpu devices.
+   * @throws YarnException when any issue happens
+   */
+  public synchronized List<Integer> getMinorNumbersOfGpusUsableByYarn()
+      throws YarnException {
+    validateConfOrThrowException();
+
+    String allowedDevicesStr = conf.get(
+        YarnConfiguration.NM_GPU_ALLOWED_DEVICES,
+        YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES);
+
+    List<Integer> minorNumbers = new ArrayList<>();
+
+    if (allowedDevicesStr.equals(
+        YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
+      // Get gpu device information from system.
+      if (null == lastDiscoveredGpuInformation) {
+        String msg = YarnConfiguration.NM_GPU_ALLOWED_DEVICES + " is set to "
+            + YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES
+            + ", however automatically discovering "
+            + "GPU information failed, please check NodeManager log for more"
+            + " details, as an alternative, admin can specify "
+            + YarnConfiguration.NM_GPU_ALLOWED_DEVICES
+            + " manually to enable GPU isolation.";
+        LOG.error(msg);
+        throw new YarnException(msg);
+      }
+
+      if (lastDiscoveredGpuInformation.getGpus() != null) {
+        for (PerGpuDeviceInformation gpu : lastDiscoveredGpuInformation
+            .getGpus()) {
+          minorNumbers.add(gpu.getMinorNumber());
+        }
+      }
+    } else {
+      for (String s : allowedDevicesStr.split(",")) {
+        if (s.trim().length() > 0) {
+          minorNumbers.add(Integer.valueOf(s.trim()));
+        }
+      }
+      LOG.info("Allowed GPU devices with minor numbers:" + allowedDevicesStr);
+    }
+
+    return minorNumbers;
+  }
+
+  public synchronized void initialize(Configuration conf) throws YarnException {
+    this.conf = conf;
+    numOfErrorExecutionSinceLastSucceed = 0;
+    String pathToExecutable = conf.get(YarnConfiguration.NM_GPU_PATH_TO_EXEC,
+        YarnConfiguration.DEFAULT_NM_GPU_PATH_TO_EXEC);
+    if (pathToExecutable.isEmpty()) {
+      pathToExecutable = DEFAULT_BINARY_NAME;
+    }
+
+    // Validate file existence
+    File binaryPath = new File(pathToExecutable);
+
+    if (!binaryPath.exists()) {
+      // When the binary does not exist, search the default directories.
+      boolean found = false;
+      for (String dir : DEFAULT_BINARY_SEARCH_DIRS) {
+        binaryPath = new File(dir, DEFAULT_BINARY_NAME);
+        if (binaryPath.exists()) {
+          found = true;
+          pathOfGpuBinary = binaryPath.getAbsolutePath();
+          break;
+        }
+      }
+
+      if (!found) {
+        LOG.warn("Failed to locate binary at:" + pathToExecutable
+            + ", please double check [" + YarnConfiguration.NM_GPU_PATH_TO_EXEC
+            + "] setting. Default search dirs were also checked.");
+      }
+    } else {
+      // If the user-specified path is a directory, use the binary under it
+      if (binaryPath.isDirectory()) {
+        binaryPath = new File(binaryPath, DEFAULT_BINARY_NAME);
+        LOG.warn("Specified path is a directory, use " + DEFAULT_BINARY_NAME
+            + " under the directory, updated path-to-executable:" + binaryPath
+            .getAbsolutePath());
+      }
+      // Validated
+      pathOfGpuBinary = binaryPath.getAbsolutePath();
+    }
+
+    // Try to discover GPU information once and print
+    try {
+      LOG.info("Trying to discover GPU information ...");
+      GpuDeviceInformation info = getGpuDeviceInformation();
+      LOG.info(info.toString());
+    } catch (YarnException e) {
+      String msg =
+          "Failed to discover GPU information from system, exception message:"
+              + e.getMessage() + " continue...";
+      LOG.warn(msg);
+    }
+  }
+
+  @VisibleForTesting
+  protected Map<String, String> getEnvironmentToRunCommand() {
+    return environment;
+  }
+
+  @VisibleForTesting
+  protected String getPathOfGpuBinary() {
+    return pathOfGpuBinary;
+  }
+
+  public static GpuDiscoverer getInstance() {
+    return instance;
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuNodeResourceUpdateHandler.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuNodeResourceUpdateHandler.java
new file mode 100644
index 0000000..f6bf506
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuNodeResourceUpdateHandler.java
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu;
+
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.NodeResourceUpdaterPlugin;
+import org.apache.hadoop.yarn.util.resource.ResourceUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.yarn.api.records.ResourceInformation.GPU_URI;
+
+public class GpuNodeResourceUpdateHandler extends NodeResourceUpdaterPlugin {
+  private static final Logger LOG =
+      LoggerFactory.getLogger(GpuNodeResourceUpdateHandler.class);
+
+  @Override
+  public void updateConfiguredResource(Resource res) throws YarnException {
+    LOG.info("Initializing configured GPU resources for the NodeManager.");
+
+    List<Integer> usableGpus =
+        GpuDiscoverer.getInstance().getMinorNumbersOfGpusUsableByYarn();
+    if (null == usableGpus || usableGpus.isEmpty()) {
+      LOG.info("Didn't find any usable GPUs on the NodeManager.");
+      // No gpu can be used by YARN.
+      return;
+    }
+
+    long nUsableGpus = usableGpus.size();
+
+    Map<String, ResourceInformation> configuredResourceTypes =
+        ResourceUtils.getResourceTypes();
+    if (!configuredResourceTypes.containsKey(GPU_URI)) {
+      throw new YarnException("Found " + nUsableGpus + " usable GPUs, however "
+          + GPU_URI
+          + " resource-type is not configured inside"
+          + " resource-types.xml, please configure it to enable GPU feature or"
+          + " remove " + GPU_URI + " from "
+          + YarnConfiguration.NM_RESOURCE_PLUGINS);
+    }
+
+    res.setResourceValue(GPU_URI, nUsableGpus);
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuResourcePlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuResourcePlugin.java
new file mode 100644
index 0000000..9576ce7
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuResourcePlugin.java
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu;
+
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.NodeResourceUpdaterPlugin;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePlugin;
+
+public class GpuResourcePlugin implements ResourcePlugin {
+  private ResourceHandler gpuResourceHandler = null;
+  private GpuNodeResourceUpdateHandler resourceDiscoverHandler = null;
+
+  @Override
+  public synchronized void initialize(Context context) throws YarnException {
+    resourceDiscoverHandler = new GpuNodeResourceUpdateHandler();
+    GpuDiscoverer.getInstance().initialize(context.getConf());
+  }
+
+  @Override
+  public synchronized ResourceHandler createResourceHandler(
+      Context context, CGroupsHandler cGroupsHandler,
+      PrivilegedOperationExecutor privilegedOperationExecutor) {
+    if (gpuResourceHandler == null) {
+      gpuResourceHandler = new GpuResourceHandlerImpl(context, cGroupsHandler,
+          privilegedOperationExecutor);
+    }
+
+    return gpuResourceHandler;
+  }
+
+  @Override
+  public synchronized NodeResourceUpdaterPlugin getNodeResourceHandlerInstance() {
+    return resourceDiscoverHandler;
+  }
+
+  @Override
+  public void cleanup() throws YarnException {
+    // Do nothing.
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformation.java
new file mode 100644
index 0000000..977032a
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformation.java
@@ -0,0 +1,72 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+import javax.xml.bind.annotation.XmlRootElement;
+import java.util.List;
+
+/**
+ * All GPU Device Information in the system.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+@XmlRootElement(name = "nvidia_smi_log")
+public class GpuDeviceInformation {
+  List<PerGpuDeviceInformation> gpus;
+
+  String driverVersion = "N/A";
+
+  // More fields like topology information could be added when needed.
+  // ...
+
+  @javax.xml.bind.annotation.XmlElement(name = "gpu")
+  public List<PerGpuDeviceInformation> getGpus() {
+    return gpus;
+  }
+
+  public void setGpus(List<PerGpuDeviceInformation> gpus) {
+    this.gpus = gpus;
+  }
+
+  @javax.xml.bind.annotation.XmlElement(name = "driver_version")
+  public String getDriverVersion() {
+    return driverVersion;
+  }
+
+  public void setDriverVersion(String driverVersion) {
+    this.driverVersion = driverVersion;
+  }
+
+  @Override
+  public String toString() {
+    StringBuilder sb = new StringBuilder();
+    sb.append("=== Gpus in the system ===\n").append("\tDriver Version:").append(
+        getDriverVersion()).append("\n");
+
+    if (gpus != null) {
+      for (PerGpuDeviceInformation gpu : gpus) {
+        sb.append("\t").append(gpu.toString()).append("\n");
+      }
+    }
+    return sb.toString();
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformationParser.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformationParser.java
new file mode 100644
index 0000000..1bd92f6
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformationParser.java
@@ -0,0 +1,87 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.SAXException;
+import org.xml.sax.XMLReader;
+
+import javax.xml.bind.JAXBContext;
+import javax.xml.bind.JAXBException;
+import javax.xml.bind.Unmarshaller;
+import javax.xml.parsers.ParserConfigurationException;
+import javax.xml.parsers.SAXParserFactory;
+import javax.xml.transform.sax.SAXSource;
+import java.io.StringReader;
+
+/**
+ * Parse XML and get GPU device information.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+public class GpuDeviceInformationParser {
+  private static final Logger LOG = LoggerFactory.getLogger(
+      GpuDeviceInformationParser.class);
+
+  private Unmarshaller unmarshaller = null;
+  private XMLReader xmlReader = null;
+
+  private void init()
+      throws SAXException, ParserConfigurationException, JAXBException {
+    SAXParserFactory spf = SAXParserFactory.newInstance();
+    // Disable external-dtd since by default nvidia-smi output contains
+    // <!DOCTYPE nvidia_smi_log SYSTEM "nvsmi_device_v8.dtd"> in header
+    spf.setFeature(
+        "http://apache.org/xml/features/nonvalidating/load-external-dtd",
+        false);
+    spf.setFeature("http://xml.org/sax/features/validation", false);
+
+    JAXBContext jaxbContext = JAXBContext.newInstance(
+        GpuDeviceInformation.class);
+
+    this.xmlReader = spf.newSAXParser().getXMLReader();
+    this.unmarshaller = jaxbContext.createUnmarshaller();
+  }
+
+  public synchronized GpuDeviceInformation parseXml(String xmlContent)
+      throws YarnException {
+    if (unmarshaller == null) {
+      try {
+        init();
+      } catch (SAXException | ParserConfigurationException | JAXBException e) {
+        LOG.error("Exception while initializing parser", e);
+        throw new YarnException(e);
+      }
+    }
+
+    InputSource inputSource = new InputSource(new StringReader(xmlContent));
+    SAXSource source = new SAXSource(xmlReader, inputSource);
+    try {
+      return (GpuDeviceInformation) unmarshaller.unmarshal(source);
+    } catch (JAXBException e) {
+      LOG.error("Exception while parsing xml", e);
+      throw new YarnException(e);
+    }
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java
new file mode 100644
index 0000000..f315313
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java
@@ -0,0 +1,165 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+import javax.xml.bind.annotation.XmlElement;
+import javax.xml.bind.annotation.XmlRootElement;
+import javax.xml.bind.annotation.adapters.XmlAdapter;
+
+/**
+ * Capture single GPU device information such as memory size, temperature,
+ * utilization.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+@XmlRootElement(name = "gpu")
+public class PerGpuDeviceInformation {
+
+  private String productName = "N/A";
+  private String uuid = "N/A";
+  private int minorNumber = -1;
+
+  private PerGpuUtilizations gpuUtilizations;
+  private PerGpuMemoryUsage gpuMemoryUsage;
+  private PerGpuTemperature temperature;
+
+  /**
+   * Convert formats like "34 C", "75.6 %" to float.
+   */
+  @InterfaceAudience.Private
+  @InterfaceStability.Unstable
+  static class StrToFloatBeforeSpaceAdapter extends
+      XmlAdapter<String, Float> {
+    @Override
+    public String marshal(Float v) throws Exception {
+      if (v == null) {
+        return "";
+      }
+      return String.valueOf(v);
+    }
+
+    @Override
+    public Float unmarshal(String v) throws Exception {
+      if (v == null) {
+        return -1f;
+      }
+
+      return Float.valueOf(v.split(" ")[0]);
+    }
+  }
+
+  /**
+   * Convert formats like "725 MiB" to long.
+   */
+  @InterfaceAudience.Private
+  @InterfaceStability.Unstable
+  static class StrToMemAdapter extends XmlAdapter<String, Long> {
+    @Override
+    public String marshal(Long v) throws Exception {
+      if (v == null) {
+        return "";
+      }
+      return String.valueOf(v) + " MiB";
+    }
+
+    @Override
+    public Long unmarshal(String v) throws Exception {
+      if (v == null) {
+        return -1L;
+      }
+      return Long.valueOf(v.split(" ")[0]);
+    }
+  }
+
+  @XmlElement(name = "temperature")
+  public PerGpuTemperature getTemperature() {
+    return temperature;
+  }
+
+  public void setTemperature(PerGpuTemperature temperature) {
+    this.temperature = temperature;
+  }
+
+  @XmlElement(name = "uuid")
+  public String getUuid() {
+    return uuid;
+  }
+
+  public void setUuid(String uuid) {
+    this.uuid = uuid;
+  }
+
+  @XmlElement(name = "product_name")
+  public String getProductName() {
+    return productName;
+  }
+
+  public void setProductName(String productName) {
+    this.productName = productName;
+  }
+
+  @XmlElement(name = "minor_number")
+  public int getMinorNumber() {
+    return minorNumber;
+  }
+
+  public void setMinorNumber(int minorNumber) {
+    this.minorNumber = minorNumber;
+  }
+
+  @XmlElement(name = "utilization")
+  public PerGpuUtilizations getGpuUtilizations() {
+    return gpuUtilizations;
+  }
+
+  public void setGpuUtilizations(PerGpuUtilizations utilizations) {
+    this.gpuUtilizations = utilizations;
+  }
+
+  @XmlElement(name = "bar1_memory_usage")
+  public PerGpuMemoryUsage getGpuMemoryUsage() {
+    return gpuMemoryUsage;
+  }
+
+  public void setGpuMemoryUsage(PerGpuMemoryUsage gpuMemoryUsage) {
+    this.gpuMemoryUsage = gpuMemoryUsage;
+  }
+
+
+  @Override
+  public String toString() {
+    StringBuilder sb = new StringBuilder();
+    sb.append("ProductName=").append(productName).append(", MinorNumber=")
+        .append(minorNumber);
+
+    if (getGpuMemoryUsage() != null) {
+      sb.append(", TotalMemory=").append(
+          getGpuMemoryUsage().getTotalMemoryMiB()).append("MiB");
+    }
+
+    if (getGpuUtilizations() != null) {
+      sb.append(", Utilization=").append(
+          getGpuUtilizations().getOverallGpuUtilization()).append("%");
+    }
+    return sb.toString();
+  }
+}
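
Review note on the two adapters above: both take the token before the first space and parse it, mapping null to -1. A minimal sketch of that rule follows, written as plain static methods so it runs without JAXB on the classpath. The class name UnitSuffixParsingSketch and the method names are hypothetical, and parseFloatOrUnknown is an assumption that is not part of this commit; it only illustrates one way to tolerate non-numeric text such as "N/A" (the default this class uses for productName and uuid), which the unmarshal methods as written would reject with a NumberFormatException.

```java
// Hypothetical sketch: mirrors the split-on-space parsing of the adapters
// above without depending on JAXB. All names here are illustrative.
final class UnitSuffixParsingSketch {

  // Same rule as StrToFloatBeforeSpaceAdapter.unmarshal: null maps to -1,
  // otherwise the token before the first space is parsed as a float.
  static float parseFloatBeforeSpace(String v) {
    if (v == null) {
      return -1f;
    }
    return Float.parseFloat(v.split(" ")[0]);
  }

  // Same rule as StrToMemAdapter.unmarshal, for values such as "725 MiB".
  static long parseMiB(String v) {
    if (v == null) {
      return -1L;
    }
    return Long.parseLong(v.split(" ")[0]);
  }

  // Assumption, not in the commit: fall back to -1 for non-numeric text.
  static float parseFloatOrUnknown(String v) {
    try {
      return parseFloatBeforeSpace(v);
    } catch (NumberFormatException e) {
      return -1f;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseFloatBeforeSpace("75.6 %"));
    System.out.println(parseMiB("725 MiB"));
    System.out.println(parseFloatOrUnknown("N/A"));
  }
}
```

Whether a fallback like this is wanted depends on what the GPU discovery output can contain; the sketch only shows where the exception would surface.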
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMemoryUsage.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMemoryUsage.java
new file mode 100644
index 0000000..3964c4e
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMemoryUsage.java
@@ -0,0 +1,58 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+import javax.xml.bind.annotation.XmlElement;
+import javax.xml.bind.annotation.XmlRootElement;
+import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
+
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+@XmlRootElement(name = "bar1_memory_usage")
+public class PerGpuMemoryUsage {
+  long usedMemoryMiB = -1L;
+  long availMemoryMiB = -1L;
+
+  @XmlJavaTypeAdapter(PerGpuDeviceInformation.StrToMemAdapter.class)
+  @XmlElement(name = "used")
+  public Long getUsedMemoryMiB() {
+    return usedMemoryMiB;
+  }
+
+  public void setUsedMemoryMiB(Long usedMemoryMiB) {
+    this.usedMemoryMiB = usedMemoryMiB;
+  }
+
+  @XmlJavaTypeAdapter(PerGpuDeviceInformation.StrToMemAdapter.class)
+  @XmlElement(name = "free")
+  public Long getAvailMemoryMiB() {
+    return availMemoryMiB;
+  }
+
+  public void setAvailMemoryMiB(Long availMemoryMiB) {
+    this.availMemoryMiB = availMemoryMiB;
+  }
+
+  public long getTotalMemoryMiB() {
+    return usedMemoryMiB + availMemoryMiB;
+  }
+}
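
Review note on getTotalMemoryMiB above: the total is a plain sum of the two fields, and both fields default to -1, so an object that was never populated reports a total of -2. The sketch below reproduces that arithmetic and shows one possible guard. MemoryTotalSketch, plainTotal and guardedTotal are hypothetical names, and the guard is an assumption, not behavior of this commit.

```java
// Hypothetical sketch of the used/free/total arithmetic in PerGpuMemoryUsage.
final class MemoryTotalSketch {
  // Same sentinel the fields in the commit default to.
  static final long UNKNOWN = -1L;

  // As in the commit: a plain sum, which yields -2 when both are unset.
  static long plainTotal(long usedMiB, long freeMiB) {
    return usedMiB + freeMiB;
  }

  // Assumption, not in the commit: report UNKNOWN if either side is unset.
  static long guardedTotal(long usedMiB, long freeMiB) {
    if (usedMiB < 0 || freeMiB < 0) {
      return UNKNOWN;
    }
    return usedMiB + freeMiB;
  }

  public static void main(String[] args) {
    System.out.println(plainTotal(725L, 11443L));
    System.out.println(plainTotal(UNKNOWN, UNKNOWN));
    System.out.println(guardedTotal(UNKNOWN, UNKNOWN));
  }
}
```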
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuTemperature.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuTemperature.java
new file mode 100644
index 0000000..ccd60cb
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuTemperature.java
@@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+import javax.xml.bind.annotation.XmlElement;
+import javax.xml.bind.annotation.XmlRootElement;
+import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
+
+/**
+ * Temperature readings for a single GPU, in degrees Celsius.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+@XmlRootElement(name = "temperature")
+public class PerGpuTemperature {
+  private float currentGpuTemp = Float.MIN_VALUE;
+  private float maxGpuTemp = Float.MIN_VALUE;
+  private float slowThresholdGpuTemp = Float.MIN_VALUE;
+
+  /**
+   * Get the current GPU temperature, in degrees Celsius.
+   * @return temperature
+   */
+  @XmlJavaTypeAdapter(PerGpuDeviceInformation.StrToFloatBeforeSpaceAdapter.class)
+  @XmlElement(name = "gpu_temp")
+  public Float getCurrentGpuTemp() {
+    return currentGpuTemp;
+  }
+
+  public void setCurrentGpuTemp(Float currentGpuTemp) {
+    this.currentGpuTemp = currentGpuTemp;
+  }
+
+  /**
+   * Get the maximum GPU temperature threshold, in degrees Celsius.
+   * @return temperature
+   */
+  @XmlJavaTypeAdapter(PerGpuDeviceInformation.StrToFloatBeforeSpaceAdapter.class)
+  @XmlElement(name = "gpu_temp_max_threshold")
+  public Float getMaxGpuTemp() {
+    return maxGpuTemp;
+  }
+
+  public void setMaxGpuTemp(Float maxGpuTemp) {
+    this.maxGpuTemp = maxGpuTemp;
+  }
+
+  /**
+   * Get the Celsius temperature above which the GPU may run slower.
+   * @return temperature
+   */
+  @XmlJavaTypeAdapter(PerGpuDeviceInformation.StrToFloatBeforeSpaceAdapter.class)
+  @XmlElement(name = "gpu_temp_slow_threshold")
+  public Float getSlowThresholdGpuTemp() {
+    return slowThresholdGpuTemp;
+  }
+
+  public void setSlowThresholdGpuTemp(Float slowThresholdGpuTemp) {
+    this.slowThresholdGpuTemp = slowThresholdGpuTemp;
+  }
+}
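
Review note on the defaults in PerGpuTemperature above: Float.MIN_VALUE is the smallest positive float (about 1.4e-45), not the most negative one, so a check such as "temp < 0" will not detect a field that was never set. A minimal sketch of a sentinel check under that default follows. FloatSentinelSketch and isUnset are hypothetical names introduced for illustration only.

```java
// Hypothetical sketch: detecting the Float.MIN_VALUE default used above.
final class FloatSentinelSketch {
  // The default assigned to the temperature fields in the commit.
  static final float UNSET = Float.MIN_VALUE;

  // True when no reading has replaced the default. Exact comparison is
  // acceptable here because the sentinel is assigned, never computed.
  static boolean isUnset(float temp) {
    return temp == UNSET;
  }

  public static void main(String[] args) {
    // Float.MIN_VALUE is positive, so a sign test misses it.
    System.out.println(Float.MIN_VALUE > 0f);
    System.out.println(isUnset(UNSET));
    System.out.println(isUnset(34f));
  }
}
```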
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuUtilizations.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuUtilizations.java
new file mode 100644
index 0000000..4ef218b
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuUtilizations.java
@@ -0,0 +1,50 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+import javax.xml.bind.annotation.XmlElement;
+import javax.xml.bind.annotation.XmlRootElement;
+import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
+
+/**
+ * Utilization figures for a single GPU.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+@XmlRootElement(name = "utilization")
+public class PerGpuUtilizations {
+  private float overallGpuUtilization;
+
+  /**
+   * Get the overall GPU utilization, as a percentage.
+   * @return utilization
+   */
+  @XmlJavaTypeAdapter(PerGpuDeviceInformation.StrToFloatBeforeSpaceAdapter.class)
+  @XmlElement(name = "gpu_util")
+  public Float getOverallGpuUtilization() {
+    return overallGpuUtilization;
+  }
+
+  public void setOverallGpuUtilization(Float overallGpuUtilization) {
+    this.overallGpuUtilization = overallGpuUtilization;
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerTestBase.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerTestBase.java
new file mode 100644
index 0000000..13b3ee9
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerTestBase.java
@@ -0,0 +1,164 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
+import org.apache.hadoop.net.ServerSocketUtil;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.event.Dispatcher;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.factories.RecordFactory;
+import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
+import org.apache.hadoop.yarn.server.api.ResourceTracker;
+import org.apache.hadoop.yarn.server.api.protocolrecords.NodeHeartbeatRequest;
+import org.apache.hadoop.yarn.server.api.protocolrecords.NodeHeartbeatResponse;
+import org.apache.hadoop.yarn.server.api.protocolrecords.RegisterNodeManagerRequest;
+import org.apache.hadoop.yarn.server.api.protocolrecords.RegisterNodeManagerResponse;
+import org.apache.hadoop.yarn.server.api.protocolrecords.UnRegisterNodeManagerRequest;
+import org.apache.hadoop.yarn.server.api.protocolrecords.UnRegisterNodeManagerResponse;
+import org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.NodeHeartbeatResponsePBImpl;
+import org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.RegisterNodeManagerResponsePBImpl;
+import org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.UnRegisterNodeManagerResponsePBImpl;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl;
+import org.apache.hadoop.yarn.server.nodemanager.metrics.NodeManagerMetrics;
+import org.junit.Assert;
+import org.junit.Before;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+
+public class NodeManagerTestBase {
+  // Temporary fix until the metrics system detects unit-test runs itself.
+  static {
+    DefaultMetricsSystem.setMiniClusterMode(true);
+  }
+
+  protected static final Logger LOG =
+      LoggerFactory.getLogger(TestNodeStatusUpdater.class);
+  protected static final File basedir =
+      new File("target", TestNodeStatusUpdater.class.getName());
+  protected static final File nmLocalDir = new File(basedir, "nm0");
+  protected static final File tmpDir = new File(basedir, "tmpDir");
+  protected static final File remoteLogsDir = new File(basedir, "remotelogs");
+  protected static final File logsDir = new File(basedir, "logs");
+  protected static final RecordFactory recordFactory = RecordFactoryProvider
+      .getRecordFactory(null);
+  protected Configuration conf;
+
+  protected YarnConfiguration createNMConfig() throws IOException {
+    return createNMConfig(ServerSocketUtil.getPort(49170, 10));
+  }
+
+  protected YarnConfiguration createNMConfig(int port) throws IOException {
+    YarnConfiguration conf = new YarnConfiguration();
+    String localhostAddress = null;
+    try {
+      localhostAddress = InetAddress.getByName("localhost")
+          .getCanonicalHostName();
+    } catch (UnknownHostException e) {
+      Assert.fail("Unable to get localhost address: " + e.getMessage());
+    }
+    conf.setInt(YarnConfiguration.NM_PMEM_MB, 5 * 1024); // 5GB
+    conf.set(YarnConfiguration.NM_ADDRESS, localhostAddress + ":" + port);
+    conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, localhostAddress + ":"
+        + ServerSocketUtil.getPort(49160, 10));
+    conf.set(YarnConfiguration.NM_LOG_DIRS, logsDir.getAbsolutePath());
+    conf.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
+        remoteLogsDir.getAbsolutePath());
+    conf.set(YarnConfiguration.NM_LOCAL_DIRS, nmLocalDir.getAbsolutePath());
+    conf.setLong(YarnConfiguration.NM_LOG_RETAIN_SECONDS, 1);
+    return conf;
+  }
+
+  public static class BaseResourceTrackerForTest implements ResourceTracker {
+    @Override
+    public RegisterNodeManagerResponse registerNodeManager(
+        RegisterNodeManagerRequest request) throws YarnException, IOException {
+      return new RegisterNodeManagerResponsePBImpl();
+    }
+
+    @Override
+    public NodeHeartbeatResponse nodeHeartbeat(NodeHeartbeatRequest request)
+        throws YarnException, IOException {
+      return new NodeHeartbeatResponsePBImpl();
+    }
+
+    @Override
+    public UnRegisterNodeManagerResponse unRegisterNodeManager(
+        UnRegisterNodeManagerRequest request)
+        throws YarnException, IOException {
+      return new UnRegisterNodeManagerResponsePBImpl();
+    }
+  }
+
+  protected static class BaseNodeStatusUpdaterForTest extends NodeStatusUpdaterImpl {
+    public ResourceTracker resourceTracker;
+    protected Context context;
+
+    public BaseNodeStatusUpdaterForTest(Context context, Dispatcher dispatcher,
+        NodeHealthCheckerService healthChecker, NodeManagerMetrics metrics,
+        ResourceTracker resourceTracker) {
+      super(context, dispatcher, healthChecker, metrics);
+      this.context = context;
+      this.resourceTracker = resourceTracker;
+    }
+    @Override
+    protected ResourceTracker getRMClient() {
+      return resourceTracker;
+    }
+
+    @Override
+    protected void stopRMProxy() {
+      // Nothing to stop: tests supply the ResourceTracker directly.
+    }
+  }
+
+  public class MyContainerManager extends ContainerManagerImpl {
+    public boolean signaled = false;
+
+    public MyContainerManager(Context context, ContainerExecutor exec,
+        DeletionService deletionContext, NodeStatusUpdater nodeStatusUpdater,
+        NodeManagerMetrics metrics,
+        LocalDirsHandlerService dirsHandler) {
+      super(context, exec, deletionContext, nodeStatusUpdater,
+          metrics, dirsHandler);
+    }
+
+    @Override
+    public void handle(ContainerManagerEvent event) {
+      if (event.getType() == ContainerManagerEventType.SIGNAL_CONTAINERS) {
+        signaled = true;
+      }
+    }
+  }
+
+  @Before
+  public void setUp() throws IOException {
+    nmLocalDir.mkdirs();
+    tmpDir.mkdirs();
+    logsDir.mkdirs();
+    remoteLogsDir.mkdirs();
+    conf = createNMConfig();
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
index 2e9eff5..9b180c7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
@@ -178,7 +178,7 @@ public class TestDefaultContainerExecutor {
     FileContext lfs = FileContext.getLocalFSFileContext(conf);
     DefaultContainerExecutor executor = new DefaultContainerExecutor(lfs);
     executor.setConf(conf);
-    executor.init();
+    executor.init(null);
 
     try {
       executor.createUserLocalDirs(localDirs, user);
@@ -317,7 +317,7 @@ public class TestDefaultContainerExecutor {
       Path workDir = localDir;
       Path pidFile = new Path(workDir, "pid.txt");
 
-      mockExec.init();
+      mockExec.init(null);
       mockExec.activateContainer(cId, pidFile);
       int ret = mockExec.launchContainer(new ContainerStartContext.Builder()
           .setContainer(container)
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java
index f1194c9..7e1752b 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java
@@ -116,7 +116,7 @@ public class TestDockerContainerExecutorWithMocks {
   public void testContainerInitSecure() throws IOException {
     dockerContainerExecutor.getConf().set(
       CommonConfigurationKeys.HADOOP_SECURITY_AUTHENTICATION, "kerberos");
-    dockerContainerExecutor.init();
+    dockerContainerExecutor.init(mock(Context.class));
   }
 
   @Test(expected = IllegalArgumentException.class)
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
index cf8d977..95c8f5e 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
@@ -628,7 +628,7 @@ public class TestLinuxContainerExecutor {
     LinuxContainerExecutor lce = new LinuxContainerExecutor();
     lce.setConf(conf);
     try {
-      lce.init();
+      lce.init(null);
     } catch (IOException e) {
       // expected if LCE isn't setup right, but not necessary for this test
     }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
index 79b88cf..249e017 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
@@ -426,7 +426,7 @@ public class TestLinuxContainerExecutorWithMocks {
   @Test
   public void testInit() throws Exception {
 
-    mockExec.init();
+    mockExec.init(mock(Context.class));
     assertEquals(Arrays.asList("--checksetup"), readMockParams());
     
   }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManager.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManager.java
index 9279711..b31215b 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManager.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManager.java
@@ -37,7 +37,7 @@ public class TestNodeManager {
   public static final class InvalidContainerExecutor extends
       DefaultContainerExecutor {
     @Override
-    public void init() throws IOException {
+    public void init(Context nmContext) throws IOException {
       throw new IOException("dummy executor init called");
     }
   }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
index 055dab4..533cf2a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
@@ -20,16 +20,14 @@ package org.apache.hadoop.yarn.server.nodemanager;
 
 import static org.apache.hadoop.yarn.server.utils.YarnServerBuilderUtils.newNodeHeartbeatResponse;
 import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 
 import java.io.EOFException;
 import java.io.File;
 import java.io.IOException;
 import java.net.InetAddress;
 import java.net.InetSocketAddress;
-import java.net.UnknownHostException;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.Collections;
@@ -80,8 +78,6 @@ import org.apache.hadoop.yarn.event.Dispatcher;
 import org.apache.hadoop.yarn.event.EventHandler;
 import org.apache.hadoop.yarn.exceptions.YarnException;
 import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
-import org.apache.hadoop.yarn.factories.RecordFactory;
-import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
 import org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos.NodeHeartbeatResponseProto;
 import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
 import org.apache.hadoop.yarn.server.api.ResourceTracker;
@@ -117,41 +113,14 @@ import org.junit.Before;
 import org.junit.Test;
 
 @SuppressWarnings("rawtypes")
-public class TestNodeStatusUpdater {
-
-  // temp fix until metrics system can auto-detect itself running in unit test:
-  static {
-    DefaultMetricsSystem.setMiniClusterMode(true);
-  }
-
-  static final Logger LOG =
-       LoggerFactory.getLogger(TestNodeStatusUpdater.class);
-  static final File basedir =
-      new File("target", TestNodeStatusUpdater.class.getName());
-  static final File nmLocalDir = new File(basedir, "nm0");
-  static final File tmpDir = new File(basedir, "tmpDir");
-  static final File remoteLogsDir = new File(basedir, "remotelogs");
-  static final File logsDir = new File(basedir, "logs");
-  private static final RecordFactory recordFactory = RecordFactoryProvider
-      .getRecordFactory(null);
-
+public class TestNodeStatusUpdater extends NodeManagerTestBase {
   volatile int heartBeatID = 0;
   volatile Throwable nmStartError = null;
   private final List<NodeId> registeredNodes = new ArrayList<NodeId>();
   private boolean triggered = false;
-  private Configuration conf;
   private NodeManager nm;
   private AtomicBoolean assertionFailedInThread = new AtomicBoolean(false);
 
-  @Before
-  public void setUp() throws IOException {
-    nmLocalDir.mkdirs();
-    tmpDir.mkdirs();
-    logsDir.mkdirs();
-    remoteLogsDir.mkdirs();
-    conf = createNMConfig();
-  }
-
   @After
   public void tearDown() {
     this.registeredNodes.clear();
@@ -332,29 +301,7 @@ public class TestNodeStatusUpdater {
     }
   }
 
-  private class MyContainerManager extends ContainerManagerImpl {
-    public boolean signaled = false;
-
-    public MyContainerManager(Context context, ContainerExecutor exec,
-        DeletionService deletionContext, NodeStatusUpdater nodeStatusUpdater,
-        NodeManagerMetrics metrics,
-        LocalDirsHandlerService dirsHandler) {
-      super(context, exec, deletionContext, nodeStatusUpdater,
-          metrics, dirsHandler);
-    }
-
-    @Override
-    public void handle(ContainerManagerEvent event) {
-      if (event.getType() == ContainerManagerEventType.SIGNAL_CONTAINERS) {
-        signaled = true;
-      }
-    }
-  }
-
-  private class MyNodeStatusUpdater extends NodeStatusUpdaterImpl {
-    public ResourceTracker resourceTracker;
-    private Context context;
-
+  private class MyNodeStatusUpdater extends BaseNodeStatusUpdaterForTest {
     public MyNodeStatusUpdater(Context context, Dispatcher dispatcher,
         NodeHealthCheckerService healthChecker, NodeManagerMetrics metrics) {
       this(context, dispatcher, healthChecker, metrics, false);
@@ -363,19 +310,8 @@ public class TestNodeStatusUpdater {
     public MyNodeStatusUpdater(Context context, Dispatcher dispatcher,
         NodeHealthCheckerService healthChecker, NodeManagerMetrics metrics,
         boolean signalContainer) {
-      super(context, dispatcher, healthChecker, metrics);
-      this.context = context;
-      resourceTracker = new MyResourceTracker(this.context, signalContainer);
-    }
-
-    @Override
-    protected ResourceTracker getRMClient() {
-      return resourceTracker;
-    }
-
-    @Override
-    protected void stopRMProxy() {
-      return;
+      super(context, dispatcher, healthChecker, metrics,
+          new MyResourceTracker(context, signalContainer));
     }
   }
 
@@ -1818,7 +1754,6 @@ public class TestNodeStatusUpdater {
     Assert.assertTrue("Test failed with exception(s)" + exceptions,
         exceptions.isEmpty());
   }
-
   // Add new containers info into NM context each time node heart beats.
   private class MyNMContext extends NMContext {
 
@@ -1922,31 +1857,6 @@ public class TestNodeStatusUpdater {
         this.registeredNodes.size());
   }
 
-  private YarnConfiguration createNMConfig(int port) throws IOException {
-    YarnConfiguration conf = new YarnConfiguration();
-    String localhostAddress = null;
-    try {
-      localhostAddress = InetAddress.getByName("localhost")
-          .getCanonicalHostName();
-    } catch (UnknownHostException e) {
-      Assert.fail("Unable to get localhost address: " + e.getMessage());
-    }
-    conf.setInt(YarnConfiguration.NM_PMEM_MB, 5 * 1024); // 5GB
-    conf.set(YarnConfiguration.NM_ADDRESS, localhostAddress + ":" + port);
-    conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, localhostAddress + ":"
-        + ServerSocketUtil.getPort(49160, 10));
-    conf.set(YarnConfiguration.NM_LOG_DIRS, logsDir.getAbsolutePath());
-    conf.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
-      remoteLogsDir.getAbsolutePath());
-    conf.set(YarnConfiguration.NM_LOCAL_DIRS, nmLocalDir.getAbsolutePath());
-    conf.setLong(YarnConfiguration.NM_LOG_RETAIN_SECONDS, 1);
-    return conf;
-  }
-
-  private YarnConfiguration createNMConfig() throws IOException {
-    return createNMConfig(ServerSocketUtil.getPort(49170, 10));
-  }
-
   private NodeManager getNodeManager(final NodeAction nodeHeartBeatAction) {
     return new NodeManager() {
       @Override
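
Review note on the MyNodeStatusUpdater change above: the tracker is now built from the constructor parameters and passed to the base class, instead of being assigned after super() returns. A minimal plain-Java sketch of that constructor-injection shape follows. Tracker, BaseUpdater, MyUpdater and client are hypothetical stand-ins for ResourceTracker, BaseNodeStatusUpdaterForTest, MyNodeStatusUpdater and getRMClient; none of these names come from Hadoop.

```java
// Hypothetical sketch of the base-class refactoring, with no Hadoop types.
final class InjectionSketch {

  // Stand-in for ResourceTracker.
  interface Tracker {
    String register();
  }

  // Plays the role of BaseNodeStatusUpdaterForTest: stores the injected
  // tracker and hands it back from an accessor.
  static class BaseUpdater {
    private final Tracker tracker;

    BaseUpdater(Tracker tracker) {
      this.tracker = tracker;
    }

    Tracker client() {
      return tracker;
    }
  }

  // Plays the role of MyNodeStatusUpdater: the tracker is built from
  // constructor parameters only, because instance state of the object
  // under construction cannot be referenced before super() completes.
  static class MyUpdater extends BaseUpdater {
    MyUpdater(boolean signalContainer) {
      super(() -> "registered signal=" + signalContainer);
    }
  }

  public static void main(String[] args) {
    System.out.println(new MyUpdater(true).client().register());
  }
}
```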
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
index 3c432d3..4b4f356 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
@@ -18,26 +18,6 @@
 
 package org.apache.hadoop.yarn.server.nodemanager.amrmproxy;
 
-import java.io.IOException;
-import java.security.PrivilegedExceptionAction;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-import java.util.TreeSet;
-import java.util.concurrent.Callable;
-import java.util.concurrent.ConcurrentLinkedQueue;
-import java.util.concurrent.ConcurrentMap;
-import java.util.concurrent.ExecutorCompletionService;
-import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Executors;
-import java.util.concurrent.Future;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.hadoop.yarn.server.nodemanager.ContainerStateTransitionListener;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.security.Credentials;
 import org.apache.hadoop.security.UserGroupInformation;
@@ -66,6 +46,7 @@ import org.apache.hadoop.yarn.server.api.protocolrecords.LogAggregationReport;
 import org.apache.hadoop.yarn.server.api.records.AppCollectorData;
 import org.apache.hadoop.yarn.server.api.records.NodeHealthStatus;
 import org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.ContainerStateTransitionListener;
 import org.apache.hadoop.yarn.server.nodemanager.Context;
 import org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService;
 import org.apache.hadoop.yarn.server.nodemanager.NodeManager.NMContext;
@@ -74,18 +55,37 @@ import org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdater;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManager;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMMemoryStateStoreService;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.RecoveredAMRMProxyState;
-import org.apache.hadoop.yarn.server.scheduler.OpportunisticContainerAllocator;
 import org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager;
 import org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM;
 import org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher;
+import org.apache.hadoop.yarn.server.scheduler.OpportunisticContainerAllocator;
 import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;
 import org.apache.hadoop.yarn.util.Records;
 import org.junit.After;
 import org.junit.Assert;
 import org.junit.Before;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.security.PrivilegedExceptionAction;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeSet;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutorCompletionService;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
 
 /**
  * Base class for all the AMRMProxyService test cases. It provides utility
@@ -805,5 +805,9 @@ public abstract class BaseAMRMProxyTest {
     public NMTimelinePublisher getNMTimelinePublisher() {
       return null;
     }
+
+    public ResourcePluginManager getResourcePluginManager() {
+      return null;
+    }
   }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestResourceHandlerModule.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestResourceHandlerModule.java
index e5414a5..0563694 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestResourceHandlerModule.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestResourceHandlerModule.java
@@ -22,6 +22,7 @@ package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resourc
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
 import org.junit.Assert;
 import org.junit.Before;
 import org.junit.Test;
@@ -30,6 +31,8 @@ import org.slf4j.LoggerFactory;
 
 import java.util.List;
 
+import static org.mockito.Mockito.mock;
+
 public class TestResourceHandlerModule {
   private static final Logger LOG =
        LoggerFactory.getLogger(TestResourceHandlerModule.class);
@@ -62,7 +65,7 @@ public class TestResourceHandlerModule {
 
       //Ensure that outbound bandwidth resource handler is present in the chain
       ResourceHandlerChain resourceHandlerChain = ResourceHandlerModule
-          .getConfiguredResourceHandlerChain(networkEnabledConf);
+          .getConfiguredResourceHandlerChain(networkEnabledConf, mock(Context.class));
       List<ResourceHandler> resourceHandlers = resourceHandlerChain
           .getResourceHandlerList();
       //Exactly one resource handler in chain
@@ -88,7 +91,8 @@ public class TestResourceHandlerModule {
     Assert.assertNotNull(handler);
 
     ResourceHandlerChain resourceHandlerChain =
-        ResourceHandlerModule.getConfiguredResourceHandlerChain(diskConf);
+        ResourceHandlerModule.getConfiguredResourceHandlerChain(diskConf,
+            mock(Context.class));
     List<ResourceHandler> resourceHandlers =
         resourceHandlerChain.getResourceHandlerList();
     // Exactly one resource handler in chain
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
new file mode 100644
index 0000000..1c4313c
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
@@ -0,0 +1,385 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.hadoop.yarn.api.protocolrecords.ResourceTypes;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ApplicationId;
+import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperation;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer;
+import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService;
+import org.apache.hadoop.yarn.util.resource.ResourceUtils;
+import org.apache.hadoop.yarn.util.resource.TestResourceUtils;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+import static org.mockito.Matchers.any;
+import static org.mockito.Matchers.anyList;
+import static org.mockito.Matchers.anyListOf;
+import static org.mockito.Matchers.anyString;
+import static org.mockito.Matchers.eq;
+import static org.mockito.Mockito.doThrow;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+public class TestGpuResourceHandler {
+  private CGroupsHandler mockCGroupsHandler;
+  private PrivilegedOperationExecutor mockPrivilegedExecutor;
+  private GpuResourceHandlerImpl gpuResourceHandler;
+  private NMStateStoreService mockNMStateStore;
+  private ConcurrentHashMap<ContainerId, Container> runningContainersMap;
+
+  @Before
+  public void setup() {
+    TestResourceUtils.addNewTypesToResources(ResourceInformation.GPU_URI);
+
+    mockCGroupsHandler = mock(CGroupsHandler.class);
+    mockPrivilegedExecutor = mock(PrivilegedOperationExecutor.class);
+    mockNMStateStore = mock(NMStateStoreService.class);
+
+    Context nmctx = mock(Context.class);
+    when(nmctx.getNMStateStore()).thenReturn(mockNMStateStore);
+    runningContainersMap = new ConcurrentHashMap<>();
+    when(nmctx.getContainers()).thenReturn(runningContainersMap);
+
+    gpuResourceHandler = new GpuResourceHandlerImpl(nmctx, mockCGroupsHandler,
+        mockPrivilegedExecutor);
+  }
+
+  @Test
+  public void testBootStrap() throws Exception {
+    Configuration conf = new YarnConfiguration();
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0");
+
+    GpuDiscoverer.getInstance().initialize(conf);
+
+    gpuResourceHandler.bootstrap(conf);
+    verify(mockCGroupsHandler, times(1)).initializeCGroupController(
+        CGroupsHandler.CGroupController.DEVICES);
+  }
+
+  private static ContainerId getContainerId(int id) {
+    return ContainerId.newContainerId(ApplicationAttemptId
+        .newInstance(ApplicationId.newInstance(1234L, 1), 1), id);
+  }
+
+  private static Container mockContainerWithGpuRequest(int id,
+      int numGpuRequest) {
+    Container c = mock(Container.class);
+    when(c.getContainerId()).thenReturn(getContainerId(id));
+
+    Resource res = Resource.newInstance(1024, 1);
+    ResourceMappings resMapping = new ResourceMappings();
+
+    res.setResourceValue(ResourceInformation.GPU_URI, numGpuRequest);
+    when(c.getResource()).thenReturn(res);
+    when(c.getResourceMappings()).thenReturn(resMapping);
+    return c;
+  }
+
+  private void verifyDeniedDevices(ContainerId containerId,
+      List<Integer> deniedDevices)
+      throws ResourceHandlerException, PrivilegedOperationException {
+    verify(mockCGroupsHandler, times(1)).createCGroup(
+        CGroupsHandler.CGroupController.DEVICES, containerId.toString());
+
+    if (null != deniedDevices && !deniedDevices.isEmpty()) {
+      verify(mockPrivilegedExecutor, times(1)).executePrivilegedOperation(
+          new PrivilegedOperation(PrivilegedOperation.OperationType.GPU, Arrays
+              .asList(GpuResourceHandlerImpl.CONTAINER_ID_CLI_OPTION,
+                  containerId.toString(),
+                  GpuResourceHandlerImpl.EXCLUDED_GPUS_CLI_OPTION,
+                  StringUtils.join(",", deniedDevices))), true);
+    }
+  }
+
+  @Test
+  public void testAllocation() throws Exception {
+    Configuration conf = new YarnConfiguration();
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,3,4");
+    GpuDiscoverer.getInstance().initialize(conf);
+
+    gpuResourceHandler.bootstrap(conf);
+    Assert.assertEquals(4,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+
+    /* Start container 1, which asks for 3 GPUs */
+    gpuResourceHandler.preStart(mockContainerWithGpuRequest(1, 3));
+
+    // Only device=4 will be blocked.
+    verifyDeniedDevices(getContainerId(1), Arrays.asList(4));
+
+    /* Start container 2, which asks for 2 GPUs. Expected to fail */
+    boolean failedToAllocate = false;
+    try {
+      gpuResourceHandler.preStart(mockContainerWithGpuRequest(2, 2));
+    } catch (ResourceHandlerException e) {
+      failedToAllocate = true;
+    }
+    Assert.assertTrue(failedToAllocate);
+
+    /* Start container 3, which asks for 1 GPU. Expected to succeed */
+    gpuResourceHandler.preStart(mockContainerWithGpuRequest(3, 1));
+
+    // devices = 0/1/3 will be blocked
+    verifyDeniedDevices(getContainerId(3), Arrays.asList(0, 1, 3));
+
+    /* Start container 4, which asks for 0 GPUs. Expected to succeed */
+    gpuResourceHandler.preStart(mockContainerWithGpuRequest(4, 0));
+
+    // All devices will be blocked
+    verifyDeniedDevices(getContainerId(4), Arrays.asList(0, 1, 3, 4));
+
+    /* Release container-1, expect cgroups deleted */
+    gpuResourceHandler.postComplete(getContainerId(1));
+
+    verify(mockCGroupsHandler, times(1)).createCGroup(
+        CGroupsHandler.CGroupController.DEVICES, getContainerId(1).toString());
+    Assert.assertEquals(3,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+
+    /* Release container-3, expect cgroups deleted */
+    gpuResourceHandler.postComplete(getContainerId(3));
+
+    verify(mockCGroupsHandler, times(1)).createCGroup(
+        CGroupsHandler.CGroupController.DEVICES, getContainerId(3).toString());
+    Assert.assertEquals(4,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+  }
+
+  @SuppressWarnings("unchecked")
+  @Test
+  public void testAssignedGpuWillBeCleanedupWhenStoreOpFails()
+      throws Exception {
+    Configuration conf = new YarnConfiguration();
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,3,4");
+    GpuDiscoverer.getInstance().initialize(conf);
+
+    gpuResourceHandler.bootstrap(conf);
+    Assert.assertEquals(4,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+
+    doThrow(new IOException("Exception ...")).when(mockNMStateStore)
+        .storeAssignedResources(
+        any(ContainerId.class), anyString(), anyList());
+
+    boolean exception = false;
+    /* Start container 1, which asks for 3 GPUs */
+    try {
+      gpuResourceHandler.preStart(mockContainerWithGpuRequest(1, 3));
+    } catch (ResourceHandlerException e) {
+      exception = true;
+    }
+
+    Assert.assertTrue("preStart should throw exception", exception);
+
+    // After preStart, all 4 GPUs are still available since the store op fails.
+    Assert.assertEquals(4,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+  }
+
+  @Test
+  public void testAllocationWithoutAllowedGpus() throws Exception {
+    Configuration conf = new YarnConfiguration();
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, " ");
+    GpuDiscoverer.getInstance().initialize(conf);
+
+    gpuResourceHandler.bootstrap(conf);
+    Assert.assertEquals(0,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+
+    /* Start container 1, which asks for 0 GPUs */
+    gpuResourceHandler.preStart(mockContainerWithGpuRequest(1, 0));
+    verifyDeniedDevices(getContainerId(1), Collections.<Integer>emptyList());
+
+    /* Start container 2, which asks for 1 GPU. Expected to fail */
+    boolean failedToAllocate = false;
+    try {
+      gpuResourceHandler.preStart(mockContainerWithGpuRequest(2, 1));
+    } catch (ResourceHandlerException e) {
+      failedToAllocate = true;
+    }
+    Assert.assertTrue(failedToAllocate);
+
+    /* Release container 1, expect cgroups deleted */
+    gpuResourceHandler.postComplete(getContainerId(1));
+
+    verify(mockCGroupsHandler, times(1)).createCGroup(
+        CGroupsHandler.CGroupController.DEVICES, getContainerId(1).toString());
+    Assert.assertEquals(0,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+  }
+
+  @Test
+  public void testAllocationStored() throws Exception {
+    Configuration conf = new YarnConfiguration();
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,3,4");
+    GpuDiscoverer.getInstance().initialize(conf);
+
+    gpuResourceHandler.bootstrap(conf);
+    Assert.assertEquals(4,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+
+    /* Start container 1, which asks for 3 GPUs */
+    Container container = mockContainerWithGpuRequest(1, 3);
+    gpuResourceHandler.preStart(container);
+
+    verify(mockNMStateStore).storeAssignedResources(getContainerId(1),
+        ResourceInformation.GPU_URI,
+        Arrays.<Serializable>asList("0", "1", "3"));
+
+    Assert.assertEquals(3, container.getResourceMappings()
+        .getAssignedResources(ResourceInformation.GPU_URI).size());
+
+    // Only device=4 will be blocked.
+    verifyDeniedDevices(getContainerId(1), Arrays.asList(4));
+
+    /* Start container 2, which asks for 0 GPUs. Expected to succeed */
+    container = mockContainerWithGpuRequest(2, 0);
+    gpuResourceHandler.preStart(container);
+
+    verifyDeniedDevices(getContainerId(2), Arrays.asList(0, 1, 3, 4));
+    Assert.assertEquals(0, container.getResourceMappings()
+        .getAssignedResources(ResourceInformation.GPU_URI).size());
+
+    // Store assigned resource will not be invoked.
+    verify(mockNMStateStore, never()).storeAssignedResources(
+        eq(getContainerId(2)), eq(ResourceInformation.GPU_URI),
+        anyListOf(Serializable.class));
+  }
+
+  @Test
+  public void testRecoverResourceAllocation() throws Exception {
+    Configuration conf = new YarnConfiguration();
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,3,4");
+    GpuDiscoverer.getInstance().initialize(conf);
+
+    gpuResourceHandler.bootstrap(conf);
+    Assert.assertEquals(4,
+        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+
+    Container nmContainer = mock(Container.class);
+    ResourceMappings rmap = new ResourceMappings();
+    ResourceMappings.AssignedResources ar =
+        new ResourceMappings.AssignedResources();
+    ar.updateAssignedResources(Arrays.<Serializable>asList("1", "3"));
+    rmap.addAssignedResources(ResourceInformation.GPU_URI, ar);
+    when(nmContainer.getResourceMappings()).thenReturn(rmap);
+
+    runningContainersMap.put(getContainerId(1), nmContainer);
+
+    // TEST CASE
+    // Reacquiring a container restores the state of the GPU resource allocator.
+    gpuResourceHandler.reacquireContainer(getContainerId(1));
+
+    Map<Integer, ContainerId> deviceAllocationMapping =
+        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
+    Assert.assertEquals(2, deviceAllocationMapping.size());
+    Assert.assertTrue(
+        deviceAllocationMapping.keySet().containsAll(Arrays.asList(1, 3)));
+    Assert.assertEquals(deviceAllocationMapping.get(1), getContainerId(1));
+
+    // TEST CASE
+    // Try to reacquire a container but requested device is not in allowed list.
+    nmContainer = mock(Container.class);
+    rmap = new ResourceMappings();
+    ar = new ResourceMappings.AssignedResources();
+    // id=5 is not in allowed list.
+    ar.updateAssignedResources(Arrays.<Serializable>asList("4", "5"));
+    rmap.addAssignedResources(ResourceInformation.GPU_URI, ar);
+    when(nmContainer.getResourceMappings()).thenReturn(rmap);
+
+    runningContainersMap.put(getContainerId(2), nmContainer);
+
+    boolean caughtException = false;
+    try {
+      gpuResourceHandler.reacquireContainer(getContainerId(1));
+    } catch (ResourceHandlerException e) {
+      caughtException = true;
+    }
+    Assert.assertTrue(
+        "Should fail since requested device Id is not in allowed list",
+        caughtException);
+
+    // Make sure internal state not changed.
+    deviceAllocationMapping =
+        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
+    Assert.assertEquals(2, deviceAllocationMapping.size());
+    Assert.assertTrue(
+        deviceAllocationMapping.keySet().containsAll(Arrays.asList(1, 3)));
+    Assert.assertEquals(deviceAllocationMapping.get(1), getContainerId(1));
+
+    // TEST CASE
+    // Try to reacquire a container but requested device is already assigned.
+    nmContainer = mock(Container.class);
+    rmap = new ResourceMappings();
+    ar = new ResourceMappings.AssignedResources();
+    // id=3 is already assigned
+    ar.updateAssignedResources(Arrays.<Serializable>asList("4", "3"));
+    rmap.addAssignedResources("gpu", ar);
+    when(nmContainer.getResourceMappings()).thenReturn(rmap);
+
+    runningContainersMap.put(getContainerId(2), nmContainer);
+
+    caughtException = false;
+    try {
+      gpuResourceHandler.reacquireContainer(getContainerId(1));
+    } catch (ResourceHandlerException e) {
+      caughtException = true;
+    }
+    Assert.assertTrue(
+        "Should fail since requested device Id is already assigned",
+        caughtException);
+
+    // Make sure internal state not changed.
+    deviceAllocationMapping =
+        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
+    Assert.assertEquals(2, deviceAllocationMapping.size());
+    Assert.assertTrue(
+        deviceAllocationMapping.keySet().containsAll(Arrays.asList(1, 3)));
+    Assert.assertEquals(deviceAllocationMapping.get(1), getContainerId(1));
+  }
+}
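
Note on the allocation behaviour that TestGpuResourceHandler#testAllocation expects: the following is a minimal, hypothetical sketch and not the actual GpuResourceAllocator code. It assumes free devices are handed out in ascending minor-number order and that every allowed device not handed to the container is denied to it. The class name GpuAllocationSketch and all of its methods are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch only; not the NodeManager's real allocator. */
class GpuAllocationSketch {
  // Allowed device minor numbers, kept sorted (assumption: ascending order).
  private final Set<Integer> allowed = new TreeSet<>();
  // Device minor number -> id of the container currently holding it.
  private final Map<Integer, String> used = new TreeMap<>();

  GpuAllocationSketch(List<Integer> allowedDevices) {
    allowed.addAll(allowedDevices);
  }

  int available() {
    return allowed.size() - used.size();
  }

  /**
   * Assigns n free devices to the container and returns the devices the
   * container must be denied (every allowed device it was not given).
   */
  List<Integer> assign(String containerId, int n) {
    if (n > available()) {
      throw new IllegalStateException("Not enough GPUs for " + containerId);
    }
    int assigned = 0;
    List<Integer> denied = new ArrayList<>();
    for (int device : allowed) {
      if (assigned < n && !used.containsKey(device)) {
        used.put(device, containerId);
        assigned++;
      } else {
        denied.add(device);
      }
    }
    return denied;
  }

  /** Frees every device held by the given container. */
  void release(String containerId) {
    used.values().removeIf(containerId::equals);
  }

  public static void main(String[] args) {
    GpuAllocationSketch alloc =
        new GpuAllocationSketch(Arrays.asList(0, 1, 3, 4));
    System.out.println(alloc.assign("container_1", 3));
    System.out.println(alloc.assign("container_3", 1));
    alloc.release("container_1");
    System.out.println(alloc.available());
  }
}
```

With allowed devices 0,1,3,4 the sketch follows the same sequence as the test: a request for 3 GPUs leaves only device 4 denied, a later request for 1 GPU denies 0, 1 and 3, and releasing the first container returns three devices to the pool.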
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java
index 318ae6b..a147afb 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java
@@ -70,7 +70,7 @@ public class TestContainersMonitorResourceChange {
 
   private static class MockExecutor extends ContainerExecutor {
     @Override
-    public void init() throws IOException {
+    public void init(Context nmContext) throws IOException {
     }
     @Override
     public void startLocalizer(LocalizerStartContext ctx)
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/TestResourcePluginManager.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/TestResourcePluginManager.java
new file mode 100644
index 0000000..bcadf76
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/TestResourcePluginManager.java
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.service.ServiceOperations;
+import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.event.Dispatcher;
+import org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.Context;
+import org.apache.hadoop.yarn.server.nodemanager.DeletionService;
+import org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService;
+import org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService;
+import org.apache.hadoop.yarn.server.nodemanager.NodeManager;
+import org.apache.hadoop.yarn.server.nodemanager.NodeManagerTestBase;
+import org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdater;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperation;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.NodeResourceUpdaterPlugin;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePlugin;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager;
+import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.mockito.Matchers.any;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+public class TestResourcePluginManager extends NodeManagerTestBase {
+  private NodeManager nm;
+
+  ResourcePluginManager stubResourcePluginmanager() {
+    // Stub ResourcePluginManager
+    final ResourcePluginManager rpm = mock(ResourcePluginManager.class);
+    Map<String, ResourcePlugin> plugins = new HashMap<>();
+
+    // First resource plugin
+    ResourcePlugin resourcePlugin = mock(ResourcePlugin.class);
+    NodeResourceUpdaterPlugin nodeResourceUpdaterPlugin = mock(
+        NodeResourceUpdaterPlugin.class);
+    when(resourcePlugin.getNodeResourceHandlerInstance()).thenReturn(
+        nodeResourceUpdaterPlugin);
+    plugins.put("resource1", resourcePlugin);
+
+    // Second resource plugin
+    resourcePlugin = mock(ResourcePlugin.class);
+    when(resourcePlugin.createResourceHandler(any(Context.class), any(
+        CGroupsHandler.class), any(PrivilegedOperationExecutor.class)))
+        .thenReturn(new CustomizedResourceHandler());
+    plugins.put("resource2", resourcePlugin);
+    when(rpm.getNameToPlugins()).thenReturn(plugins);
+    return rpm;
+  }
+
+  @After
+  public void tearDown() {
+    if (nm != null) {
+      try {
+        ServiceOperations.stop(nm);
+      } catch (Throwable t) {
+        // ignore
+      }
+    }
+  }
+
+  private class CustomizedResourceHandler implements ResourceHandler {
+
+    @Override
+    public List<PrivilegedOperation> bootstrap(Configuration configuration)
+        throws ResourceHandlerException {
+      return null;
+    }
+
+    @Override
+    public List<PrivilegedOperation> preStart(Container container)
+        throws ResourceHandlerException {
+      return null;
+    }
+
+    @Override
+    public List<PrivilegedOperation> reacquireContainer(ContainerId containerId)
+        throws ResourceHandlerException {
+      return null;
+    }
+
+    @Override
+    public List<PrivilegedOperation> postComplete(ContainerId containerId)
+        throws ResourceHandlerException {
+      return null;
+    }
+
+    @Override
+    public List<PrivilegedOperation> teardown()
+        throws ResourceHandlerException {
+      return null;
+    }
+  }
+
+  private class MyMockNM extends NodeManager {
+    private final ResourcePluginManager rpm;
+
+    public MyMockNM(ResourcePluginManager rpm) {
+      this.rpm = rpm;
+    }
+
+    @Override
+    protected NodeStatusUpdater createNodeStatusUpdater(Context context,
+        Dispatcher dispatcher, NodeHealthCheckerService healthChecker) {
+      ((NodeManager.NMContext)context).setResourcePluginManager(rpm);
+      return new BaseNodeStatusUpdaterForTest(context, dispatcher, healthChecker,
+          metrics, new BaseResourceTrackerForTest());
+    }
+
+    @Override
+    protected ContainerManagerImpl createContainerManager(Context context,
+        ContainerExecutor exec, DeletionService del,
+        NodeStatusUpdater nodeStatusUpdater,
+        ApplicationACLsManager aclsManager,
+        LocalDirsHandlerService diskhandler) {
+      return new MyContainerManager(context, exec, del, nodeStatusUpdater,
+      metrics, diskhandler);
+    }
+
+    @Override
+    protected ResourcePluginManager createResourcePluginManager() {
+      return rpm;
+    }
+  }
+
+  public class MyLCE extends LinuxContainerExecutor {
+    private PrivilegedOperationExecutor poe = mock(PrivilegedOperationExecutor.class);
+
+    @Override
+    protected PrivilegedOperationExecutor getPrivilegedOperationExecutor() {
+      return poe;
+    }
+  }
+
+  /*
+   * Make sure ResourcePluginManager is initialized during NM start up.
+   */
+  @Test(timeout = 30000)
+  public void testResourcePluginManagerInitialization() throws Exception {
+    final ResourcePluginManager rpm = stubResourcePluginmanager();
+    nm = new MyMockNM(rpm);
+
+    YarnConfiguration conf = createNMConfig();
+    nm.init(conf);
+    verify(rpm, times(1)).initialize(
+        any(Context.class));
+  }
+
+  /*
+   * Make sure ResourcePluginManager is invoked during NM update.
+   */
+  @Test(timeout = 30000)
+  public void testNodeStatusUpdaterWithResourcePluginsEnabled() throws Exception {
+    final ResourcePluginManager rpm = stubResourcePluginmanager();
+
+    nm = new MyMockNM(rpm);
+
+    YarnConfiguration conf = createNMConfig();
+    nm.init(conf);
+    nm.start();
+
+    NodeResourceUpdaterPlugin nodeResourceUpdaterPlugin =
+        rpm.getNameToPlugins().get("resource1")
+            .getNodeResourceHandlerInstance();
+
+    verify(nodeResourceUpdaterPlugin, times(1)).updateConfiguredResource(
+        any(Resource.class));
+  }
+
+  /*
+   * Make sure ResourcePluginManager is used to initialize ResourceHandlerChain
+   */
+  @Test(timeout = 30000)
+  public void testLinuxContainerExecutorWithResourcePluginsEnabled() throws Exception {
+    final ResourcePluginManager rpm = stubResourcePluginmanager();
+    final LinuxContainerExecutor lce = new MyLCE();
+
+    nm = new NodeManager() {
+      @Override
+      protected NodeStatusUpdater createNodeStatusUpdater(Context context,
+          Dispatcher dispatcher, NodeHealthCheckerService healthChecker) {
+        ((NMContext)context).setResourcePluginManager(rpm);
+        return new BaseNodeStatusUpdaterForTest(context, dispatcher, healthChecker,
+            metrics, new BaseResourceTrackerForTest());
+      }
+
+      @Override
+      protected ContainerManagerImpl createContainerManager(Context context,
+          ContainerExecutor exec, DeletionService del,
+          NodeStatusUpdater nodeStatusUpdater,
+          ApplicationACLsManager aclsManager,
+          LocalDirsHandlerService diskhandler) {
+        return new MyContainerManager(context, exec, del, nodeStatusUpdater,
+            metrics, diskhandler);
+      }
+
+      @Override
+      protected ContainerExecutor createContainerExecutor(Configuration conf) {
+        ((NMContext)this.getNMContext()).setResourcePluginManager(rpm);
+        lce.setConf(conf);
+        return lce;
+      }
+    };
+
+    YarnConfiguration conf = createNMConfig();
+
+    nm.init(conf);
+    nm.start();
+
+    ResourceHandler handler = lce.getResourceHandler();
+    Assert.assertNotNull(handler);
+    Assert.assertTrue(handler instanceof ResourceHandlerChain);
+
+    boolean newHandlerAdded = false;
+    for (ResourceHandler h : ((ResourceHandlerChain) handler)
+        .getResourceHandlerList()) {
+      if (h instanceof CustomizedResourceHandler) {
+        newHandlerAdded = true;
+        break;
+      }
+    }
+    Assert.assertTrue("New ResourceHandler should be added", newHandlerAdded);
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java
new file mode 100644
index 0000000..83bace2
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java
@@ -0,0 +1,123 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.GpuDeviceInformation;
+import org.junit.Assert;
+import org.junit.Assume;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.List;
+
+public class TestGpuDiscoverer {
+  private String getTestParentFolder() {
+    File f = new File("target/temp/" + TestGpuDiscoverer.class.getName());
+    return f.getAbsolutePath();
+  }
+
+  private void touchFile(File f) throws IOException {
+    new FileOutputStream(f).close();
+  }
+
+  @Before
+  public void before() throws IOException {
+    String folder = getTestParentFolder();
+    File f = new File(folder);
+    FileUtils.deleteDirectory(f);
+    f.mkdirs();
+  }
+
+  @Test
+  public void testLinuxGpuResourceDiscoverPluginConfig() throws Exception {
+    // Only run this on demand.
+    Assume.assumeTrue(Boolean.valueOf(
+        System.getProperty("RunLinuxGpuResourceDiscoverPluginConfigTest")));
+
+    // test case 1, check default setting.
+    Configuration conf = new Configuration(false);
+    GpuDiscoverer plugin = new GpuDiscoverer();
+    plugin.initialize(conf);
+    Assert.assertEquals(GpuDiscoverer.DEFAULT_BINARY_NAME,
+        plugin.getPathOfGpuBinary());
+    Assert.assertNotNull(plugin.getEnvironmentToRunCommand().get("PATH"));
+    Assert.assertTrue(
+        plugin.getEnvironmentToRunCommand().get("PATH").contains("nvidia"));
+
+    // test case 2, check mandatory set path.
+    File fakeBinary = new File(getTestParentFolder(),
+        GpuDiscoverer.DEFAULT_BINARY_NAME);
+    touchFile(fakeBinary);
+    conf.set(YarnConfiguration.NM_GPU_PATH_TO_EXEC, getTestParentFolder());
+    plugin = new GpuDiscoverer();
+    plugin.initialize(conf);
+    Assert.assertEquals(fakeBinary.getAbsolutePath(),
+        plugin.getPathOfGpuBinary());
+    Assert.assertNull(plugin.getEnvironmentToRunCommand().get("PATH"));
+
+    // test case 3, check mandatory set path, but binary doesn't exist so default
+    // path will be used.
+    fakeBinary.delete();
+    plugin = new GpuDiscoverer();
+    plugin.initialize(conf);
+    Assert.assertEquals(GpuDiscoverer.DEFAULT_BINARY_NAME,
+        plugin.getPathOfGpuBinary());
+    Assert.assertTrue(
+        plugin.getEnvironmentToRunCommand().get("PATH").contains("nvidia"));
+  }
+
+  @Test
+  public void testGpuDiscover() throws YarnException {
+    // Only run this on demand: it is skipped unless
+    // runGpuDiscoverUnitTest is set (-DrunGpuDiscoverUnitTest=true)
+    Assume.assumeTrue(
+        Boolean.valueOf(System.getProperty("runGpuDiscoverUnitTest")));
+    Configuration conf = new Configuration(false);
+    GpuDiscoverer plugin = new GpuDiscoverer();
+    plugin.initialize(conf);
+    GpuDeviceInformation info = plugin.getGpuDeviceInformation();
+
+    Assert.assertTrue(info.getGpus().size() > 0);
+    Assert.assertEquals(plugin.getMinorNumbersOfGpusUsableByYarn().size(),
+        info.getGpus().size());
+  }
+
+  @Test
+  public void getNumberOfUsableGpusFromConfig() throws YarnException {
+    Configuration conf = new Configuration(false);
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,2,4");
+    GpuDiscoverer plugin = new GpuDiscoverer();
+    plugin.initialize(conf);
+
+    List<Integer> minorNumbers = plugin.getMinorNumbersOfGpusUsableByYarn();
+    Assert.assertEquals(4, minorNumbers.size());
+
+    Assert.assertTrue(0 == minorNumbers.get(0));
+    Assert.assertTrue(1 == minorNumbers.get(1));
+    Assert.assertTrue(2 == minorNumbers.get(2));
+    Assert.assertTrue(4 == minorNumbers.get(3));
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/TestGpuDeviceInformationParser.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/TestGpuDeviceInformationParser.java
new file mode 100644
index 0000000..e22597d
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/TestGpuDeviceInformationParser.java
@@ -0,0 +1,50 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+
+public class TestGpuDeviceInformationParser {
+  @Test
+  public void testParse() throws IOException, YarnException {
+    File f = new File("src/test/resources/nvidia-smi-sample-xml-output");
+    String s = FileUtils.readFileToString(f, "UTF-8");
+
+    GpuDeviceInformationParser parser = new GpuDeviceInformationParser();
+
+    GpuDeviceInformation info = parser.parseXml(s);
+    Assert.assertEquals("375.66", info.getDriverVersion());
+    Assert.assertEquals(2, info.getGpus().size());
+    PerGpuDeviceInformation gpu1 = info.getGpus().get(1);
+    Assert.assertEquals("Tesla P100-PCIE-12GB", gpu1.getProductName());
+    Assert.assertEquals(16384, gpu1.getGpuMemoryUsage().getTotalMemoryMiB());
+    Assert.assertEquals(10.3f,
+        gpu1.getGpuUtilizations().getOverallGpuUtilization(), 1e-6);
+    Assert.assertEquals(34f, gpu1.getTemperature().getCurrentGpuTemp(), 1e-6);
+    Assert.assertEquals(85f, gpu1.getTemperature().getMaxGpuTemp(), 1e-6);
+    Assert.assertEquals(82f, gpu1.getTemperature().getSlowThresholdGpuTemp(),
+        1e-6);
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-xml-output b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-xml-output
new file mode 100644
index 0000000..5ccb722
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-xml-output
@@ -0,0 +1,547 @@
+<?xml version="1.0" ?>
+<!DOCTYPE nvidia_smi_log SYSTEM "nvsmi_device_v8.dtd">
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<nvidia_smi_log>
+	<timestamp>Wed Sep  6 21:52:51 2017</timestamp>
+	<driver_version>375.66</driver_version>
+	<attached_gpus>2</attached_gpus>
+	<gpu id="0000:04:00.0">
+		<product_name>Tesla P100-PCIE-12GB</product_name>
+		<product_brand>Tesla</product_brand>
+		<display_mode>Disabled</display_mode>
+		<display_active>Disabled</display_active>
+		<persistence_mode>Disabled</persistence_mode>
+		<accounting_mode>Disabled</accounting_mode>
+		<accounting_mode_buffer_size>1920</accounting_mode_buffer_size>
+		<driver_model>
+			<current_dm>N/A</current_dm>
+			<pending_dm>N/A</pending_dm>
+		</driver_model>
+		<serial>0320717030197</serial>
+		<uuid>GPU-28604e81-21ec-cc48-6759-bf2648b22e16</uuid>
+		<minor_number>0</minor_number>
+		<vbios_version>86.00.3A.00.02</vbios_version>
+		<multigpu_board>No</multigpu_board>
+		<board_id>0x400</board_id>
+		<gpu_part_number>900-2H400-0110-030</gpu_part_number>
+		<inforom_version>
+			<img_version>H400.0202.00.01</img_version>
+			<oem_object>1.1</oem_object>
+			<ecc_object>4.1</ecc_object>
+			<pwr_object>N/A</pwr_object>
+		</inforom_version>
+		<gpu_operation_mode>
+			<current_gom>N/A</current_gom>
+			<pending_gom>N/A</pending_gom>
+		</gpu_operation_mode>
+		<gpu_virtualization_mode>
+			<virtualization_mode>None</virtualization_mode>
+		</gpu_virtualization_mode>
+		<pci>
+			<pci_bus>04</pci_bus>
+			<pci_device>00</pci_device>
+			<pci_domain>0000</pci_domain>
+			<pci_device_id>15F710DE</pci_device_id>
+			<pci_bus_id>0000:04:00.0</pci_bus_id>
+			<pci_sub_system_id>11DA10DE</pci_sub_system_id>
+			<pci_gpu_link_info>
+				<pcie_gen>
+					<max_link_gen>3</max_link_gen>
+					<current_link_gen>3</current_link_gen>
+				</pcie_gen>
+				<link_widths>
+					<max_link_width>16x</max_link_width>
+					<current_link_width>16x</current_link_width>
+				</link_widths>
+			</pci_gpu_link_info>
+			<pci_bridge_chip>
+				<bridge_chip_type>N/A</bridge_chip_type>
+				<bridge_chip_fw>N/A</bridge_chip_fw>
+			</pci_bridge_chip>
+			<replay_counter>0</replay_counter>
+			<tx_util>0 KB/s</tx_util>
+			<rx_util>0 KB/s</rx_util>
+		</pci>
+		<fan_speed>N/A</fan_speed>
+		<performance_state>P0</performance_state>
+		<clocks_throttle_reasons>
+			<clocks_throttle_reason_gpu_idle>Active</clocks_throttle_reason_gpu_idle>
+			<clocks_throttle_reason_applications_clocks_setting>Not Active</clocks_throttle_reason_applications_clocks_setting>
+			<clocks_throttle_reason_sw_power_cap>Not Active</clocks_throttle_reason_sw_power_cap>
+			<clocks_throttle_reason_hw_slowdown>Not Active</clocks_throttle_reason_hw_slowdown>
+			<clocks_throttle_reason_sync_boost>Not Active</clocks_throttle_reason_sync_boost>
+			<clocks_throttle_reason_unknown>Not Active</clocks_throttle_reason_unknown>
+		</clocks_throttle_reasons>
+		<fb_memory_usage>
+			<total>12193 MiB</total>
+			<used>0 MiB</used>
+			<free>12193 MiB</free>
+		</fb_memory_usage>
+		<bar1_memory_usage>
+			<total>16384 MiB</total>
+			<used>2 MiB</used>
+			<free>16382 MiB</free>
+		</bar1_memory_usage>
+		<compute_mode>Default</compute_mode>
+		<utilization>
+			<gpu_util>0 %</gpu_util>
+			<memory_util>0 %</memory_util>
+			<encoder_util>0 %</encoder_util>
+			<decoder_util>0 %</decoder_util>
+		</utilization>
+		<encoder_stats>
+			<session_count>0</session_count>
+			<average_fps>0</average_fps>
+			<average_latency>0 ms</average_latency>
+		</encoder_stats>
+		<ecc_mode>
+			<current_ecc>Enabled</current_ecc>
+			<pending_ecc>Enabled</pending_ecc>
+		</ecc_mode>
+		<ecc_errors>
+			<volatile>
+				<single_bit>
+					<device_memory>0</device_memory>
+					<register_file>0</register_file>
+					<l1_cache>N/A</l1_cache>
+					<l2_cache>0</l2_cache>
+					<texture_memory>0</texture_memory>
+					<texture_shm>0</texture_shm>
+					<total>0</total>
+				</single_bit>
+				<double_bit>
+					<device_memory>0</device_memory>
+					<register_file>0</register_file>
+					<l1_cache>N/A</l1_cache>
+					<l2_cache>0</l2_cache>
+					<texture_memory>0</texture_memory>
+					<texture_shm>0</texture_shm>
+					<total>0</total>
+				</double_bit>
+			</volatile>
+			<aggregate>
+				<single_bit>
+					<device_memory>0</device_memory>
+					<register_file>0</register_file>
+					<l1_cache>N/A</l1_cache>
+					<l2_cache>0</l2_cache>
+					<texture_memory>0</texture_memory>
+					<texture_shm>0</texture_shm>
+					<total>0</total>
+				</single_bit>
+				<double_bit>
+					<device_memory>0</device_memory>
+					<register_file>0</register_file>
+					<l1_cache>N/A</l1_cache>
+					<l2_cache>0</l2_cache>
+					<texture_memory>0</texture_memory>
+					<texture_shm>0</texture_shm>
+					<total>0</total>
+				</double_bit>
+			</aggregate>
+		</ecc_errors>
+		<retired_pages>
+			<multiple_single_bit_retirement>
+				<retired_count>0</retired_count>
+				<retired_page_addresses>
+				</retired_page_addresses>
+			</multiple_single_bit_retirement>
+			<double_bit_retirement>
+				<retired_count>0</retired_count>
+				<retired_page_addresses>
+				</retired_page_addresses>
+			</double_bit_retirement>
+			<pending_retirement>No</pending_retirement>
+		</retired_pages>
+		<temperature>
+			<gpu_temp>31 C</gpu_temp>
+			<gpu_temp_max_threshold>85 C</gpu_temp_max_threshold>
+			<gpu_temp_slow_threshold>82 C</gpu_temp_slow_threshold>
+		</temperature>
+		<power_readings>
+			<power_state>P0</power_state>
+			<power_management>Supported</power_management>
+			<power_draw>24.84 W</power_draw>
+			<power_limit>250.00 W</power_limit>
+			<default_power_limit>250.00 W</default_power_limit>
+			<enforced_power_limit>250.00 W</enforced_power_limit>
+			<min_power_limit>125.00 W</min_power_limit>
+			<max_power_limit>250.00 W</max_power_limit>
+		</power_readings>
+		<clocks>
+			<graphics_clock>405 MHz</graphics_clock>
+			<sm_clock>405 MHz</sm_clock>
+			<mem_clock>715 MHz</mem_clock>
+			<video_clock>835 MHz</video_clock>
+		</clocks>
+		<applications_clocks>
+			<graphics_clock>1189 MHz</graphics_clock>
+			<mem_clock>715 MHz</mem_clock>
+		</applications_clocks>
+		<default_applications_clocks>
+			<graphics_clock>1189 MHz</graphics_clock>
+			<mem_clock>715 MHz</mem_clock>
+		</default_applications_clocks>
+		<max_clocks>
+			<graphics_clock>1328 MHz</graphics_clock>
+			<sm_clock>1328 MHz</sm_clock>
+			<mem_clock>715 MHz</mem_clock>
+			<video_clock>1328 MHz</video_clock>
+		</max_clocks>
+		<clock_policy>
+			<auto_boost>N/A</auto_boost>
+			<auto_boost_default>N/A</auto_boost_default>
+		</clock_policy>
+		<supported_clocks>
+			<supported_mem_clock>
+				<value>715 MHz</value>
+				<supported_graphics_clock>1328 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1316 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1303 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1290 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1278 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1265 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1252 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1240 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1227 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1215 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1202 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1189 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1177 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1164 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1151 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1139 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1126 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1113 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1101 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1088 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1075 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1063 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1050 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1037 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1025 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1012 MHz</supported_graphics_clock>
+				<supported_graphics_clock>999 MHz</supported_graphics_clock>
+				<supported_graphics_clock>987 MHz</supported_graphics_clock>
+				<supported_graphics_clock>974 MHz</supported_graphics_clock>
+				<supported_graphics_clock>961 MHz</supported_graphics_clock>
+				<supported_graphics_clock>949 MHz</supported_graphics_clock>
+				<supported_graphics_clock>936 MHz</supported_graphics_clock>
+				<supported_graphics_clock>923 MHz</supported_graphics_clock>
+				<supported_graphics_clock>911 MHz</supported_graphics_clock>
+				<supported_graphics_clock>898 MHz</supported_graphics_clock>
+				<supported_graphics_clock>885 MHz</supported_graphics_clock>
+				<supported_graphics_clock>873 MHz</supported_graphics_clock>
+				<supported_graphics_clock>860 MHz</supported_graphics_clock>
+				<supported_graphics_clock>847 MHz</supported_graphics_clock>
+				<supported_graphics_clock>835 MHz</supported_graphics_clock>
+				<supported_graphics_clock>822 MHz</supported_graphics_clock>
+				<supported_graphics_clock>810 MHz</supported_graphics_clock>
+				<supported_graphics_clock>797 MHz</supported_graphics_clock>
+				<supported_graphics_clock>784 MHz</supported_graphics_clock>
+				<supported_graphics_clock>772 MHz</supported_graphics_clock>
+				<supported_graphics_clock>759 MHz</supported_graphics_clock>
+				<supported_graphics_clock>746 MHz</supported_graphics_clock>
+				<supported_graphics_clock>734 MHz</supported_graphics_clock>
+				<supported_graphics_clock>721 MHz</supported_graphics_clock>
+				<supported_graphics_clock>708 MHz</supported_graphics_clock>
+				<supported_graphics_clock>696 MHz</supported_graphics_clock>
+				<supported_graphics_clock>683 MHz</supported_graphics_clock>
+				<supported_graphics_clock>670 MHz</supported_graphics_clock>
+				<supported_graphics_clock>658 MHz</supported_graphics_clock>
+				<supported_graphics_clock>645 MHz</supported_graphics_clock>
+				<supported_graphics_clock>632 MHz</supported_graphics_clock>
+				<supported_graphics_clock>620 MHz</supported_graphics_clock>
+				<supported_graphics_clock>607 MHz</supported_graphics_clock>
+				<supported_graphics_clock>594 MHz</supported_graphics_clock>
+				<supported_graphics_clock>582 MHz</supported_graphics_clock>
+				<supported_graphics_clock>569 MHz</supported_graphics_clock>
+				<supported_graphics_clock>556 MHz</supported_graphics_clock>
+				<supported_graphics_clock>544 MHz</supported_graphics_clock>
+			</supported_mem_clock>
+		</supported_clocks>
+		<processes>
+		</processes>
+		<accounted_processes>
+		</accounted_processes>
+	</gpu>
+
+	<gpu id="0000:82:00.0">
+		<product_name>Tesla P100-PCIE-12GB</product_name>
+		<product_brand>Tesla</product_brand>
+		<display_mode>Disabled</display_mode>
+		<display_active>Disabled</display_active>
+		<persistence_mode>Disabled</persistence_mode>
+		<accounting_mode>Disabled</accounting_mode>
+		<accounting_mode_buffer_size>1920</accounting_mode_buffer_size>
+		<driver_model>
+			<current_dm>N/A</current_dm>
+			<pending_dm>N/A</pending_dm>
+		</driver_model>
+		<serial>0320717031755</serial>
+		<uuid>GPU-46915a82-3fd2-8e11-ae26-a80b607c04f3</uuid>
+		<minor_number>1</minor_number>
+		<vbios_version>86.00.3A.00.02</vbios_version>
+		<multigpu_board>No</multigpu_board>
+		<board_id>0x8200</board_id>
+		<gpu_part_number>900-2H400-0110-030</gpu_part_number>
+		<inforom_version>
+			<img_version>H400.0202.00.01</img_version>
+			<oem_object>1.1</oem_object>
+			<ecc_object>4.1</ecc_object>
+			<pwr_object>N/A</pwr_object>
+		</inforom_version>
+		<gpu_operation_mode>
+			<current_gom>N/A</current_gom>
+			<pending_gom>N/A</pending_gom>
+		</gpu_operation_mode>
+		<gpu_virtualization_mode>
+			<virtualization_mode>None</virtualization_mode>
+		</gpu_virtualization_mode>
+		<pci>
+			<pci_bus>82</pci_bus>
+			<pci_device>00</pci_device>
+			<pci_domain>0000</pci_domain>
+			<pci_device_id>15F710DE</pci_device_id>
+			<pci_bus_id>0000:82:00.0</pci_bus_id>
+			<pci_sub_system_id>11DA10DE</pci_sub_system_id>
+			<pci_gpu_link_info>
+				<pcie_gen>
+					<max_link_gen>3</max_link_gen>
+					<current_link_gen>3</current_link_gen>
+				</pcie_gen>
+				<link_widths>
+					<max_link_width>16x</max_link_width>
+					<current_link_width>16x</current_link_width>
+				</link_widths>
+			</pci_gpu_link_info>
+			<pci_bridge_chip>
+				<bridge_chip_type>N/A</bridge_chip_type>
+				<bridge_chip_fw>N/A</bridge_chip_fw>
+			</pci_bridge_chip>
+			<replay_counter>0</replay_counter>
+			<tx_util>0 KB/s</tx_util>
+			<rx_util>0 KB/s</rx_util>
+		</pci>
+		<fan_speed>N/A</fan_speed>
+		<performance_state>P0</performance_state>
+		<clocks_throttle_reasons>
+			<clocks_throttle_reason_gpu_idle>Active</clocks_throttle_reason_gpu_idle>
+			<clocks_throttle_reason_applications_clocks_setting>Not Active</clocks_throttle_reason_applications_clocks_setting>
+			<clocks_throttle_reason_sw_power_cap>Not Active</clocks_throttle_reason_sw_power_cap>
+			<clocks_throttle_reason_hw_slowdown>Not Active</clocks_throttle_reason_hw_slowdown>
+			<clocks_throttle_reason_sync_boost>Not Active</clocks_throttle_reason_sync_boost>
+			<clocks_throttle_reason_unknown>Not Active</clocks_throttle_reason_unknown>
+		</clocks_throttle_reasons>
+		<fb_memory_usage>
+			<total>12193 MiB</total>
+			<used>0 MiB</used>
+			<free>12193 MiB</free>
+		</fb_memory_usage>
+		<bar1_memory_usage>
+			<total>16384 MiB</total>
+			<used>2 MiB</used>
+			<free>16382 MiB</free>
+		</bar1_memory_usage>
+		<compute_mode>Default</compute_mode>
+		<utilization>
+			<gpu_util>10.3 %</gpu_util>
+			<memory_util>0 %</memory_util>
+			<encoder_util>0 %</encoder_util>
+			<decoder_util>0 %</decoder_util>
+		</utilization>
+		<encoder_stats>
+			<session_count>0</session_count>
+			<average_fps>0</average_fps>
+			<average_latency>0 ms</average_latency>
+		</encoder_stats>
+		<ecc_mode>
+			<current_ecc>Enabled</current_ecc>
+			<pending_ecc>Enabled</pending_ecc>
+		</ecc_mode>
+		<ecc_errors>
+			<volatile>
+				<single_bit>
+					<device_memory>0</device_memory>
+					<register_file>0</register_file>
+					<l1_cache>N/A</l1_cache>
+					<l2_cache>0</l2_cache>
+					<texture_memory>0</texture_memory>
+					<texture_shm>0</texture_shm>
+					<total>0</total>
+				</single_bit>
+				<double_bit>
+					<device_memory>0</device_memory>
+					<register_file>0</register_file>
+					<l1_cache>N/A</l1_cache>
+					<l2_cache>0</l2_cache>
+					<texture_memory>0</texture_memory>
+					<texture_shm>0</texture_shm>
+					<total>0</total>
+				</double_bit>
+			</volatile>
+			<aggregate>
+				<single_bit>
+					<device_memory>0</device_memory>
+					<register_file>0</register_file>
+					<l1_cache>N/A</l1_cache>
+					<l2_cache>0</l2_cache>
+					<texture_memory>0</texture_memory>
+					<texture_shm>0</texture_shm>
+					<total>0</total>
+				</single_bit>
+				<double_bit>
+					<device_memory>0</device_memory>
+					<register_file>0</register_file>
+					<l1_cache>N/A</l1_cache>
+					<l2_cache>0</l2_cache>
+					<texture_memory>0</texture_memory>
+					<texture_shm>0</texture_shm>
+					<total>0</total>
+				</double_bit>
+			</aggregate>
+		</ecc_errors>
+		<retired_pages>
+			<multiple_single_bit_retirement>
+				<retired_count>0</retired_count>
+				<retired_page_addresses>
+				</retired_page_addresses>
+			</multiple_single_bit_retirement>
+			<double_bit_retirement>
+				<retired_count>0</retired_count>
+				<retired_page_addresses>
+				</retired_page_addresses>
+			</double_bit_retirement>
+			<pending_retirement>No</pending_retirement>
+		</retired_pages>
+		<temperature>
+			<gpu_temp>34 C</gpu_temp>
+			<gpu_temp_max_threshold>85 C</gpu_temp_max_threshold>
+			<gpu_temp_slow_threshold>82 C</gpu_temp_slow_threshold>
+		</temperature>
+		<power_readings>
+			<power_state>P0</power_state>
+			<power_management>Supported</power_management>
+			<power_draw>25.54 W</power_draw>
+			<power_limit>250.00 W</power_limit>
+			<default_power_limit>250.00 W</default_power_limit>
+			<enforced_power_limit>250.00 W</enforced_power_limit>
+			<min_power_limit>125.00 W</min_power_limit>
+			<max_power_limit>250.00 W</max_power_limit>
+		</power_readings>
+		<clocks>
+			<graphics_clock>405 MHz</graphics_clock>
+			<sm_clock>405 MHz</sm_clock>
+			<mem_clock>715 MHz</mem_clock>
+			<video_clock>835 MHz</video_clock>
+		</clocks>
+		<applications_clocks>
+			<graphics_clock>1189 MHz</graphics_clock>
+			<mem_clock>715 MHz</mem_clock>
+		</applications_clocks>
+		<default_applications_clocks>
+			<graphics_clock>1189 MHz</graphics_clock>
+			<mem_clock>715 MHz</mem_clock>
+		</default_applications_clocks>
+		<max_clocks>
+			<graphics_clock>1328 MHz</graphics_clock>
+			<sm_clock>1328 MHz</sm_clock>
+			<mem_clock>715 MHz</mem_clock>
+			<video_clock>1328 MHz</video_clock>
+		</max_clocks>
+		<clock_policy>
+			<auto_boost>N/A</auto_boost>
+			<auto_boost_default>N/A</auto_boost_default>
+		</clock_policy>
+		<supported_clocks>
+			<supported_mem_clock>
+				<value>715 MHz</value>
+				<supported_graphics_clock>1328 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1316 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1303 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1290 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1278 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1265 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1252 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1240 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1227 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1215 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1202 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1189 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1177 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1164 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1151 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1139 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1126 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1113 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1101 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1088 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1075 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1063 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1050 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1037 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1025 MHz</supported_graphics_clock>
+				<supported_graphics_clock>1012 MHz</supported_graphics_clock>
+				<supported_graphics_clock>999 MHz</supported_graphics_clock>
+				<supported_graphics_clock>987 MHz</supported_graphics_clock>
+				<supported_graphics_clock>974 MHz</supported_graphics_clock>
+				<supported_graphics_clock>961 MHz</supported_graphics_clock>
+				<supported_graphics_clock>949 MHz</supported_graphics_clock>
+				<supported_graphics_clock>936 MHz</supported_graphics_clock>
+				<supported_graphics_clock>923 MHz</supported_graphics_clock>
+				<supported_graphics_clock>911 MHz</supported_graphics_clock>
+				<supported_graphics_clock>898 MHz</supported_graphics_clock>
+				<supported_graphics_clock>885 MHz</supported_graphics_clock>
+				<supported_graphics_clock>873 MHz</supported_graphics_clock>
+				<supported_graphics_clock>860 MHz</supported_graphics_clock>
+				<supported_graphics_clock>847 MHz</supported_graphics_clock>
+				<supported_graphics_clock>835 MHz</supported_graphics_clock>
+				<supported_graphics_clock>822 MHz</supported_graphics_clock>
+				<supported_graphics_clock>810 MHz</supported_graphics_clock>
+				<supported_graphics_clock>797 MHz</supported_graphics_clock>
+				<supported_graphics_clock>784 MHz</supported_graphics_clock>
+				<supported_graphics_clock>772 MHz</supported_graphics_clock>
+				<supported_graphics_clock>759 MHz</supported_graphics_clock>
+				<supported_graphics_clock>746 MHz</supported_graphics_clock>
+				<supported_graphics_clock>734 MHz</supported_graphics_clock>
+				<supported_graphics_clock>721 MHz</supported_graphics_clock>
+				<supported_graphics_clock>708 MHz</supported_graphics_clock>
+				<supported_graphics_clock>696 MHz</supported_graphics_clock>
+				<supported_graphics_clock>683 MHz</supported_graphics_clock>
+				<supported_graphics_clock>670 MHz</supported_graphics_clock>
+				<supported_graphics_clock>658 MHz</supported_graphics_clock>
+				<supported_graphics_clock>645 MHz</supported_graphics_clock>
+				<supported_graphics_clock>632 MHz</supported_graphics_clock>
+				<supported_graphics_clock>620 MHz</supported_graphics_clock>
+				<supported_graphics_clock>607 MHz</supported_graphics_clock>
+				<supported_graphics_clock>594 MHz</supported_graphics_clock>
+				<supported_graphics_clock>582 MHz</supported_graphics_clock>
+				<supported_graphics_clock>569 MHz</supported_graphics_clock>
+				<supported_graphics_clock>556 MHz</supported_graphics_clock>
+				<supported_graphics_clock>544 MHz</supported_graphics_clock>
+			</supported_mem_clock>
+		</supported_clocks>
+		<processes>
+		</processes>
+		<accounted_processes>
+		</accounted_processes>
+	</gpu>
+
+</nvidia_smi_log>
\ No newline at end of file


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org


[hadoop] 04/20: YARN-9175. Null resources check in ResourceInfo for branch-3.0

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit b32e2a7d307157318f8e518eedc5b3ee4c53dc57
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Thu Jan 3 15:58:10 2019 -0500

    YARN-9175. Null resources check in ResourceInfo for branch-3.0
    
    (cherry picked from commit a0291a015c1af0ea1282849bd8fb32824d7452fa)
---
 .../hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java  | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java
index e13980a..dd80d20 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceInfo.java
@@ -62,7 +62,7 @@ public class ResourceInfo {
 
   @Override
   public String toString() {
-    return resources.toString();
+    return getResource().toString();
   }
 
   public void setMemory(int memory) {
@@ -82,6 +82,9 @@ public class ResourceInfo {
   }
 
   public Resource getResource() {
+    if (resources == null) {
+      resources = Resource.newInstance(memory, vCores);
+    }
     return Resource.newInstance(resources);
   }
 }
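
The getResource() hunk above reads as a lazy-initialization guard: if the
cached Resource is null, it is rebuilt from the scalar memory and vCores
fields before being copied, and toString() is routed through the same
guard. Below is a minimal sketch of that pattern. SimpleResource and
LazyResourceInfo are hypothetical stand-ins, not the real YARN Resource and
ResourceInfo classes, and the reason the field may be null is an assumption
made for illustration.

```java
// Hypothetical stand-in for org.apache.hadoop.yarn.api.records.Resource.
final class SimpleResource {
  final long memory;
  final int vCores;

  SimpleResource(long memory, int vCores) {
    this.memory = memory;
    this.vCores = vCores;
  }

  @Override
  public String toString() {
    return "<memory:" + memory + ", vCores:" + vCores + ">";
  }
}

// Hypothetical stand-in for ResourceInfo; only the null guard mirrors the patch.
final class LazyResourceInfo {
  private final long memory;
  private final int vCores;
  // Assumption: this can be null when only the scalar fields were populated.
  private SimpleResource resources;

  LazyResourceInfo(long memory, int vCores) {
    this.memory = memory;
    this.vCores = vCores;
  }

  SimpleResource getResource() {
    if (resources == null) {
      // Rebuild the cached object from the scalar fields on first use.
      resources = new SimpleResource(memory, vCores);
    }
    // Return a copy so callers cannot mutate the cached instance.
    return new SimpleResource(resources.memory, resources.vCores);
  }

  @Override
  public String toString() {
    // Going through getResource() avoids dereferencing a null field.
    return getResource().toString();
  }

  public static void main(String[] args) {
    System.out.println(new LazyResourceInfo(1024, 2));
  }
}
```

With the guard in place, calling toString() on an object whose cached
field was never set no longer throws a NullPointerException.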


[hadoop] 18/20: YARN-7383. Node resource is not parsed correctly for resource names containing dot. Contributed by Gergely Novák.

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 6239fafe0140ad5bba1c35f468730e62554f908d
Author: Sunil G <su...@apache.org>
AuthorDate: Wed Dec 13 22:00:07 2017 +0530

    YARN-7383. Node resource is not parsed correctly for resource names containing dot. Contributed by Gergely Novák.
---
 .../apache/hadoop/yarn/util/resource/ResourceUtils.java   | 15 ++++++---------
 .../hadoop/yarn/util/resource/TestResourceUtils.java      |  5 ++++-
 .../test/resources/resource-types/node-resources-2.xml    |  5 +++++
 .../test/resources/resource-types/resource-types-4.xml    |  7 ++++++-
 4 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
index abf58a6..65eb5a2 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
@@ -461,21 +461,18 @@ public class ResourceUtils {
     for (Map.Entry<String, String> entry : conf) {
       String key = entry.getKey();
       String value = entry.getValue();
-
-      if (key.startsWith(YarnConfiguration.NM_RESOURCES_PREFIX)) {
-        addResourceInformation(key, value, nodeResources);
-      }
+      addResourceTypeInformation(key, value, nodeResources);
     }
 
     return nodeResources;
   }
 
-  private static void addResourceInformation(String prop, String value,
+  private static void addResourceTypeInformation(String prop, String value,
       Map<String, ResourceInformation> nodeResources) {
-    String[] parts = prop.split("\\.");
-    LOG.info("Found resource entry " + prop);
-    if (parts.length == 4) {
-      String resourceType = parts[3];
+    if (prop.startsWith(YarnConfiguration.NM_RESOURCES_PREFIX)) {
+      LOG.info("Found resource entry " + prop);
+      String resourceType = prop.substring(
+          YarnConfiguration.NM_RESOURCES_PREFIX.length());
       if (!nodeResources.containsKey(resourceType)) {
         nodeResources
             .put(resourceType, ResourceInformation.newInstance(resourceType));
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
index 80555ca..b511705 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
@@ -124,9 +124,10 @@ public class TestResourceUtils {
         new ResourceFileInformation("resource-types-3.xml", 3);
     testFile3.resourceNameUnitsMap.put("resource2", "");
     ResourceFileInformation testFile4 =
-        new ResourceFileInformation("resource-types-4.xml", 4);
+        new ResourceFileInformation("resource-types-4.xml", 5);
     testFile4.resourceNameUnitsMap.put("resource1", "G");
     testFile4.resourceNameUnitsMap.put("resource2", "m");
+    testFile4.resourceNameUnitsMap.put("yarn.io/gpu", "");
 
     ResourceFileInformation[] tests = {testFile1, testFile2, testFile3,
         testFile4};
@@ -292,6 +293,8 @@ public class TestResourceUtils {
         ResourceInformation.newInstance("resource1", "Gi", 5L));
     test3Resources.setResourceInformation("resource2",
         ResourceInformation.newInstance("resource2", "m", 2L));
+    test3Resources.setResourceInformation("yarn.io/gpu",
+        ResourceInformation.newInstance("yarn.io/gpu", "", 1));
     testRun.put("node-resources-2.xml", test3Resources);
 
     for (Map.Entry<String, Resource> entry : testRun.entrySet()) {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/resource-types/node-resources-2.xml b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/resource-types/node-resources-2.xml
index 9d9b3dc..382d5dd 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/resource-types/node-resources-2.xml
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/resource-types/node-resources-2.xml
@@ -36,4 +36,9 @@ limitations under the License. See accompanying LICENSE file.
    <value>2m</value>
  </property>
 
+ <property>
+   <name>yarn.nodemanager.resource-type.yarn.io/gpu</name>
+   <value>1</value>
+ </property>
+
 </configuration>
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/resource-types/resource-types-4.xml b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/resource-types/resource-types-4.xml
index c84316a..ea8d2bd 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/resource-types/resource-types-4.xml
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/resource-types/resource-types-4.xml
@@ -18,7 +18,7 @@ limitations under the License. See accompanying LICENSE file.
 
  <property>
    <name>yarn.resource-types</name>
-   <value>resource1,resource2</value>
+   <value>resource1,resource2,yarn.io/gpu</value>
  </property>
 
  <property>
@@ -31,4 +31,9 @@ limitations under the License. See accompanying LICENSE file.
    <value>m</value>
  </property>
 
+ <property>
+   <name>yarn.resource-types.yarn.io/gpu.units</name>
+   <value></value>
+ </property>
+
 </configuration>
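
The ResourceUtils change above replaces dot-splitting with prefix
stripping. A key such as yarn.nodemanager.resource-type.yarn.io/gpu splits
into five dot-separated parts, so the old "parts.length == 4" check
silently skipped it. The sketch below contrasts the two approaches. The
class and method names are hypothetical, and the PREFIX value is an
assumption inferred from the property name in node-resources-2.xml rather
than read from YarnConfiguration.

```java
final class ResourceKeyParsingDemo {
  // Assumed value of YarnConfiguration.NM_RESOURCES_PREFIX (inferred, not verified).
  static final String PREFIX = "yarn.nodemanager.resource-type.";

  // Pre-patch approach: split on dots and accept only four-part keys.
  static String bySplit(String prop) {
    if (!prop.startsWith(PREFIX)) {
      return null;
    }
    String[] parts = prop.split("\\.");
    return parts.length == 4 ? parts[3] : null;
  }

  // Post-patch approach: everything after the prefix is the resource name.
  static String byPrefix(String prop) {
    return prop.startsWith(PREFIX) ? prop.substring(PREFIX.length()) : null;
  }

  public static void main(String[] args) {
    String plain = PREFIX + "resource1";
    String dotted = PREFIX + "yarn.io/gpu";
    System.out.println(bySplit(plain) + " / " + byPrefix(plain));
    System.out.println(bySplit(dotted) + " / " + byPrefix(dotted));
  }
}
```

Both approaches agree on resource1, but only prefix stripping recovers
yarn.io/gpu, which is what the added test entries rely on.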


[hadoop] 14/20: YARN-9397. Fix empty NMResourceInfo object test failures in branch-2

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit f279f92bb1b9d89c4968bf168096615eff27b081
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Mon Mar 18 13:44:27 2019 -0700

    YARN-9397. Fix empty NMResourceInfo object test failures in branch-2
---
 .../yarn/server/nodemanager/webapp/TestNMWebServices.java      | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
index 2f1577f..980eae9 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
@@ -457,23 +457,23 @@ public class TestNMWebServices extends JerseyTestBase {
     assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
 
     // Access resource-2 should fail (empty NMResourceInfo returned).
-    JSONObject json = response.getEntity(JSONObject.class);
-    assertEquals(0, json.length());
+    String resp = response.getEntity(String.class);
+    assertEquals("null", resp);
 
     // Access resource-3 should fail (unknown plugin)
     response = r.path("ws").path("v1").path("node").path(
         "resources").path("resource-3").accept(MediaType.APPLICATION_JSON).get(
         ClientResponse.class);
     assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
-    json = response.getEntity(JSONObject.class);
-    assertEquals(0, json.length());
+    resp = response.getEntity(String.class);
+    assertEquals("null", resp);
 
     // Access resource-1 should success
     response = r.path("ws").path("v1").path("node").path(
         "resources").path("resource-1").accept(MediaType.APPLICATION_JSON).get(
         ClientResponse.class);
     assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
-    json = response.getEntity(JSONObject.class);
+    JSONObject json = response.getEntity(JSONObject.class);
     assertEquals(1000, Long.parseLong(json.get("a").toString()));
 
     // Access resource-1 should success (encoded yarn.io/Fresource-1).


[hadoop] 05/20: YARN-9187. Backport YARN-6852 for GPU-specific native changes to branch-2

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit f0dcb31f3cb1b012cff14f0475f3ecffa6930c6c
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Wed Jan 9 16:21:43 2019 -0500

    YARN-9187. Backport YARN-6852 for GPU-specific native changes to branch-2
---
 .../src/CMakeLists.txt                             |   8 +-
 .../container-executor/impl/container-executor.h   |   2 +
 .../src/main/native/container-executor/impl/main.c |  11 +
 .../impl/modules/cgroups/cgroups-operations.c      | 161 +++++++++++++++
 .../impl/modules/cgroups/cgroups-operations.h      |  55 +++++
 .../impl/modules/gpu/gpu-module.c                  | 229 +++++++++++++++++++++
 .../impl/modules/gpu/gpu-module.h                  |  45 ++++
 .../test/modules/cgroups/test-cgroups-module.cc    | 121 +++++++++++
 .../test/modules/gpu/test-gpu-module.cc            | 203 ++++++++++++++++++
 .../test/test-container-executor.c                 |   1 -
 10 files changed, 833 insertions(+), 3 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt
index 0b1c3e9..e9f8aff 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt
@@ -101,9 +101,11 @@ add_library(container
     main/native/container-executor/impl/container-executor.c
     main/native/container-executor/impl/get_executable.c
     main/native/container-executor/impl/utils/string-utils.c
+    main/native/container-executor/impl/utils/docker-util.c
     main/native/container-executor/impl/utils/path-utils.c
+    main/native/container-executor/impl/modules/cgroups/cgroups-operations.c
     main/native/container-executor/impl/modules/common/module-configs.c
-    main/native/container-executor/impl/utils/docker-util.c
+    main/native/container-executor/impl/modules/gpu/gpu-module.c
 )
 
 add_executable(container-executor
@@ -135,6 +137,8 @@ add_executable(cetest
         main/native/container-executor/test/utils/test-string-utils.cc
         main/native/container-executor/test/utils/test-path-utils.cc
         main/native/container-executor/test/test_util.cc
-        main/native/container-executor/test/utils/test_docker_util.cc)
+        main/native/container-executor/test/utils/test_docker_util.cc
+        main/native/container-executor/test/modules/cgroups/test-cgroups-module.cc
+        main/native/container-executor/test/modules/gpu/test-gpu-module.cc)
 target_link_libraries(cetest gtest container)
 output_directory(cetest test)
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
index 956b38c..a78b077 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
@@ -285,3 +285,5 @@ int execute_regex_match(const char *regex_str, const char *input);
  * Return 0 on success.
  */
 int validate_docker_image_name(const char *image_name);
+
+struct configuration* get_cfg();
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
index 930dabe..9cf34a0 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
@@ -22,6 +22,8 @@
 #include "util.h"
 #include "get_executable.h"
 #include "utils/string-utils.h"
+#include "modules/gpu/gpu-module.h"
+#include "modules/cgroups/cgroups-operations.h"
 
 #include <errno.h>
 #include <grp.h>
@@ -241,6 +243,14 @@ static int validate_arguments(int argc, char **argv , int *operation) {
     return INVALID_ARGUMENT_NUMBER;
   }
 
+  /*
+   * Check if it is a known module, if yes, redirect to module
+   */
+  if (strcmp("--module-gpu", argv[1]) == 0) {
+    return handle_gpu_request(&update_cgroups_parameters, "gpu", argc - 1,
+           &argv[1]);
+  }
+
   if (strcmp("--checksetup", argv[1]) == 0) {
     *operation = CHECK_SETUP;
     return 0;
@@ -325,6 +335,7 @@ static int validate_arguments(int argc, char **argv , int *operation) {
         return FEATURE_DISABLED;
     }
   }
+
   /* Now we have to validate 'run as user' operations that don't use
     a 'long option' - we should fix this at some point. The validation/argument
     parsing here is extensive enough that it done in a separate function */
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c
new file mode 100644
index 0000000..b234109
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c
@@ -0,0 +1,161 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "configuration.h"
+#include "container-executor.h"
+#include "utils/string-utils.h"
+#include "utils/path-utils.h"
+#include "modules/common/module-configs.h"
+#include "modules/common/constants.h"
+#include "modules/cgroups/cgroups-operations.h"
+#include "util.h"
+
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+
+#define MAX_PATH_LEN 4096
+
+static const struct section* cgroup_cfg_section = NULL;
+
+void reload_cgroups_configuration() {
+  cgroup_cfg_section = get_configuration_section(CGROUPS_SECTION_NAME, get_cfg());
+}
+
+char* get_cgroups_path_to_write(
+    const char* hierarchy_name,
+    const char* param_name,
+    const char* group_id) {
+  int failed = 0;
+  char* buffer = NULL;
+  const char* cgroups_root = get_section_value(CGROUPS_ROOT_KEY,
+     cgroup_cfg_section);
+  const char* yarn_hierarchy_name = get_section_value(
+     CGROUPS_YARN_HIERARCHY_KEY, cgroup_cfg_section);
+
+  // Make sure it is defined.
+  if (!cgroups_root || cgroups_root[0] == 0) {
+    fprintf(ERRORFILE, "%s is not defined in container-executor.cfg\n",
+      CGROUPS_ROOT_KEY);
+    failed = 1;
+    goto cleanup;
+  }
+
+  // Make sure it is defined.
+  if (!yarn_hierarchy_name || yarn_hierarchy_name[0] == 0) {
+    fprintf(ERRORFILE, "%s is not defined in container-executor.cfg\n",
+      CGROUPS_YARN_HIERARCHY_KEY);
+    failed = 1;
+    goto cleanup;
+  }
+
+  buffer = malloc(MAX_PATH_LEN + 1);
+  if (!buffer) {
+    fprintf(ERRORFILE, "Failed to allocate memory for output path.\n");
+    failed = 1;
+    goto cleanup;
+  }
+
+  // Make a path.
+  // CGroups path should not be too long.
+  if (snprintf(buffer, MAX_PATH_LEN, "%s/%s/%s/%s/%s.%s",
+    cgroups_root, hierarchy_name, yarn_hierarchy_name,
+    group_id, hierarchy_name, param_name) < 0) {
+    fprintf(ERRORFILE, "Failed to print output path.\n");
+    failed = 1;
+    goto cleanup;
+  }
+
+cleanup:
+  if (failed) {
+    if (buffer) {
+      free(buffer);
+    }
+    return NULL;
+  }
+  return buffer;
+}
+
+int update_cgroups_parameters(
+   const char* hierarchy_name,
+   const char* param_name,
+   const char* group_id,
+   const char* value) {
+#ifndef __linux
+  fprintf(ERRORFILE, "Failed to update cgroups parameters, not supported\n");
+  return -1;
+#endif
+  int failure = 0;
+
+  if (!cgroup_cfg_section) {
+    reload_cgroups_configuration();
+  }
+
+  char* full_path = get_cgroups_path_to_write(hierarchy_name, param_name,
+    group_id);
+
+  if (!full_path) {
+    fprintf(ERRORFILE,
+      "Failed to get cgroups path to write, it should be a configuration issue");
+    failure = 1;
+    goto cleanup;
+  }
+
+  if (!verify_path_safety(full_path)) {
+    failure = 1;
+    goto cleanup;
+  }
+
+  // Make sure file exists
+  struct stat sb;
+  if (stat(full_path, &sb) != 0) {
+    fprintf(ERRORFILE, "CGroups: Could not find file to write, %s", full_path);
+    failure = 1;
+    goto cleanup;
+  }
+
+  fprintf(ERRORFILE, "CGroups: Updating cgroups, path=%s, value=%s",
+    full_path, value);
+
+  // Write values to file
+  FILE *f;
+  f = fopen(full_path, "a");
+  if (!f) {
+    fprintf(ERRORFILE, "CGroups: Failed to open cgroups file, %s", full_path);
+    failure = 1;
+    goto cleanup;
+  }
+  if (fprintf(f, "%s", value) < 0) {
+    fprintf(ERRORFILE, "CGroups: Failed to write cgroups file, %s", full_path);
+    fclose(f);
+    failure = 1;
+    goto cleanup;
+  }
+  if (fclose(f) != 0) {
+    fprintf(ERRORFILE, "CGroups: Failed to close cgroups file, %s", full_path);
+    failure = 1;
+    goto cleanup;
+  }
+
+cleanup:
+  if (full_path) {
+    free(full_path);
+  }
+  return -failure;
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.h b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.h
new file mode 100644
index 0000000..cf80bcf
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.h
@@ -0,0 +1,55 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef _CGROUPS_OPERATIONS_H_
+#define _CGROUPS_OPERATIONS_H_
+
+#define CGROUPS_SECTION_NAME "cgroups"
+#define CGROUPS_ROOT_KEY "root"
+#define CGROUPS_YARN_HIERARCHY_KEY "yarn-hierarchy"
+
+/**
+ * Handle update CGroups parameter update requests:
+ * - hierarchy_name: e.g. devices / cpu,cpuacct
+ * - param_name: e.g. deny
+ * - group_id: e.g. container_x_y
+ * - value: e.g. "a *:* rwm"
+ *
+ * return 0 if succeeded
+ */
+int update_cgroups_parameters(
+   const char* hierarchy_name,
+   const char* param_name,
+   const char* group_id,
+   const char* value);
+
+ /**
+  * Get CGroups path to update. Visible for testing.
+  * Return 0 if succeeded
+  */
+ char* get_cgroups_path_to_write(
+    const char* hierarchy_name,
+    const char* param_name,
+    const char* group_id);
+
+ /**
+  * Reload config from filesystem, visible for testing.
+  */
+ void reload_cgroups_configuration();
+
+#endif
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.c b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.c
new file mode 100644
index 0000000..f96645d
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.c
@@ -0,0 +1,229 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "configuration.h"
+#include "container-executor.h"
+#include "utils/string-utils.h"
+#include "modules/gpu/gpu-module.h"
+#include "modules/cgroups/cgroups-operations.h"
+#include "modules/common/module-configs.h"
+#include "modules/common/constants.h"
+#include "util.h"
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <getopt.h>
+#include <unistd.h>
+
+#define EXCLUDED_GPUS_OPTION "excluded_gpus"
+#define CONTAINER_ID_OPTION "container_id"
+#define DEFAULT_NVIDIA_MAJOR_NUMBER 195
+#define MAX_CONTAINER_ID_LEN 128
+
+static const struct section* cfg_section;
+
+static int internal_handle_gpu_request(
+    update_cgroups_parameters_func update_cgroups_parameters_func_p,
+    size_t n_minor_devices_to_block, int minor_devices[],
+    const char* container_id) {
+  char* allowed_minor_numbers_str = NULL;
+  int* allowed_minor_numbers = NULL;
+  size_t n_allowed_minor_numbers = 0;
+  int return_code = 0;
+
+  if (n_minor_devices_to_block == 0) {
+    // no device to block, just return;
+    return 0;
+  }
+
+  // Get major device number from cfg, if not set, major number of (Nvidia)
+  // will be the default value.
+  int major_device_number;
+  char* major_number_str = get_section_value(GPU_MAJOR_NUMBER_CONFIG_KEY,
+     cfg_section);
+  if (!major_number_str || 0 == major_number_str[0]) {
+    // Default major number of Nvidia devices
+    major_device_number = DEFAULT_NVIDIA_MAJOR_NUMBER;
+  } else {
+    major_device_number = strtol(major_number_str, NULL, 0);
+  }
+
+  // Get allowed minor device numbers from cfg, if not set, means all minor
+  // devices can be used by YARN
+  allowed_minor_numbers_str = get_section_value(
+      GPU_ALLOWED_DEVICES_MINOR_NUMBERS,
+      cfg_section);
+  if (!allowed_minor_numbers_str || 0 == allowed_minor_numbers_str[0]) {
+    allowed_minor_numbers = NULL;
+  } else {
+    int rc = get_numbers_split_by_comma(allowed_minor_numbers_str,
+                                        &allowed_minor_numbers,
+                                        &n_allowed_minor_numbers);
+    if (0 != rc) {
+      fprintf(ERRORFILE,
+          "Failed to get allowed minor device numbers from cfg, value=%s\n",
+          allowed_minor_numbers_str);
+      return_code = -1;
+      goto cleanup;
+    }
+
+    // Make sure we're trying to black devices allowed in config
+    for (int i = 0; i < n_minor_devices_to_block; i++) {
+      int found = 0;
+      for (int j = 0; j < n_allowed_minor_numbers; j++) {
+        if (minor_devices[i] == allowed_minor_numbers[j]) {
+          found = 1;
+          break;
+        }
+      }
+
+      if (!found) {
+        fprintf(ERRORFILE,
+          "Trying to blacklist device with minor-number=%d which is not on allowed list\n",
+          minor_devices[i]);
+        return_code = -1;
+        goto cleanup;
+      }
+    }
+  }
+
+  // Use cgroup helpers to blacklist devices
+  for (int i = 0; i < n_minor_devices_to_block; i++) {
+    char param_value[128];
+    memset(param_value, 0, sizeof(param_value));
+    snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
+             major_device_number, i);
+
+    int rc = update_cgroups_parameters_func_p("devices", "deny",
+      container_id, param_value);
+
+    if (0 != rc) {
+      fprintf(ERRORFILE, "CGroups: Failed to update cgroups\n");
+      return_code = -1;
+      goto cleanup;
+    }
+  }
+
+cleanup:
+  if (major_number_str) {
+    free(major_number_str);
+  }
+  if (allowed_minor_numbers) {
+    free(allowed_minor_numbers);
+  }
+  if (allowed_minor_numbers_str) {
+    free(allowed_minor_numbers_str);
+  }
+
+  return return_code;
+}
+
+void reload_gpu_configuration() {
+  cfg_section = get_configuration_section(GPU_MODULE_SECTION_NAME, get_cfg());
+}
+
+/*
+ * Format of GPU request commandline:
+ *
+ * c-e gpu --excluded_gpus 0,1,3 --container_id container_x_y
+ */
+int handle_gpu_request(update_cgroups_parameters_func func,
+    const char* module_name, int module_argc, char** module_argv) {
+  if (!cfg_section) {
+    reload_gpu_configuration();
+  }
+
+  if (!module_enabled(cfg_section, GPU_MODULE_SECTION_NAME)) {
+    fprintf(ERRORFILE,
+      "Please make sure gpu module is enabled before using it.\n");
+    return -1;
+  }
+
+  static struct option long_options[] = {
+    {EXCLUDED_GPUS_OPTION, required_argument, 0, 'e' },
+    {CONTAINER_ID_OPTION, required_argument, 0, 'c' },
+    {0, 0, 0, 0}
+  };
+
+  int rc = 0;
+  int c = 0;
+  int option_index = 0;
+
+  int* minor_devices = NULL;
+  char container_id[MAX_CONTAINER_ID_LEN];
+  memset(container_id, 0, sizeof(container_id));
+  size_t n_minor_devices_to_block = 0;
+  int failed = 0;
+
+  optind = 1;
+  while((c = getopt_long(module_argc, module_argv, "e:c:",
+                         long_options, &option_index)) != -1) {
+    switch(c) {
+      case 'e':
+        rc = get_numbers_split_by_comma(optarg, &minor_devices,
+          &n_minor_devices_to_block);
+        if (0 != rc) {
+          fprintf(ERRORFILE,
+            "Failed to get minor devices number from command line, value=%s\n",
+            optarg);
+          failed = 1;
+          goto cleanup;
+        }
+        break;
+      case 'c':
+        if (!validate_container_id(optarg)) {
+          fprintf(ERRORFILE,
+            "Specified container_id=%s is invalid\n", optarg);
+          failed = 1;
+          goto cleanup;
+        }
+        strncpy(container_id, optarg, MAX_CONTAINER_ID_LEN);
+        break;
+      default:
+        fprintf(ERRORFILE,
+          "Unknown option in gpu command character %d %c, optionindex = %d\n",
+          c, c, optind);
+        failed = 1;
+        goto cleanup;
+    }
+  }
+
+  if (0 == container_id[0]) {
+    fprintf(ERRORFILE,
+      "[%s] --container_id must be specified.\n", __func__);
+    failed = 1;
+    goto cleanup;
+  }
+
+  if (!minor_devices) {
+     // Minor devices is null, skip following call.
+     fprintf(ERRORFILE, "--excluded_gpus is not specified, skip cgroups call.\n");
+     goto cleanup;
+  }
+
+  failed = internal_handle_gpu_request(func, n_minor_devices_to_block,
+         minor_devices,
+         container_id);
+
+cleanup:
+  if (minor_devices) {
+    free(minor_devices);
+  }
+  return failed;
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.h b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.h
new file mode 100644
index 0000000..59d4c7e
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.h
@@ -0,0 +1,45 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifdef __FreeBSD__
+#define _WITH_GETLINE
+#endif
+
+#ifndef _MODULES_GPU_GPU_MODULE_H_
+#define _MODULES_GPU_GPU_MODULE_H_
+
+#define GPU_MAJOR_NUMBER_CONFIG_KEY "gpu.major-device-number"
+#define GPU_ALLOWED_DEVICES_MINOR_NUMBERS "gpu.allowed-device-minor-numbers"
+#define GPU_MODULE_SECTION_NAME "gpu"
+
+// For unit test stubbing
+typedef int (*update_cgroups_parameters_func)(const char*, const char*,
+   const char*, const char*);
+
+/**
+ * Handle gpu requests
+ */
+int handle_gpu_request(update_cgroups_parameters_func func,
+   const char* module_name, int module_argc, char** module_argv);
+
+/**
+ * Reload config from filesystem, visible for testing.
+ */
+void reload_gpu_configuration();
+
+#endif
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/cgroups/test-cgroups-module.cc b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/cgroups/test-cgroups-module.cc
new file mode 100644
index 0000000..8ffbe88
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/cgroups/test-cgroups-module.cc
@@ -0,0 +1,121 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include <gtest/gtest.h>
+#include <sstream>
+
+extern "C" {
+#include "configuration.h"
+#include "container-executor.h"
+#include "modules/cgroups/cgroups-operations.h"
+#include "test/test-container-executor-common.h"
+#include "util.h"
+}
+
+namespace ContainerExecutor {
+
+class TestCGroupsModule : public ::testing::Test {
+protected:
+  virtual void SetUp() {
+    if (mkdirs(TEST_ROOT, 0755) != 0) {
+      fprintf(ERRORFILE, "Failed to mkdir TEST_ROOT: %s\n", TEST_ROOT);
+      exit(1);
+    }
+    LOGFILE = stdout;
+    ERRORFILE = stderr;
+  }
+
+  virtual void TearDown() {}
+};
+
+TEST_F(TestCGroupsModule, test_cgroups_get_path_without_define_root) {
+  // Write config file.
+  const char *filename = TEST_ROOT "/test_cgroups_get_path_without_root.cfg";
+  FILE *file = fopen(filename, "w");
+  if (file == NULL) {
+    printf("FAIL: Could not open configuration file: %s\n", filename);
+    exit(1);
+  }
+  fprintf(file, "[cgroups]\n");
+  fprintf(file, "yarn-hierarchy=yarn\n");
+  fclose(file);
+
+  // Read config file
+  read_executor_config(filename);
+  reload_cgroups_configuration();
+
+  char* path = get_cgroups_path_to_write("devices", "deny", "container_1");
+
+  ASSERT_TRUE(NULL == path) << "Should fail.\n";
+}
+
+TEST_F(TestCGroupsModule, test_cgroups_get_path_without_define_yarn_hierarchy) {
+  // Write config file.
+  const char *filename = TEST_ROOT "/test_cgroups_get_path_without_root.cfg";
+  FILE *file = fopen(filename, "w");
+
+  ASSERT_TRUE(file) << "FAIL: Could not open configuration file: " << filename
+                    << "\n";
+  fprintf(file, "[cgroups]\n");
+  fprintf(file, "root=/sys/fs/cgroups\n");
+  fclose(file);
+
+  // Read config file
+  read_executor_config(filename);
+  reload_cgroups_configuration();
+  char* path = get_cgroups_path_to_write("devices", "deny", "container_1");
+
+  ASSERT_TRUE(NULL == path) << "Should fail.\n";
+}
+
+TEST_F(TestCGroupsModule, test_cgroups_get_path_succeeded) {
+  // Write config file.
+  const char *filename = TEST_ROOT "/test_cgroups_get_path.cfg";
+  FILE *file = fopen(filename, "w");
+
+  ASSERT_TRUE(file) << "FAIL: Could not open configuration file\n";
+  fprintf(file, "[cgroups]\n");
+  fprintf(file, "root=/sys/fs/cgroups \n");
+  fprintf(file, "yarn-hierarchy=yarn \n");
+  fclose(file);
+
+  // Read config file
+  read_executor_config(filename);
+  reload_cgroups_configuration();
+
+  char* path = get_cgroups_path_to_write("devices", "deny", "container_1");
+  ASSERT_TRUE(NULL != path) << "Should succeed.\n";
+
+  const char *EXPECTED =
+      "/sys/fs/cgroups/devices/yarn/container_1/devices.deny";
+
+  ASSERT_STREQ(EXPECTED, path)
+      << "Returned cgroup path to write is not the expected one\n";
+}
+} // namespace ContainerExecutor
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/gpu/test-gpu-module.cc b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/gpu/test-gpu-module.cc
new file mode 100644
index 0000000..7e41fb4
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/gpu/test-gpu-module.cc
@@ -0,0 +1,203 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <vector>
+
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include <gtest/gtest.h>
+#include <sstream>
+
+extern "C" {
+#include "configuration.h"
+#include "container-executor.h"
+#include "modules/cgroups/cgroups-operations.h"
+#include "modules/gpu/gpu-module.h"
+#include "test/test-container-executor-common.h"
+#include "util.h"
+}
+
+namespace ContainerExecutor {
+
+class TestGpuModule : public ::testing::Test {
+protected:
+  virtual void SetUp() {
+    if (mkdirs(TEST_ROOT, 0755) != 0) {
+      fprintf(ERRORFILE, "Failed to mkdir TEST_ROOT: %s\n", TEST_ROOT);
+      exit(1);
+    }
+    LOGFILE = stdout;
+    ERRORFILE = stderr;
+  }
+
+  virtual void TearDown() {
+
+  }
+};
+
+static std::vector<const char*> cgroups_parameters_invoked;
+
+static int mock_update_cgroups_parameters(
+   const char* controller_name,
+   const char* param_name,
+   const char* group_id,
+   const char* value) {
+  char* buf = (char*) malloc(128);
+  strcpy(buf, controller_name);
+  cgroups_parameters_invoked.push_back(buf);
+
+  buf = (char*) malloc(128);
+  strcpy(buf, param_name);
+  cgroups_parameters_invoked.push_back(buf);
+
+  buf = (char*) malloc(128);
+  strcpy(buf, group_id);
+  cgroups_parameters_invoked.push_back(buf);
+
+  buf = (char*) malloc(128);
+  strcpy(buf, value);
+  cgroups_parameters_invoked.push_back(buf);
+  return 0;
+}
+
+static void verify_param_updated_to_cgroups(
+    int argc, const char** argv) {
+  ASSERT_EQ(argc, cgroups_parameters_invoked.size());
+
+  int offset = 0;
+  while (offset < argc) {
+    ASSERT_STREQ(argv[offset], cgroups_parameters_invoked[offset]);
+    offset++;
+  }
+}
+
+static void write_and_load_gpu_module_to_cfg(const char* cfg_filepath, int enabled) {
+  FILE *file = fopen(cfg_filepath, "w");
+  if (file == NULL) {
+    printf("FAIL: Could not open configuration file: %s\n", cfg_filepath);
+    exit(1);
+  }
+  fprintf(file, "[gpu]\n");
+  if (enabled) {
+    fprintf(file, "module.enabled=true\n");
+  } else {
+    fprintf(file, "module.enabled=false\n");
+  }
+  fclose(file);
+
+  // Read config file
+  read_executor_config(cfg_filepath);
+  reload_gpu_configuration();
+}
+
+static void test_gpu_module_enabled_disabled(int enabled) {
+  // Write config file.
+  const char *filename = TEST_ROOT "/test_cgroups_module_enabled_disabled.cfg";
+  write_and_load_gpu_module_to_cfg(filename, enabled);
+
+  char* argv[] = { (char*) "--module-gpu", (char*) "--excluded_gpus", (char*) "0,1",
+                   (char*) "--container_id",
+                   (char*) "container_1498064906505_0001_01_000001" };
+
+  int rc = handle_gpu_request(&mock_update_cgroups_parameters,
+              "gpu", 5, argv);
+
+  int EXPECTED_RC;
+  if (enabled) {
+    EXPECTED_RC = 0;
+  } else {
+    EXPECTED_RC = -1;
+  }
+  ASSERT_EQ(EXPECTED_RC, rc);
+}
+
+TEST_F(TestGpuModule, test_verify_gpu_module_calls_cgroup_parameter) {
+  // Write config file.
+  const char *filename = TEST_ROOT "/test_verify_gpu_module_calls_cgroup_parameter.cfg";
+  write_and_load_gpu_module_to_cfg(filename, 1);
+
+  char* container_id = (char*) "container_1498064906505_0001_01_000001";
+  char* argv[] = { (char*) "--module-gpu", (char*) "--excluded_gpus", (char*) "0,1",
+                   (char*) "--container_id",
+                   container_id };
+
+  /* Test case 1: block 2 devices */
+  cgroups_parameters_invoked.clear();
+  int rc = handle_gpu_request(&mock_update_cgroups_parameters,
+     "gpu", 5, argv);
+  ASSERT_EQ(0, rc) << "Should succeed.\n";
+
+  // Verify cgroups parameters
+  const char* expected_cgroups_argv[] = { "devices", "deny", container_id, "c 195:0 rwm",
+    "devices", "deny", container_id, "c 195:1 rwm"};
+  verify_param_updated_to_cgroups(8, expected_cgroups_argv);
+
+  /* Test case 2: block 0 devices */
+  cgroups_parameters_invoked.clear();
+  char* argv_1[] = { (char*) "--module-gpu", (char*) "--container_id", container_id };
+  rc = handle_gpu_request(&mock_update_cgroups_parameters,
+     "gpu", 3, argv_1);
+  ASSERT_EQ(0, rc) << "Should succeed.\n";
+
+  // Verify cgroups parameters
+  verify_param_updated_to_cgroups(0, NULL);
+}
+
+TEST_F(TestGpuModule, test_illegal_cli_parameters) {
+  // Write config file.
+  const char *filename = TEST_ROOT "/test_illegal_cli_parameters.cfg";
+  write_and_load_gpu_module_to_cfg(filename, 1);
+
+  // Illegal container id - 1
+  char* argv[] = { (char*) "--module-gpu", (char*) "--excluded_gpus", (char*) "0,1",
+                   (char*) "--container_id", (char*) "xxxx" };
+  int rc = handle_gpu_request(&mock_update_cgroups_parameters,
+     "gpu", 5, argv);
+  ASSERT_NE(0, rc) << "Should fail.\n";
+
+  // Illegal container id - 2
+  char* argv_1[] = { (char*) "--module-gpu", (char*) "--excluded_gpus", (char*) "0,1",
+                   (char*) "--container_id", (char*) "container_1" };
+  rc = handle_gpu_request(&mock_update_cgroups_parameters,
+     "gpu", 5, argv_1);
+  ASSERT_NE(0, rc) << "Should fail.\n";
+
+  // Illegal container id - 3 (--container_id is missing)
+  char* argv_2[] = { (char*) "--module-gpu", (char*) "--excluded_gpus", (char*) "0,1" };
+  rc = handle_gpu_request(&mock_update_cgroups_parameters,
+     "gpu", 3, argv_2);
+  ASSERT_NE(0, rc) << "Should fail.\n";
+}
+
+TEST_F(TestGpuModule, test_gpu_module_disabled) {
+  test_gpu_module_enabled_disabled(0);
+}
+
+TEST_F(TestGpuModule, test_gpu_module_enabled) {
+  test_gpu_module_enabled_disabled(1);
+}
+} // namespace ContainerExecutor
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
index 9e85b3f..235ea77 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
@@ -1392,7 +1392,6 @@ int main(int argc, char **argv) {
 #endif
 
   test_trim_function();
-  run("rm -fr " TEST_ROOT);
   printf("\nFinished tests\n");
 
   free(current_username);


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org


[hadoop] 19/20: YARN-8183. Fix ConcurrentModificationException inside RMAppAttemptMetrics#convertAtomicLongMaptoLongMap. (Suma Shivaprasad via wangda)

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 4d9f4e792e97728a551b52631e1d4ebcac232594
Author: Wangda Tan <wa...@apache.org>
AuthorDate: Tue Apr 24 17:42:17 2018 -0700

    YARN-8183. Fix ConcurrentModificationException inside RMAppAttemptMetrics#convertAtomicLongMaptoLongMap. (Suma Shivaprasad via wangda)
    
    Change-Id: I347871d672001653a3afe2e99adefd74e0d798cd
    (cherry picked from commit bb3c504764f807fccba7f28298a12e2296f284cb)
    (cherry picked from commit 3043a93d461fd8b9ccc2ff4b8d17e5430ed77615)
---
 .../resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java       | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
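
The failure mode named in the commit title can be reproduced outside YARN. The following is a minimal sketch, not the RMAppAttemptMetrics code: the class CmeSketch and both helpers are hypothetical names introduced for illustration. It inserts a key into a map while iterating over it, which stands in for a writer racing with the reader. A HashMap iterator fails fast with ConcurrentModificationException, while a ConcurrentHashMap iterator is weakly consistent and does not throw.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class CmeSketch {

  // Hypothetical stand-in for the conversion named in the commit title:
  // copies a Map<String, AtomicLong> into a Map<String, Long>.
  static Map<String, Long> toLongMap(Map<String, AtomicLong> source) {
    Map<String, Long> result = new HashMap<>();
    for (Map.Entry<String, AtomicLong> e : source.entrySet()) {
      result.put(e.getKey(), e.getValue().get());
    }
    return result;
  }

  // Fills the map with two entries, then inserts one fixed extra key
  // while iterating. Returns true if the iterator threw
  // ConcurrentModificationException.
  static boolean throwsWhenModifiedDuringIteration(
      Map<String, AtomicLong> map) {
    map.put("memory-mb", new AtomicLong(1024));
    map.put("vcores", new AtomicLong(4));
    try {
      for (String key : map.keySet()) {
        // Structural modification on the first pass only; later passes
        // overwrite the same key.
        map.put("added-while-iterating", new AtomicLong(1));
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, AtomicLong> usage = new ConcurrentHashMap<>();
    usage.put("memory-mb", new AtomicLong(1024));
    System.out.println(toLongMap(usage));
    System.out.println(
        throwsWhenModifiedDuringIteration(new HashMap<>()));
    System.out.println(
        throwsWhenModifiedDuringIteration(new ConcurrentHashMap<>()));
  }
}
```

Note that ConcurrentHashMap only removes the exception; the converted snapshot may or may not include entries added during the iteration.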

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
index 0982ef9..e68c5d7 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
@@ -20,6 +20,7 @@ package org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt;
 
 import java.util.HashMap;
 import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.concurrent.atomic.AtomicLong;
@@ -53,8 +54,8 @@ public class RMAppAttemptMetrics {
   
   private ReadLock readLock;
   private WriteLock writeLock;
-  private Map<String, AtomicLong> resourceUsageMap = new HashMap<>();
-  private Map<String, AtomicLong> preemptedResourceMap = new HashMap<>();
+  private Map<String, AtomicLong> resourceUsageMap = new ConcurrentHashMap<>();
+  private Map<String, AtomicLong> preemptedResourceMap = new ConcurrentHashMap<>();
   private RMContext rmContext;
 
   private int[][] localityStatistics =
@@ -97,7 +98,7 @@ public class RMAppAttemptMetrics {
   public Resource getResourcePreempted() {
     try {
       readLock.lock();
-      return resourcePreempted;
+      return Resource.newInstance(resourcePreempted);
     } finally {
       readLock.unlock();
     }
@@ -229,7 +230,7 @@ public class RMAppAttemptMetrics {
   }
 
   public Resource getApplicationAttemptHeadroom() {
-    return applicationHeadroom;
+    return Resource.newInstance(applicationHeadroom);
   }
 
   public void setApplicationAttemptHeadRoom(Resource headRoom) {

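The two getter changes above return Resource.newInstance(...) instead of the field itself, which reads as a defensive copy. The sketch below illustrates that pattern under stated assumptions: MutableResource and Metrics are hypothetical, simplified stand-ins and not the YARN Resource or RMAppAttemptMetrics classes.

```java
final class DefensiveCopySketch {

  // Hypothetical, simplified stand-in for a mutable resource record.
  static final class MutableResource {
    private long memoryMb;

    MutableResource(long memoryMb) {
      this.memoryMb = memoryMb;
    }

    // Plays the role that a copying factory method plays in the diff.
    static MutableResource copyOf(MutableResource other) {
      return new MutableResource(other.memoryMb);
    }

    long getMemoryMb() {
      return memoryMb;
    }

    void setMemoryMb(long memoryMb) {
      this.memoryMb = memoryMb;
    }
  }

  // Hypothetical holder exposing the same field in two ways.
  static final class Metrics {
    private final MutableResource headroom = new MutableResource(2048);

    // Before: callers receive the internal object and can mutate it.
    MutableResource getHeadroomShared() {
      return headroom;
    }

    // After: callers receive an independent copy.
    MutableResource getHeadroomCopy() {
      return MutableResource.copyOf(headroom);
    }

    long currentHeadroomMb() {
      return headroom.getMemoryMb();
    }
  }

  public static void main(String[] args) {
    Metrics metrics = new Metrics();
    metrics.getHeadroomCopy().setMemoryMb(0);
    System.out.println(metrics.currentHeadroomMb()); // prints 2048
    metrics.getHeadroomShared().setMemoryMb(0);
    System.out.println(metrics.currentHeadroomMb()); // prints 0
  }
}
```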



[hadoop] 06/20: YARN-9180. Port YARN-7033 NM recovery of assigned resources to branch-2

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 4a1c7e6aade2b5d2621c5e09be2597dc6a73cd04
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Fri Feb 1 15:20:50 2019 -0800

    YARN-9180. Port YARN-7033 NM recovery of assigned resources to branch-2
---
 .../containermanager/container/Container.java      |   7 +
 .../containermanager/container/ContainerImpl.java  |  13 ++
 .../container/ResourceMappings.java                | 124 ++++++++++++++++
 .../recovery/NMLeveldbStateStoreService.java       |  42 ++++++
 .../recovery/NMNullStateStoreService.java          |   7 +
 .../nodemanager/recovery/NMStateStoreService.java  |  23 +++
 .../TestContainerManagerRecovery.java              | 163 +++++++++++++++------
 .../recovery/NMMemoryStateStoreService.java        |  14 ++
 .../recovery/TestNMLeveldbStateStoreService.java   | 122 ++++++++++-----
 .../server/nodemanager/webapp/MockContainer.java   |   6 +
 10 files changed, 436 insertions(+), 85 deletions(-)
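
The new ResourceMappings.AssignedResources class in this patch persists a list of Serializable values with Java object serialization. The sketch below shows that round trip in isolation. It is an illustration only: the class AssignedResourcesSketch and its roundTrip helper are hypothetical, and it uses try-with-resources rather than the IOUtils.closeQuietly calls in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AssignedResourcesSketch {

  // Serializes the list into a byte array, as a state store value would be.
  static byte[] toBytes(List<Serializable> resources) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      // Copy into an ArrayList so the written object is itself serializable.
      oos.writeObject(new ArrayList<>(resources));
    }
    // The stream is closed (and therefore flushed) before reading the bytes.
    return bos.toByteArray();
  }

  // Reads the list back; a missing class is reported as an IOException.
  @SuppressWarnings("unchecked")
  static List<Serializable> fromBytes(byte[] bytes) throws IOException {
    try (ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (List<Serializable>) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }
  }

  // Convenience wrapper that rethrows I/O failures unchecked.
  static List<Serializable> roundTrip(List<Serializable> resources) {
    try {
      return fromBytes(toBytes(resources));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    List<Serializable> gpus = Arrays.<Serializable>asList("0", "1", "3");
    System.out.println(roundTrip(gpus)); // prints [0, 1, 3]
  }
}
```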

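The NMLeveldbStateStoreService change in this patch stores one entry per container and resource type, with the resource type appended to the "/assignedResources_" suffix, and recovers the type again with startsWith and substring. The sketch below shows that key layout. The class name and the prefix value "containers/" are assumptions for illustration; the real CONTAINERS_KEY_PREFIX value is not shown in this diff.

```java
final class AssignedResourcesKeySketch {

  // Assumption: placeholder value, the real prefix is defined elsewhere.
  static final String CONTAINERS_KEY_PREFIX = "containers/";

  // Suffix taken from the patch.
  static final String ASSIGNED_RESOURCES_SUFFIX = "/assignedResources_";

  // Builds the key under which one resource type of one container is stored.
  static String buildKey(String containerId, String resourceType) {
    return CONTAINERS_KEY_PREFIX + containerId
        + ASSIGNED_RESOURCES_SUFFIX + resourceType;
  }

  // Returns the resource type encoded in a key suffix, or null if the
  // suffix does not describe assigned resources.
  static String resourceTypeFromSuffix(String suffix) {
    if (!suffix.startsWith(ASSIGNED_RESOURCES_SUFFIX)) {
      return null;
    }
    return suffix.substring(ASSIGNED_RESOURCES_SUFFIX.length());
  }

  public static void main(String[] args) {
    String containerId = "container_1498064906505_0001_01_000001";
    String key = buildKey(containerId, "gpu");
    // The recovery loop sees only the part of the key after the container id.
    String suffix =
        key.substring((CONTAINERS_KEY_PREFIX + containerId).length());
    System.out.println(resourceTypeFromSuffix(suffix)); // prints gpu
  }
}
```

Because the resource type is part of the key, storing the same type again for the same container overwrites the earlier value, as the comment in the patch notes.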
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
index b9d1e31..b5e3aa1 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
@@ -98,4 +98,11 @@ public interface Container extends EventHandler<ContainerEvent> {
   void sendPauseEvent(String description);
 
   Priority getPriority();
+
+  /**
+   * Get the resource mappings assigned to the container.
+   *
+   * @return Resource Mappings of the container
+   */
+  ResourceMappings getResourceMappings();
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
index 4675716..e6c7bce 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
@@ -188,6 +188,7 @@ public class ContainerImpl implements Container {
   private boolean recoveredAsKilled = false;
   private Context context;
   private ResourceSet resourceSet;
+  private ResourceMappings resourceMappings;
 
   public ContainerImpl(Configuration conf, Dispatcher dispatcher,
       ContainerLaunchContext launchContext, Credentials creds,
@@ -245,6 +246,7 @@ public class ContainerImpl implements Container {
     stateMachine = stateMachineFactory.make(this, ContainerState.NEW,
         context.getContainerStateTransitionListener());
     this.resourceSet = new ResourceSet();
+    this.resourceMappings = new ResourceMappings();
   }
 
   private static ContainerRetryContext configureRetryContext(
@@ -285,6 +287,7 @@ public class ContainerImpl implements Container {
     this.remainingRetryAttempts = rcs.getRemainingRetryAttempts();
     this.workDir = rcs.getWorkDir();
     this.logDir = rcs.getLogDir();
+    this.resourceMappings = rcs.getResourceMappings();
   }
 
   private static final ContainerDiagnosticsUpdateTransition UPDATE_DIAGNOSTICS_TRANSITION =
@@ -2172,4 +2175,14 @@ public class ContainerImpl implements Container {
   public Priority getPriority() {
     return containerTokenIdentifier.getPriority();
   }
+
+  /**
+   * Get the resource mappings assigned to the container.
+   *
+   * @return Resource Mappings of the container
+   */
+  @Override
+  public ResourceMappings getResourceMappings() {
+    return resourceMappings;
+  }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ResourceMappings.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ResourceMappings.java
new file mode 100644
index 0000000..d673341
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ResourceMappings.java
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.container;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.commons.io.IOUtils;
+
+/**
+ * This class stores the resources assigned to a single container, keyed by
+ * resource type.
+ *
+ * The resources assigned for one type are a list of Serializable values.
+ *
+ * For example, a container can be assigned:
+ * "numa": ["numa0"]
+ * "gpu": ["0", "1", "2", "3"]
+ * "fpga": ["1", "3"]
+ *
+ * This is used to recover containers after an NM restart.
+ */
+public class ResourceMappings {
+
+  private Map<String, AssignedResources> assignedResourcesMap = new HashMap<>();
+
+  /**
+   * Get the resources assigned for a given resource type.
+   * @param resourceType resource type to look up
+   * @return list of assigned resources, or an empty list if none are assigned
+   */
+  public List<Serializable> getAssignedResources(String resourceType) {
+    AssignedResources ar = assignedResourcesMap.get(resourceType);
+    if (null == ar) {
+      return Collections.emptyList();
+    }
+    return ar.getAssignedResources();
+  }
+
+  /**
+   * Adds the resources for a given resource type.
+   *
+   * @param resourceType Resource Type
+   * @param assigned Assigned resources to add
+   */
+  public void addAssignedResources(String resourceType,
+      AssignedResources assigned) {
+    assignedResourcesMap.put(resourceType, assigned);
+  }
+
+  /**
+   * Stores resources assigned to a container for a given resource type.
+   */
+  public static class AssignedResources implements Serializable {
+    private static final long serialVersionUID = -1059491941955757926L;
+    private List<Serializable> resources = Collections.emptyList();
+
+    public List<Serializable> getAssignedResources() {
+      return Collections.unmodifiableList(resources);
+    }
+
+    public void updateAssignedResources(List<Serializable> list) {
+      this.resources = new ArrayList<>(list);
+    }
+
+    @SuppressWarnings("unchecked")
+    public static AssignedResources fromBytes(byte[] bytes)
+        throws IOException {
+      ObjectInputStream ois = null;
+      List<Serializable> resources;
+      try {
+        ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
+        ois = new ObjectInputStream(bis);
+        resources = (List<Serializable>) ois.readObject();
+      } catch (ClassNotFoundException e) {
+        throw new IOException(e);
+      } finally {
+        IOUtils.closeQuietly(ois);
+      }
+      AssignedResources ar = new AssignedResources();
+      ar.updateAssignedResources(resources);
+      return ar;
+    }
+
+    public byte[] toBytes() throws IOException {
+      ObjectOutputStream oos = null;
+      byte[] bytes;
+      try {
+        ByteArrayOutputStream bos = new ByteArrayOutputStream();
+        oos = new ObjectOutputStream(bos);
+        oos.writeObject(resources);
+        bytes = bos.toByteArray();
+      } finally {
+        IOUtils.closeQuietly(oos);
+      }
+      return bytes;
+    }
+  }
+}
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
index 129fa8f..6aec1be 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
@@ -28,6 +28,7 @@ import org.slf4j.LoggerFactory;
 
 import java.io.File;
 import java.io.IOException;
+import java.io.Serializable;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.HashMap;
@@ -43,6 +44,7 @@ import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.util.StringUtils;
 import org.apache.hadoop.util.Time;
 import org.apache.hadoop.yarn.api.protocolrecords.StartContainerRequest;
 import org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl;
@@ -62,6 +64,7 @@ import org.apache.hadoop.yarn.proto.YarnServiceProtos.StartContainerRequestProto
 import org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.ContainerTokenIdentifierProto;
 import org.apache.hadoop.yarn.server.api.records.MasterKey;
 import org.apache.hadoop.yarn.server.api.records.impl.pb.MasterKeyPBImpl;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 import org.apache.hadoop.yarn.server.records.Version;
 import org.apache.hadoop.yarn.server.records.impl.pb.VersionPBImpl;
 import org.apache.hadoop.yarn.server.utils.BuilderUtils;
@@ -148,6 +151,9 @@ public class NMLeveldbStateStoreService extends NMStateStoreService {
 
   private static final String AMRMPROXY_KEY_PREFIX = "AMRMProxy/";
 
+  private static final String CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX =
+      "/assignedResources_";
+
   private static final byte[] EMPTY_VALUE = new byte[0];
 
   private DB db;
@@ -309,6 +315,13 @@ public class NMLeveldbStateStoreService extends NMStateStoreService {
         rcs.setWorkDir(asString(entry.getValue()));
       } else if (suffix.equals(CONTAINER_LOG_DIR_KEY_SUFFIX)) {
         rcs.setLogDir(asString(entry.getValue()));
+      } else if (suffix.startsWith(CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX)) {
+        String resourceType = suffix.substring(
+            CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX.length());
+        ResourceMappings.AssignedResources assignedResources =
+            ResourceMappings.AssignedResources.fromBytes(entry.getValue());
+        rcs.getResourceMappings().addAssignedResources(resourceType,
+            assignedResources);
       } else {
         LOG.warn("the container " + containerId
             + " will be killed because of the unknown key " + key
@@ -1166,6 +1179,35 @@ public class NMLeveldbStateStoreService extends NMStateStoreService {
     }
   }
 
+  @Override
+  public void storeAssignedResources(ContainerId containerId,
+      String resourceType, List<Serializable> assignedResources)
+      throws IOException {
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("storeAssignedResources: containerId=" + containerId
+          + ", assignedResources=" + StringUtils.join(",", assignedResources));
+    }
+
+    String keyResChng = CONTAINERS_KEY_PREFIX + containerId.toString()
+        + CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX + resourceType;
+    try {
+      WriteBatch batch = db.createWriteBatch();
+      try {
+        ResourceMappings.AssignedResources res =
+            new ResourceMappings.AssignedResources();
+        res.updateAssignedResources(assignedResources);
+
+        // New value will overwrite old values for the same key
+        batch.put(bytes(keyResChng), res.toBytes());
+        db.write(batch);
+      } finally {
+        batch.close();
+      }
+    } catch (DBException e) {
+      throw new IOException(e);
+    }
+  }
+
   @SuppressWarnings("deprecation")
   private void cleanupDeprecatedFinishedApps() {
     try {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
index aaf6fb2..6e3707b 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
@@ -19,6 +19,7 @@
 package org.apache.hadoop.yarn.server.nodemanager.recovery;
 
 import java.io.IOException;
+import java.io.Serializable;
 import java.util.List;
 
 import org.apache.hadoop.conf.Configuration;
@@ -267,6 +268,12 @@ public class NMNullStateStoreService extends NMStateStoreService {
   }
 
   @Override
+  public void storeAssignedResources(ContainerId containerId,
+      String resourceType, List<Serializable> assignedResources)
+      throws IOException {
+  }
+
+  @Override
   protected void initStorage(Configuration conf) throws IOException {
   }
 
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
index 1cdbd27..a929fe2 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
@@ -19,6 +19,7 @@
 package org.apache.hadoop.yarn.server.nodemanager.recovery;
 
 import java.io.IOException;
+import java.io.Serializable;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
@@ -43,6 +44,7 @@ import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.Localize
 import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.LogDeleterProto;
 import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
 import org.apache.hadoop.yarn.server.api.records.MasterKey;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 
 @Private
 @Unstable
@@ -90,6 +92,7 @@ public abstract class NMStateStoreService extends AbstractService {
     private RecoveredContainerType recoveryType =
         RecoveredContainerType.RECOVER;
     private long startTime;
+    private ResourceMappings resMappings = new ResourceMappings();
 
     public RecoveredContainerStatus getStatus() {
       return status;
@@ -174,6 +177,14 @@ public abstract class NMStateStoreService extends AbstractService {
     public void setRecoveryType(RecoveredContainerType recoveryType) {
       this.recoveryType = recoveryType;
     }
+
+    public ResourceMappings getResourceMappings() {
+      return resMappings;
+    }
+
+    public void setResourceMappings(ResourceMappings mappings) {
+      this.resMappings = mappings;
+    }
   }
 
   public static class LocalResourceTrackerState {
@@ -718,6 +729,18 @@ public abstract class NMStateStoreService extends AbstractService {
   public abstract void removeAMRMProxyAppContext(ApplicationAttemptId attempt)
       throws IOException;
 
+  /**
+   * Store the assigned resources to a container.
+   *
+   * @param containerId Container Id
+   * @param resourceType Resource Type
+   * @param assignedResources Assigned resources
+   * @throws IOException if the assigned resources cannot be stored
+   */
+  public abstract void storeAssignedResources(ContainerId containerId,
+      String resourceType, List<Serializable> assignedResources)
+      throws IOException;
+
   protected abstract void initStorage(Configuration conf) throws IOException;
 
   protected abstract void startStorage() throws IOException;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
index 8980a49..6241055 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
@@ -31,6 +31,7 @@ import static org.mockito.Mockito.verify;
 import java.io.File;
 import java.io.IOException;
 import java.io.PrintWriter;
+import java.io.Serializable;
 import java.nio.ByteBuffer;
 import java.security.PrivilegedExceptionAction;
 import java.util.ArrayList;
@@ -91,6 +92,7 @@ import org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Ap
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService;
@@ -110,6 +112,7 @@ import org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerIn
 import org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher;
 import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;
 import org.apache.hadoop.yarn.util.timeline.TimelineUtils;
+import org.junit.Assert;
 import org.junit.Before;
 import org.junit.Test;
 
@@ -457,7 +460,7 @@ public class TestContainerManagerRecovery extends BaseContainerManagerTest {
     NMStateStoreService stateStore = new NMMemoryStateStoreService();
     stateStore.init(conf);
     stateStore.start();
-    Context context = createContext(conf, stateStore);
+    context = createContext(conf, stateStore);
     ContainerManagerImpl cm = createContainerManager(context, delSrvc);
     ((NMContext) context).setContainerManager(cm);
     cm.init(conf);
@@ -467,55 +470,12 @@ public class TestContainerManagerRecovery extends BaseContainerManagerTest {
     ApplicationAttemptId attemptId =
         ApplicationAttemptId.newInstance(appId, 1);
     ContainerId cid = ContainerId.newContainerId(attemptId, 1);
-    Map<String, String> containerEnv = new HashMap<>();
-    setFlowContext(containerEnv, "app_name1", appId);
-    Map<String, ByteBuffer> serviceData = Collections.emptyMap();
-    Credentials containerCreds = new Credentials();
-    DataOutputBuffer dob = new DataOutputBuffer();
-    containerCreds.writeTokenStorageToStream(dob);
-    ByteBuffer containerTokens = ByteBuffer.wrap(dob.getData(), 0,
-        dob.getLength());
-    Map<ApplicationAccessType, String> acls = Collections.emptyMap();
-    File tmpDir = new File("target",
-        this.getClass().getSimpleName() + "-tmpDir");
-    File scriptFile = Shell.appendScriptExtension(tmpDir, "scriptFile");
-    PrintWriter fileWriter = new PrintWriter(scriptFile);
-    if (Shell.WINDOWS) {
-      fileWriter.println("@ping -n 100 127.0.0.1 >nul");
-    } else {
-      fileWriter.write("\numask 0");
-      fileWriter.write("\nexec sleep 100");
-    }
-    fileWriter.close();
-    FileContext localFS = FileContext.getLocalFSFileContext();
-    URL resource_alpha =
-        URL.fromPath(localFS
-            .makeQualified(new Path(scriptFile.getAbsolutePath())));
-    LocalResource rsrc_alpha = RecordFactoryProvider
-        .getRecordFactory(null).newRecordInstance(LocalResource.class);
-    rsrc_alpha.setResource(resource_alpha);
-    rsrc_alpha.setSize(-1);
-    rsrc_alpha.setVisibility(LocalResourceVisibility.APPLICATION);
-    rsrc_alpha.setType(LocalResourceType.FILE);
-    rsrc_alpha.setTimestamp(scriptFile.lastModified());
-    String destinationFile = "dest_file";
-    Map<String, LocalResource> localResources = new HashMap<>();
-    localResources.put(destinationFile, rsrc_alpha);
-    List<String> commands =
-        Arrays.asList(Shell.getRunScriptCommand(scriptFile));
-    ContainerLaunchContext clc = ContainerLaunchContext.newInstance(
-        localResources, containerEnv, commands, serviceData,
-        containerTokens, acls);
-    StartContainersResponse startResponse = startContainer(
-        context, cm, cid, clc, null);
-    assertTrue(startResponse.getFailedRequests().isEmpty());
-    assertEquals(1, context.getApplications().size());
+
+    commonLaunchContainer(appId, cid, cm);
+
     Application app = context.getApplications().get(appId);
     assertNotNull(app);
-    // make sure the container reaches RUNNING state
-    waitForNMContainerState(cm, cid,
-        org.apache.hadoop.yarn.server.nodemanager
-            .containermanager.container.ContainerState.RUNNING);
+
     Resource targetResource = Resource.newInstance(2048, 2);
     ContainerUpdateResponse updateResponse =
         updateContainers(context, cm, cid, targetResource);
@@ -539,6 +499,62 @@ public class TestContainerManagerRecovery extends BaseContainerManagerTest {
   }
 
   @Test
+  public void testResourceMappingRecoveryForContainer() throws Exception {
+    conf.setBoolean(YarnConfiguration.NM_RECOVERY_ENABLED, true);
+    conf.setBoolean(YarnConfiguration.NM_RECOVERY_SUPERVISED, true);
+    NMStateStoreService stateStore = new NMMemoryStateStoreService();
+    stateStore.init(conf);
+    stateStore.start();
+    context = createContext(conf, stateStore);
+    ContainerManagerImpl cm = createContainerManager(context, delSrvc);
+    ((NMContext) context).setContainerManager(cm);
+    cm.init(conf);
+    cm.start();
+
+    // add an application by starting a container
+    ApplicationId appId = ApplicationId.newInstance(0, 1);
+    ApplicationAttemptId attemptId =
+        ApplicationAttemptId.newInstance(appId, 1);
+    ContainerId cid = ContainerId.newContainerId(attemptId, 1);
+
+    commonLaunchContainer(appId, cid, cm);
+
+    Application app = context.getApplications().get(appId);
+    assertNotNull(app);
+
+    // store resource mapping of the container
+    List<Serializable> gpuResources =
+        Arrays.<Serializable>asList("1", "2", "3");
+    stateStore.storeAssignedResources(cid, "gpu", gpuResources);
+    List<Serializable> numaResources = Arrays.<Serializable>asList("numa1");
+    stateStore.storeAssignedResources(cid, "numa", numaResources);
+    List<Serializable> fpgaResources =
+        Arrays.<Serializable>asList("fpga1", "fpga2");
+    stateStore.storeAssignedResources(cid, "fpga", fpgaResources);
+
+    cm.stop();
+    context = createContext(conf, stateStore);
+    cm = createContainerManager(context);
+    ((NMContext) context).setContainerManager(cm);
+    cm.init(conf);
+    cm.start();
+    assertEquals(1, context.getApplications().size());
+    app = context.getApplications().get(appId);
+    assertNotNull(app);
+
+    Container nmContainer = context.getContainers().get(cid);
+    Assert.assertNotNull(nmContainer);
+    ResourceMappings resourceMappings = nmContainer.getResourceMappings();
+    List<Serializable> assignedResource = resourceMappings
+        .getAssignedResources("gpu");
+    Assert.assertEquals(gpuResources, assignedResource);
+    Assert.assertEquals(numaResources,
+        resourceMappings.getAssignedResources("numa"));
+    Assert.assertEquals(fpgaResources,
+        resourceMappings.getAssignedResources("fpga"));
+  }
+
+  @Test
   public void testContainerCleanupOnShutdown() throws Exception {
     ApplicationId appId = ApplicationId.newInstance(0, 1);
     ApplicationAttemptId attemptId =
@@ -610,6 +626,57 @@ public class TestContainerManagerRecovery extends BaseContainerManagerTest {
     verify(cm, never()).handle(isA(CMgrCompletedAppsEvent.class));
   }
 
+  private void commonLaunchContainer(ApplicationId appId, ContainerId cid,
+      ContainerManagerImpl cm) throws Exception {
+    Map<String, String> containerEnv = new HashMap<>();
+    setFlowContext(containerEnv, "app_name1", appId);
+    Map<String, ByteBuffer> serviceData = Collections.emptyMap();
+    Credentials containerCreds = new Credentials();
+    DataOutputBuffer dob = new DataOutputBuffer();
+    containerCreds.writeTokenStorageToStream(dob);
+    ByteBuffer containerTokens = ByteBuffer.wrap(dob.getData(), 0,
+        dob.getLength());
+    Map<ApplicationAccessType, String> acls = Collections.emptyMap();
+    File tmpDir = new File("target",
+        this.getClass().getSimpleName() + "-tmpDir");
+    File scriptFile = Shell.appendScriptExtension(tmpDir, "scriptFile");
+    PrintWriter fileWriter = new PrintWriter(scriptFile);
+    if (Shell.WINDOWS) {
+      fileWriter.println("@ping -n 100 127.0.0.1 >nul");
+    } else {
+      fileWriter.write("\numask 0");
+      fileWriter.write("\nexec sleep 100");
+    }
+    fileWriter.close();
+    FileContext localFS = FileContext.getLocalFSFileContext();
+    URL resource_alpha =
+        URL.fromPath(localFS
+            .makeQualified(new Path(scriptFile.getAbsolutePath())));
+    LocalResource rsrc_alpha = RecordFactoryProvider
+        .getRecordFactory(null).newRecordInstance(LocalResource.class);
+    rsrc_alpha.setResource(resource_alpha);
+    rsrc_alpha.setSize(-1);
+    rsrc_alpha.setVisibility(LocalResourceVisibility.APPLICATION);
+    rsrc_alpha.setType(LocalResourceType.FILE);
+    rsrc_alpha.setTimestamp(scriptFile.lastModified());
+    String destinationFile = "dest_file";
+    Map<String, LocalResource> localResources = new HashMap<>();
+    localResources.put(destinationFile, rsrc_alpha);
+    List<String> commands =
+        Arrays.asList(Shell.getRunScriptCommand(scriptFile));
+    ContainerLaunchContext clc = ContainerLaunchContext.newInstance(
+        localResources, containerEnv, commands, serviceData,
+        containerTokens, acls);
+    StartContainersResponse startResponse = startContainer(
+        context, cm, cid, clc, null);
+    assertTrue(startResponse.getFailedRequests().isEmpty());
+    assertEquals(1, context.getApplications().size());
+    // make sure the container reaches RUNNING state
+    waitForNMContainerState(cm, cid,
+        org.apache.hadoop.yarn.server.nodemanager
+            .containermanager.container.ContainerState.RUNNING);
+  }
+
   private ContainerManagerImpl createContainerManager(Context context,
       DeletionService delSrvc) {
     return new ContainerManagerImpl(context, exec, delSrvc,
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
index 0e46234..5d424ad 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
@@ -19,6 +19,7 @@
 package org.apache.hadoop.yarn.server.nodemanager.recovery;
 
 import java.io.IOException;
+import java.io.Serializable;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.HashMap;
@@ -42,6 +43,7 @@ import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.LogDelet
 import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
 import org.apache.hadoop.yarn.server.api.records.MasterKey;
 import org.apache.hadoop.yarn.server.api.records.impl.pb.MasterKeyPBImpl;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 
 
 import org.apache.hadoop.yarn.server.utils.BuilderUtils;
@@ -124,6 +126,7 @@ public class NMMemoryStateStoreService extends NMStateStoreService {
       rcsCopy.setRemainingRetryAttempts(rcs.getRemainingRetryAttempts());
       rcsCopy.setWorkDir(rcs.getWorkDir());
       rcsCopy.setLogDir(rcs.getLogDir());
+      rcsCopy.setResourceMappings(rcs.getResourceMappings());
       result.add(rcsCopy);
     }
     return result;
@@ -511,6 +514,17 @@ public class NMMemoryStateStoreService extends NMStateStoreService {
     amrmProxyState.getAppContexts().remove(attempt);
   }
 
+  @Override
+  public void storeAssignedResources(ContainerId containerId,
+      String resourceType, List<Serializable> assignedResources)
+      throws IOException {
+    ResourceMappings.AssignedResources ar =
+        new ResourceMappings.AssignedResources();
+    ar.updateAssignedResources(assignedResources);
+    containerStates.get(containerId).getResourceMappings()
+        .addAssignedResources(resourceType, ar);
+  }
+
   private static class TrackerState {
     Map<Path, LocalResourceProto> inProgressMap =
         new HashMap<Path, LocalResourceProto>();
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
index a507938..270b8af 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
@@ -32,6 +32,7 @@ import static org.mockito.Mockito.verify;
 
 import java.io.File;
 import java.io.IOException;
+import java.io.Serializable;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.Arrays;
@@ -1003,46 +1004,12 @@ public class TestNMLeveldbStateStoreService {
         .loadContainersState();
     assertTrue(recoveredContainers.isEmpty());
 
-    // create a container request
     ApplicationId appId = ApplicationId.newInstance(1234, 3);
     ApplicationAttemptId appAttemptId = ApplicationAttemptId.newInstance(appId,
         4);
     ContainerId containerId = ContainerId.newContainerId(appAttemptId, 5);
-    LocalResource lrsrc = LocalResource.newInstance(
-        URL.newInstance("hdfs", "somehost", 12345, "/some/path/to/rsrc"),
-        LocalResourceType.FILE, LocalResourceVisibility.APPLICATION, 123L,
-        1234567890L);
-    Map<String, LocalResource> localResources =
-        new HashMap<String, LocalResource>();
-    localResources.put("rsrc", lrsrc);
-    Map<String, String> env = new HashMap<String, String>();
-    env.put("somevar", "someval");
-    List<String> containerCmds = new ArrayList<String>();
-    containerCmds.add("somecmd");
-    containerCmds.add("somearg");
-    Map<String, ByteBuffer> serviceData = new HashMap<String, ByteBuffer>();
-    serviceData.put("someservice",
-        ByteBuffer.wrap(new byte[] { 0x1, 0x2, 0x3 }));
-    ByteBuffer containerTokens = ByteBuffer
-        .wrap(new byte[] { 0x7, 0x8, 0x9, 0xa });
-    Map<ApplicationAccessType, String> acls =
-        new HashMap<ApplicationAccessType, String>();
-    acls.put(ApplicationAccessType.VIEW_APP, "viewuser");
-    acls.put(ApplicationAccessType.MODIFY_APP, "moduser");
-    ContainerLaunchContext clc = ContainerLaunchContext.newInstance(
-        localResources, env, containerCmds,
-        serviceData, containerTokens, acls);
-    Resource containerRsrc = Resource.newInstance(1357, 3);
-    ContainerTokenIdentifier containerTokenId = new ContainerTokenIdentifier(
-        containerId, "host", "user", containerRsrc, 9876543210L, 42, 2468,
-        Priority.newInstance(7), 13579);
-    Token containerToken = Token.newInstance(containerTokenId.getBytes(),
-        ContainerTokenIdentifier.KIND.toString(), "password".getBytes(),
-        "tokenservice");
-    StartContainerRequest containerReq = StartContainerRequest.newInstance(clc,
-        containerToken);
-
-    stateStore.storeContainer(containerId, 0, 0, containerReq);
+    StartContainerRequest startContainerRequest = storeMockContainer(
+        containerId);
 
     // add a invalid key
     byte[] invalidKey = ("ContainerManager/containers/"
@@ -1055,7 +1022,7 @@ public class TestNMLeveldbStateStoreService {
     assertEquals(RecoveredContainerStatus.REQUESTED, rcs.getStatus());
     assertEquals(ContainerExitStatus.INVALID, rcs.getExitCode());
     assertEquals(false, rcs.getKilled());
-    assertEquals(containerReq, rcs.getStartRequest());
+    assertEquals(startContainerRequest, rcs.getStartRequest());
     assertTrue(rcs.getDiagnostics().isEmpty());
     assertEquals(RecoveredContainerType.KILL, rcs.getRecoveryType());
     // assert unknown keys are cleaned up finally
@@ -1163,6 +1130,87 @@ public class TestNMLeveldbStateStoreService {
     }
   }
 
+  @Test
+  public void testStateStoreForResourceMapping() throws IOException {
+    // test empty when no state
+    List<RecoveredContainerState> recoveredContainers = stateStore
+        .loadContainersState();
+    assertTrue(recoveredContainers.isEmpty());
+
+    ApplicationId appId = ApplicationId.newInstance(1234, 3);
+    ApplicationAttemptId appAttemptId = ApplicationAttemptId.newInstance(appId,
+        4);
+    ContainerId containerId = ContainerId.newContainerId(appAttemptId, 5);
+    storeMockContainer(containerId);
+
+    // Store ResourceMapping
+    stateStore.storeAssignedResources(containerId, "gpu",
+        Arrays.<Serializable>asList("1", "2", "3"));
+    // This will overwrite above
+    List<Serializable> gpuRes1 = Arrays.<Serializable>asList("1", "2", "4");
+    stateStore.storeAssignedResources(containerId, "gpu", gpuRes1);
+    List<Serializable> fpgaRes =
+        Arrays.<Serializable>asList("3", "4", "5", "6");
+    stateStore.storeAssignedResources(containerId, "fpga", fpgaRes);
+    List<Serializable> numaRes = Arrays.<Serializable>asList("numa1");
+    stateStore.storeAssignedResources(containerId, "numa", numaRes);
+
+    // restart the state store and verify the mappings are recovered
+    restartStateStore();
+    recoveredContainers = stateStore.loadContainersState();
+    assertEquals(1, recoveredContainers.size());
+    RecoveredContainerState rcs = recoveredContainers.get(0);
+    List<Serializable> res = rcs.getResourceMappings()
+        .getAssignedResources("gpu");
+    Assert.assertEquals(gpuRes1, res);
+
+    res = rcs.getResourceMappings().getAssignedResources("fpga");
+    Assert.assertEquals(fpgaRes, res);
+
+    res = rcs.getResourceMappings().getAssignedResources("numa");
+    Assert.assertEquals(numaRes, res);
+  }
+
+  private StartContainerRequest storeMockContainer(ContainerId containerId)
+      throws IOException {
+    // create a container request
+    LocalResource lrsrc = LocalResource.newInstance(
+        URL.newInstance("hdfs", "somehost", 12345, "/some/path/to/rsrc"),
+        LocalResourceType.FILE, LocalResourceVisibility.APPLICATION, 123L,
+        1234567890L);
+    Map<String, LocalResource> localResources =
+        new HashMap<String, LocalResource>();
+    localResources.put("rsrc", lrsrc);
+    Map<String, String> env = new HashMap<String, String>();
+    env.put("somevar", "someval");
+    List<String> containerCmds = new ArrayList<String>();
+    containerCmds.add("somecmd");
+    containerCmds.add("somearg");
+    Map<String, ByteBuffer> serviceData = new HashMap<String, ByteBuffer>();
+    serviceData.put("someservice",
+        ByteBuffer.wrap(new byte[] { 0x1, 0x2, 0x3 }));
+    ByteBuffer containerTokens = ByteBuffer
+        .wrap(new byte[] { 0x7, 0x8, 0x9, 0xa });
+    Map<ApplicationAccessType, String> acls =
+        new HashMap<ApplicationAccessType, String>();
+    acls.put(ApplicationAccessType.VIEW_APP, "viewuser");
+    acls.put(ApplicationAccessType.MODIFY_APP, "moduser");
+    ContainerLaunchContext clc = ContainerLaunchContext.newInstance(
+        localResources, env, containerCmds,
+        serviceData, containerTokens, acls);
+    Resource containerRsrc = Resource.newInstance(1357, 3);
+    ContainerTokenIdentifier containerTokenId = new ContainerTokenIdentifier(
+        containerId, "host", "user", containerRsrc, 9876543210L, 42, 2468,
+        Priority.newInstance(7), 13579);
+    Token containerToken = Token.newInstance(containerTokenId.getBytes(),
+        ContainerTokenIdentifier.KIND.toString(), "password".getBytes(),
+        "tokenservice");
+    StartContainerRequest containerReq = StartContainerRequest.newInstance(clc,
+        containerToken);
+    stateStore.storeContainer(containerId, 0, 0, containerReq);
+    return containerReq;
+  }
+
   private static class NMTokenSecretManagerForTest extends
       BaseNMTokenSecretManager {
     public MasterKey generateKey() {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java
index b9c6fff..29c2038 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java
@@ -37,6 +37,7 @@ import org.apache.hadoop.yarn.server.api.protocolrecords.NMContainerStatus;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerState;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceSet;
 import org.apache.hadoop.yarn.server.utils.BuilderUtils;
 
@@ -242,4 +243,9 @@ public class MockContainer implements Container {
   public long getContainerStartTime() {
     return 0;
   }
+
+  @Override
+  public ResourceMappings getResourceMappings() {
+    return null;
+  }
 }
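
Note on the key layout used by the NMLeveldbStateStoreService changes above:
each (container, resource type) pair gets its own key, built as the
containers prefix, the container id, the "/assignedResources_" suffix and the
resource type. On recovery the resource type is read back from whatever
follows that suffix. The following is a minimal, standalone sketch of that
scheme, not the Hadoop code itself. The class name, the in-memory TreeMap
standing in for leveldb, the plain-string values and the container id are
all illustrative assumptions; the prefix literal is taken from the
"ContainerManager/containers/" string used in the leveldb test.

```java
import java.util.Map;
import java.util.TreeMap;

class AssignedResourceKeys {
  // Assumed value of CONTAINERS_KEY_PREFIX, inferred from the test's literal.
  static final String CONTAINERS_KEY_PREFIX = "ContainerManager/containers/";
  // Suffix introduced by this patch.
  static final String ASSIGNED_RESOURCES_KEY_SUFFIX = "/assignedResources_";

  /** Builds the store key for one (container, resource type) pair. */
  static String buildKey(String containerId, String resourceType) {
    return CONTAINERS_KEY_PREFIX + containerId
        + ASSIGNED_RESOURCES_KEY_SUFFIX + resourceType;
  }

  /**
   * Returns the resource type encoded in a key, or null when the key is not
   * an assigned-resources key for the given container.
   */
  static String parseResourceType(String key, String containerId) {
    String containerPrefix = CONTAINERS_KEY_PREFIX + containerId;
    if (!key.startsWith(containerPrefix)) {
      return null;
    }
    String suffix = key.substring(containerPrefix.length());
    if (!suffix.startsWith(ASSIGNED_RESOURCES_KEY_SUFFIX)) {
      return null;
    }
    return suffix.substring(ASSIGNED_RESOURCES_KEY_SUFFIX.length());
  }

  public static void main(String[] args) {
    // A sorted map stands in for the leveldb store (hypothetical stand-in).
    Map<String, String> db = new TreeMap<>();
    String cid = "container_1234_0003_04_000005"; // illustrative id only

    db.put(buildKey(cid, "gpu"), "1,2,3");
    // Same key, so the earlier gpu value is replaced rather than appended.
    db.put(buildKey(cid, "gpu"), "1,2,4");
    db.put(buildKey(cid, "fpga"), "3,4,5,6");

    for (Map.Entry<String, String> e : db.entrySet()) {
      System.out.println(
          parseResourceType(e.getKey(), cid) + " -> " + e.getValue());
    }
  }
}
```

Because the resource type is part of the key, a second store for the same
type replaces the first, which is the behaviour exercised by
testStateStoreForResourceMapping.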




[hadoop] 01/20: YARN-9188. Port YARN-7136 to branch-2

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 0d54ad7c1fb5b4b3048d7c37d90d6b2e5c0d0b61
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Wed Jan 9 16:01:06 2019 -0500

    YARN-9188. Port YARN-7136 to branch-2
---
 .../hadoop-yarn/dev-support/findbugs-exclude.xml   |   2 +-
 .../apache/hadoop/yarn/api/records/Resource.java   | 178 +++++++-------
 .../yarn/api/records/ResourceInformation.java      |  15 +-
 ...{BaseResource.java => LightWeightResource.java} | 104 +++++---
 .../hadoop/yarn/util/resource/ResourceUtils.java   |  23 +-
 .../yarn/api/records/impl/pb/ResourcePBImpl.java   |  19 +-
 .../util/resource/DominantResourceCalculator.java  |  75 +++---
 .../hadoop/yarn/util/resource/Resources.java       |  30 ++-
 .../yarn/util/resource/TestResourceUtils.java      |   2 +
 .../rmapp/attempt/RMAppAttemptImpl.java            |   2 -
 .../hadoop/yarn/server/resourcemanager/MockRM.java |   6 +-
 .../scheduler/capacity/TestCapacityScheduler.java  | 137 -----------
 .../capacity/TestCapacitySchedulerPerf.java        | 265 +++++++++++++++++++++
 .../apache/hadoop/yarn/server/MiniYARNCluster.java |   7 +-
 14 files changed, 524 insertions(+), 341 deletions(-)
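
Note on the Resource.java changes below: the base class now holds a
ResourceInformation[] directly, with memory at index 0 and vcores at index 1,
and the index-based getter relies on the array bounds check instead of an
explicit range test. The following is a rough standalone sketch of that
access pattern, assuming a simplified long[] in place of
ResourceInformation[]. The class name, the name-to-index map (a stand-in for
ResourceUtils.getResourceTypeIndex()), the "memory-mb" and "vcores" names and
the use of IllegalArgumentException in place of ResourceNotFoundException
are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class IndexedResourceSketch {
  static final int MEMORY_INDEX = 0;
  static final int VCORES_INDEX = 1;

  // Hypothetical stand-in for ResourceUtils.getResourceTypeIndex().
  private static final Map<String, Integer> TYPE_INDEX = new HashMap<>();
  static {
    TYPE_INDEX.put("memory-mb", MEMORY_INDEX);
    TYPE_INDEX.put("vcores", VCORES_INDEX);
  }

  // Simplified: one long per resource type instead of ResourceInformation.
  private final long[] values;

  IndexedResourceSketch(long memory, long vcores) {
    this.values = new long[] {memory, vcores};
  }

  /** Index lookup; the array access itself acts as the range check. */
  long getByIndex(int index) {
    try {
      return values[index];
    } catch (ArrayIndexOutOfBoundsException e) {
      // The patch throws ResourceNotFoundException via a helper here.
      throw new IllegalArgumentException(
          "Unknown resource at index '" + index + "'", e);
    }
  }

  /** Name lookup: resolve the name to an index, then read the array. */
  long getByName(String name) {
    Integer index = TYPE_INDEX.get(name);
    if (index == null) {
      throw new IllegalArgumentException("Unknown resource '" + name + "'");
    }
    return values[index];
  }

  public static void main(String[] args) {
    IndexedResourceSketch r = new IndexedResourceSketch(2048, 2);
    System.out.println(r.getByIndex(MEMORY_INDEX));
    System.out.println(r.getByName("vcores"));
  }
}
```

Fixed indices for the two mandatory resources mean the common memory and
vcores reads do not need a map lookup.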

diff --git a/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml b/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
index e086fbe..45aa868 100644
--- a/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
+++ b/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
@@ -629,7 +629,7 @@
   </Match>
 
   <Match>
-    <Class name="org.apache.hadoop.yarn.api.records.impl.BaseResource" />
+    <Class name="org.apache.hadoop.yarn.api.records.Resource" />
     <Method name="getResources" />
     <Bug pattern="EI_EXPOSE_REP" />
   </Match>
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
index f3a5bc2..37b50f2 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
@@ -27,7 +27,7 @@ import org.apache.hadoop.classification.InterfaceStability;
 import org.apache.hadoop.classification.InterfaceStability.Evolving;
 import org.apache.hadoop.classification.InterfaceStability.Stable;
 import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
-import org.apache.hadoop.yarn.api.records.impl.BaseResource;
+import org.apache.hadoop.yarn.api.records.impl.LightWeightResource;
 import org.apache.hadoop.yarn.exceptions.ResourceNotFoundException;
 import org.apache.hadoop.yarn.util.Records;
 import org.apache.hadoop.yarn.util.resource.ResourceUtils;
@@ -59,8 +59,15 @@ import org.apache.hadoop.yarn.util.resource.ResourceUtils;
 @Stable
 public abstract class Resource implements Comparable<Resource> {
 
-  protected static final String MEMORY = ResourceInformation.MEMORY_MB.getName();
-  protected static final String VCORES = ResourceInformation.VCORES.getName();
+  protected ResourceInformation[] resources = null;
+
+  // Number of mandatory resources. This constant avoids calling
+  // MandatoryResources.values().length, since values() copies the
+  // underlying array on each call.
+  protected static final int NUM_MANDATORY_RESOURCES = 2;
+
+  protected static final int MEMORY_INDEX = 0;
+  protected static final int VCORES_INDEX = 1;
 
   @Public
   @Stable
@@ -71,7 +78,7 @@ public abstract class Resource implements Comparable<Resource> {
       ret.setVirtualCores(vCores);
       return ret;
     }
-    return new BaseResource(memory, vCores);
+    return new LightWeightResource(memory, vCores);
   }
 
   @Public
@@ -83,7 +90,7 @@ public abstract class Resource implements Comparable<Resource> {
       ret.setVirtualCores(vCores);
       return ret;
     }
-    return new BaseResource(memory, vCores);
+    return new LightWeightResource(memory, vCores);
   }
 
   @InterfaceAudience.Private
@@ -201,7 +208,9 @@ public abstract class Resource implements Comparable<Resource> {
    */
   @Public
   @Evolving
-  public abstract ResourceInformation[] getResources();
+  public ResourceInformation[] getResources() {
+    return resources;
+  }
 
   /**
    * Get ResourceInformation for a specified resource.
@@ -215,7 +224,6 @@ public abstract class Resource implements Comparable<Resource> {
   public ResourceInformation getResourceInformation(String resource)
       throws ResourceNotFoundException {
     Integer index = ResourceUtils.getResourceTypeIndex().get(resource);
-    ResourceInformation[] resources = getResources();
     if (index != null) {
       return resources[index];
     }
@@ -236,12 +244,13 @@ public abstract class Resource implements Comparable<Resource> {
   @Evolving
   public ResourceInformation getResourceInformation(int index)
       throws ResourceNotFoundException {
-    ResourceInformation[] resources = getResources();
-    if (index < 0 || index >= resources.length) {
-      throw new ResourceNotFoundException("Unknown resource at index '" + index
-          + "'. Vaid resources are: " + Arrays.toString(resources));
+    ResourceInformation ri = null;
+    try {
+      ri = resources[index];
+    } catch (ArrayIndexOutOfBoundsException e) {
+      throwExceptionWhenArrayOutOfBound(index);
     }
-    return resources[index];
+    return ri;
   }
 
   /**
@@ -271,11 +280,11 @@ public abstract class Resource implements Comparable<Resource> {
   public void setResourceInformation(String resource,
       ResourceInformation resourceInformation)
       throws ResourceNotFoundException {
-    if (resource.equals(MEMORY)) {
+    if (resource.equals(ResourceInformation.MEMORY_URI)) {
       this.setMemorySize(resourceInformation.getValue());
       return;
     }
-    if (resource.equals(VCORES)) {
+    if (resource.equals(ResourceInformation.VCORES_URI)) {
       this.setVirtualCores((int) resourceInformation.getValue());
       return;
     }
@@ -298,7 +307,6 @@ public abstract class Resource implements Comparable<Resource> {
   public void setResourceInformation(int index,
       ResourceInformation resourceInformation)
       throws ResourceNotFoundException {
-    ResourceInformation[] resources = getResources();
     if (index < 0 || index >= resources.length) {
       throw new ResourceNotFoundException("Unknown resource at index '" + index
           + "'. Valid resources are " + Arrays.toString(resources));
@@ -318,11 +326,11 @@ public abstract class Resource implements Comparable<Resource> {
   @Evolving
   public void setResourceValue(String resource, long value)
       throws ResourceNotFoundException {
-    if (resource.equals(MEMORY)) {
+    if (resource.equals(ResourceInformation.MEMORY_URI)) {
       this.setMemorySize(value);
       return;
     }
-    if (resource.equals(VCORES)) {
+    if (resource.equals(ResourceInformation.VCORES_URI)) {
       this.setVirtualCores((int)value);
       return;
     }
@@ -346,27 +354,21 @@ public abstract class Resource implements Comparable<Resource> {
   @Evolving
   public void setResourceValue(int index, long value)
       throws ResourceNotFoundException {
-    ResourceInformation[] resources = getResources();
-    if (index < 0 || index >= resources.length) {
-      throw new ResourceNotFoundException("Unknown resource at index '" + index
-          + "'. Valid resources are " + Arrays.toString(resources));
+    try {
+      resources[index].setValue(value);
+    } catch (ArrayIndexOutOfBoundsException e) {
+      throwExceptionWhenArrayOutOfBound(index);
     }
-    resources[index].setValue(value);
   }
 
-  @Override
-  public int hashCode() {
-    final int prime = 263167;
-
-    int result = (int) (939769357
-        + getMemorySize()); // prime * result = 939769357 initially
-    result = prime * result + getVirtualCores();
-    for (ResourceInformation entry : getResources()) {
-      if (!entry.getName().equals(MEMORY) && !entry.getName().equals(VCORES)) {
-        result = prime * result + entry.hashCode();
-      }
-    }
-    return result;
+  private void throwExceptionWhenArrayOutOfBound(int index) {
+    String exceptionMsg = String.format(
+        "Trying to access ResourceInformation for given index=%d. "
+            + "Acceptable index range is [0,%d), please double check "
+            + "configured resources in resource-types.xml",
+        index, ResourceUtils.getNumberOfKnownResourceTypes());
+
+    throw new ResourceNotFoundException(exceptionMsg);
   }
 
   @Override
@@ -381,20 +383,15 @@ public abstract class Resource implements Comparable<Resource> {
       return false;
     }
     Resource other = (Resource) obj;
-    if (getMemorySize() != other.getMemorySize()
-        || getVirtualCores() != other.getVirtualCores()) {
-      return false;
-    }
 
-    ResourceInformation[] myVectors = getResources();
     ResourceInformation[] otherVectors = other.getResources();
 
-    if (myVectors.length != otherVectors.length) {
+    if (resources.length != otherVectors.length) {
       return false;
     }
 
-    for (int i = 0; i < myVectors.length; i++) {
-      ResourceInformation a = myVectors[i];
+    for (int i = 0; i < resources.length; i++) {
+      ResourceInformation a = resources[i];
       ResourceInformation b = otherVectors[i];
       if ((a != b) && ((a == null) || !a.equals(b))) {
         return false;
@@ -404,63 +401,70 @@ public abstract class Resource implements Comparable<Resource> {
   }
 
   @Override
+  public int compareTo(Resource other) {
+    ResourceInformation[] otherResources = other.getResources();
+
+    int arrLenThis = this.resources.length;
+    int arrLenOther = otherResources.length;
+
+    // Compare entries in index order. Memory and vcores sit at indices 0
+    // and 1, so they are compared first, preserving existing behaviour.
+    for (int i = 0; i < arrLenThis; i++) {
+      ResourceInformation otherEntry;
+      try {
+        otherEntry = otherResources[i];
+      } catch (ArrayIndexOutOfBoundsException e) {
+        // When one vector is a prefix of the other, the shorter vector
+        // goes first.
+        return 1;
+      }
+      ResourceInformation entry = resources[i];
+
+      long diff = entry.compareTo(otherEntry);
+      if (diff > 0) {
+        return 1;
+      } else if (diff < 0) {
+        return -1;
+      }
+    }
+
+    if (arrLenThis < arrLenOther) {
+      return -1;
+    }
+
+    return 0;
+  }
+
+  @Override
   public String toString() {
     StringBuilder sb = new StringBuilder();
+
     sb.append("<memory:").append(getMemorySize()).append(", vCores:")
         .append(getVirtualCores());
-    for (ResourceInformation entry : getResources()) {
-      if (entry.getName().equals(MEMORY)
-          && entry.getUnits()
-          .equals(ResourceInformation.MEMORY_MB.getUnits())) {
-        continue;
-      }
-      if (entry.getName().equals(VCORES)
-          && entry.getUnits()
-          .equals(ResourceInformation.VCORES.getUnits())) {
+
+    for (int i = 2; i < resources.length; i++) {
+      ResourceInformation ri = resources[i];
+      if (ri.getValue() == 0) {
         continue;
       }
-      sb.append(", ").append(entry.getName()).append(": ")
-          .append(entry.getValue())
-          .append(entry.getUnits());
+      sb.append(", ");
+      sb.append(ri.getName()).append(": ")
+          .append(ri.getValue());
+      sb.append(ri.getUnits());
     }
+
     sb.append(">");
     return sb.toString();
   }
 
   @Override
-  public int compareTo(Resource other) {
-    ResourceInformation[] thisResources = this.getResources();
-    ResourceInformation[] otherResources = other.getResources();
-
-    // compare memory and vcores first(in that order) to preserve
-    // existing behaviour
-    long diff = this.getMemorySize() - other.getMemorySize();
-    if (diff == 0) {
-      diff = this.getVirtualCores() - other.getVirtualCores();
-    }
-    if (diff == 0) {
-      diff = thisResources.length - otherResources.length;
-      if (diff == 0) {
-        int maxLength = ResourceUtils.getResourceTypesArray().length;
-        for (int i = 0; i < maxLength; i++) {
-          // For memory and vcores, we can skip the loop as it's already
-          // compared.
-          if (i < 2) {
-            continue;
-          }
-
-          ResourceInformation entry = thisResources[i];
-          ResourceInformation otherEntry = otherResources[i];
-          if (entry.getName().equals(otherEntry.getName())) {
-            diff = entry.compareTo(otherEntry);
-            if (diff != 0) {
-              break;
-            }
-          }
-        }
-      }
+  public int hashCode() {
+    final int prime = 47;
+    long result = 0;
+    for (ResourceInformation entry : resources) {
+      result = prime * result + entry.hashCode();
     }
-    return Long.compare(diff, 0);
+    return (int) result;
   }
 
   /**
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceInformation.java
index 4717d82..0cc1e9c 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceInformation.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceInformation.java
@@ -18,6 +18,7 @@
 
 package org.apache.hadoop.yarn.api.records;
 
+import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.yarn.api.protocolrecords.ResourceTypes;
 import org.apache.hadoop.yarn.util.UnitsConversionUtil;
 
@@ -34,8 +35,8 @@ public class ResourceInformation implements Comparable<ResourceInformation> {
   private long minimumAllocation;
   private long maximumAllocation;
 
-  private static final String MEMORY_URI = "memory-mb";
-  private static final String VCORES_URI = "vcores";
+  public static final String MEMORY_URI = "memory-mb";
+  public static final String VCORES_URI = "vcores";
 
   public static final ResourceInformation MEMORY_MB =
       ResourceInformation.newInstance(MEMORY_URI, "Mi");
@@ -84,6 +85,16 @@ public class ResourceInformation implements Comparable<ResourceInformation> {
   }
 
   /**
+   * Checking whether a unit is included in KNOWN_UNITS is an expensive
+   * operation. This setter skips that check for critical paths in the RM.
+   * @param rUnits units for the resource
+   */
+  @InterfaceAudience.Private
+  public void setUnitsWithoutValidation(String rUnits) {
+    this.units = rUnits;
+  }
+
+  /**
    * Get the resource type.
    *
    * @return the resource type
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/BaseResource.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
similarity index 55%
rename from hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/BaseResource.java
rename to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
index b5cc4d6..b80e133 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/BaseResource.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
@@ -18,19 +18,24 @@
 
 package org.apache.hadoop.yarn.api.records.impl;
 
-import org.apache.hadoop.classification.InterfaceAudience.Public;
+import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability.Unstable;
+import org.apache.hadoop.yarn.api.protocolrecords.ResourceTypes;
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.api.records.ResourceInformation;
 
-import java.util.Arrays;
+import static org.apache.hadoop.yarn.api.records.ResourceInformation.MEMORY_MB;
+import static org.apache.hadoop.yarn.api.records.ResourceInformation.MEMORY_URI;
+import static org.apache.hadoop.yarn.api.records.ResourceInformation.VCORES_URI;
 
 /**
  * <p>
- * <code>BaseResource</code> extends Resource to handle base resources such
+ * <code>LightWeightResource</code> extends Resource to handle base resources such
  * as memory and CPU.
  * TODO: We have a long term plan to use AbstractResource when additional
  * resource types are to be handled as well.
+ * This class speeds up internal calculations by avoiding the creation of
+ * costly PB-backed Resource objects: <code>ResourcePBImpl</code>.
  * </p>
  *
  * <p>
@@ -54,48 +59,34 @@ import java.util.Arrays;
  *
  * @see Resource
  */
-@Public
+@InterfaceAudience.Private
 @Unstable
-public class BaseResource extends Resource {
+public class LightWeightResource extends Resource {
 
   private ResourceInformation memoryResInfo;
   private ResourceInformation vcoresResInfo;
-  protected ResourceInformation[] resources = null;
-  protected ResourceInformation[] readOnlyResources = null;
 
-  // Number of mandatory resources, this is added to avoid invoke
-  // MandatoryResources.values().length, since values() internally will
-  // copy array, etc.
-  private static final int NUM_MANDATORY_RESOURCES = 2;
+  public LightWeightResource(long memory, long vcores) {
+    this.memoryResInfo = LightWeightResource.newDefaultInformation(MEMORY_URI,
+        MEMORY_MB.getUnits(), memory);
+    this.vcoresResInfo = LightWeightResource.newDefaultInformation(VCORES_URI,
+        "", vcores);
 
-  protected enum MandatoryResources {
-    MEMORY(0), VCORES(1);
-
-    private final int id;
-
-    MandatoryResources(int id) {
-      this.id = id;
-    }
-
-    public int getId() {
-      return this.id;
-    }
-  }
-
-  public BaseResource() {
-    // Base constructor.
+    resources = new ResourceInformation[NUM_MANDATORY_RESOURCES];
+    resources[MEMORY_INDEX] = memoryResInfo;
+    resources[VCORES_INDEX] = vcoresResInfo;
   }
 
-  public BaseResource(long memory, long vcores) {
-    this.memoryResInfo = ResourceInformation.newInstance(MEMORY,
-        ResourceInformation.MEMORY_MB.getUnits(), memory);
-    this.vcoresResInfo = ResourceInformation.newInstance(VCORES, "", vcores);
-
-    resources = new ResourceInformation[NUM_MANDATORY_RESOURCES];
-    readOnlyResources = new ResourceInformation[NUM_MANDATORY_RESOURCES];
-    resources[MandatoryResources.MEMORY.id] = memoryResInfo;
-    resources[MandatoryResources.VCORES.id] = vcoresResInfo;
-    readOnlyResources = Arrays.copyOf(resources, resources.length);
+  private static ResourceInformation newDefaultInformation(String name,
+      String unit, long value) {
+    ResourceInformation ri = new ResourceInformation();
+    ri.setName(name);
+    ri.setValue(value);
+    ri.setResourceType(ResourceTypes.COUNTABLE);
+    ri.setUnitsWithoutValidation(unit);
+    ri.setMinimumAllocation(0);
+    ri.setMaximumAllocation(Long.MAX_VALUE);
+    return ri;
   }
 
   @Override
@@ -131,7 +122,42 @@ public class BaseResource extends Resource {
   }
 
   @Override
-  public ResourceInformation[] getResources() {
-    return readOnlyResources;
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || !(obj instanceof Resource)) {
+      return false;
+    }
+    Resource other = (Resource) obj;
+    if (getMemorySize() != other.getMemorySize()
+        || getVirtualCores() != other.getVirtualCores()) {
+      return false;
+    }
+
+    return true;
+  }
+
+  @Override
+  public int compareTo(Resource other) {
+    // compare memory and vcores first(in that order) to preserve
+    // existing behaviour
+    long diff = this.getMemorySize() - other.getMemorySize();
+    if (diff == 0) {
+      return this.getVirtualCores() - other.getVirtualCores();
+    } else if (diff > 0) {
+      return 1;
+    } else {
+      return -1;
+    }
+  }
+
+  @Override
+  public int hashCode() {
+    final int prime = 47;
+    long result = prime + getMemorySize();
+    result = prime * result + getVirtualCores();
+
+    return (int) result;
   }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
index e3e25d1..110453a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
@@ -244,13 +244,28 @@ public class ResourceUtils {
                 minimumAllocation, maximumAllocation));
       }
     }
+
     checkMandatoryResources(resourceInformationMap);
     addMandatoryResources(resourceInformationMap);
+
     setMinimumAllocationForMandatoryResources(resourceInformationMap, conf);
     setMaximumAllocationForMandatoryResources(resourceInformationMap, conf);
+
+    initializeResourcesFromResourceInformationMap(resourceInformationMap);
+  }
+
+  /**
+   * This method is visible for testing. A unit test can construct a
+   * resourceInformationMap and pass it here to initialize multiple resources.
+   * @param resourceInformationMap constructed resource information map.
+   */
+  @VisibleForTesting
+  public static void initializeResourcesFromResourceInformationMap(
+      Map<String, ResourceInformation> resourceInformationMap) {
     resourceTypes = Collections.unmodifiableMap(resourceInformationMap);
     updateKnownResources();
     updateResourceTypeIndex();
+    initializedResources = true;
   }
 
   private static void updateKnownResources() {
@@ -347,14 +362,12 @@ public class ResourceUtils {
           try {
             addResourcesFileToConf(resourceFile, conf);
             LOG.debug("Found " + resourceFile + ", adding to configuration");
-            initializeResourcesMap(conf);
-            initializedResources = true;
           } catch (FileNotFoundException fe) {
             LOG.info("Unable to find '" + resourceFile
                 + "'. Falling back to memory and vcores as resources.");
-            initializeResourcesMap(conf);
-            initializedResources = true;
           }
+          initializeResourcesMap(conf);
+
         }
       }
     }
@@ -558,7 +571,7 @@ public class ResourceUtils {
    */
   public static String getDefaultUnit(String resourceType) {
     ResourceInformation ri = getResourceTypes().get(resourceType);
-    if (null != ri) {
+    if (ri != null) {
       return ri.getUnits();
     }
     return "";
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java
index 9f34bec..06c30ff 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java
@@ -25,7 +25,6 @@ import org.apache.hadoop.classification.InterfaceStability.Unstable;
 import org.apache.hadoop.yarn.api.protocolrecords.ResourceTypes;
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.api.records.ResourceInformation;
-import org.apache.hadoop.yarn.api.records.impl.BaseResource;
 import org.apache.hadoop.yarn.exceptions.ResourceNotFoundException;
 import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
 import org.apache.hadoop.yarn.proto.YarnProtos.ResourceProto;
@@ -34,13 +33,12 @@ import org.apache.hadoop.yarn.proto.YarnProtos.ResourceInformationProto;
 import org.apache.hadoop.yarn.util.UnitsConversionUtil;
 import org.apache.hadoop.yarn.util.resource.ResourceUtils;
 
-import java.util.Arrays;
 import java.util.Map;
 
 
 @Private
 @Unstable
-public class ResourcePBImpl extends BaseResource {
+public class ResourcePBImpl extends Resource {
 
   private static final Log LOG = LogFactory.getLog(ResourcePBImpl.class);
 
@@ -95,7 +93,7 @@ public class ResourcePBImpl extends BaseResource {
   @Override
   public long getMemorySize() {
     // memory should always be present
-    ResourceInformation ri = resources[MandatoryResources.MEMORY.getId()];
+    ResourceInformation ri = resources[MEMORY_INDEX];
 
     if (ri.getUnits().isEmpty()) {
       return ri.getValue();
@@ -113,19 +111,19 @@ public class ResourcePBImpl extends BaseResource {
   @Override
   public void setMemorySize(long memory) {
     maybeInitBuilder();
-    getResourceInformation(MEMORY).setValue(memory);
+    getResourceInformation(ResourceInformation.MEMORY_URI).setValue(memory);
   }
 
   @Override
   public int getVirtualCores() {
     // vcores should always be present
-    return (int) resources[MandatoryResources.VCORES.getId()].getValue();
+    return (int) resources[VCORES_INDEX].getValue();
   }
 
   @Override
   public void setVirtualCores(int vCores) {
     maybeInitBuilder();
-    getResourceInformation(VCORES).setValue(vCores);
+    getResourceInformation(ResourceInformation.VCORES_URI).setValue(vCores);
   }
 
   private void initResources() {
@@ -156,7 +154,6 @@ public class ResourcePBImpl extends BaseResource {
         resources[index].setValue(value);
       }
     }
-    readOnlyResources = Arrays.copyOf(resources, resources.length);
     this.setMemorySize(p.getMemory());
     this.setVirtualCores(p.getVirtualCores());
   }
@@ -187,11 +184,6 @@ public class ResourcePBImpl extends BaseResource {
   }
 
   @Override
-  public ResourceInformation[] getResources() {
-    return super.getResources();
-  }
-
-  @Override
   public ResourceInformation getResourceInformation(String resource)
       throws ResourceNotFoundException {
     return super.getResourceInformation(resource);
@@ -212,7 +204,6 @@ public class ResourcePBImpl extends BaseResource {
       }
 
       resources = new ResourceInformation[types.length];
-      readOnlyResources = new ResourceInformation[types.length];
       for (ResourceInformation entry : types) {
         int index = ResourceUtils.getResourceTypeIndex().get(entry.getName());
         resources[index] = ResourceInformation.newInstance(entry);
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
index ffd4fec..d64f03e 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java
@@ -73,7 +73,7 @@ public class DominantResourceCalculator extends ResourceCalculator {
     boolean rhsGreater = false;
     int ret = 0;
 
-    int maxLength = ResourceUtils.getResourceTypesArray().length;
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
     for (int i = 0; i < maxLength; i++) {
       ResourceInformation lhsResourceInformation = lhs
           .getResourceInformation(i);
@@ -111,10 +111,12 @@ public class DominantResourceCalculator extends ResourceCalculator {
     // resources and then look for which resource has the biggest
     // share overall.
     ResourceInformation[] clusterRes = clusterResource.getResources();
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
+
     // If array creation shows up as a time sink, these arrays could be cached
     // because they're always the same length.
-    double[] lhsShares = new double[clusterRes.length];
-    double[] rhsShares = new double[clusterRes.length];
+    double[] lhsShares = new double[maxLength];
+    double[] rhsShares = new double[maxLength];
     double diff;
 
     try {
@@ -124,10 +126,10 @@ public class DominantResourceCalculator extends ResourceCalculator {
         calculateShares(clusterRes, lhs, rhs, lhsShares, rhsShares, max);
 
         diff = max[0] - max[1];
-      } else if (clusterRes.length == 2) {
+      } else if (maxLength == 2) {
         // Special case to handle the common scenario of only CPU and memory
         // so that we can optimize for performance
-        diff = calculateSharesForMandatoryResources(clusterRes, lhs, rhs,
+        diff = calculateSharesForTwoMandatoryResources(clusterRes, lhs, rhs,
             lhsShares, rhsShares);
       } else {
         calculateShares(clusterRes, lhs, rhs, lhsShares, rhsShares);
@@ -182,7 +184,8 @@ public class DominantResourceCalculator extends ResourceCalculator {
     ResourceInformation[] firstRes = first.getResources();
     ResourceInformation[] secondRes = second.getResources();
 
-    for (int i = 0; i < clusterRes.length; i++) {
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
+    for (int i = 0; i < maxLength; i++) {
       firstShares[i] = calculateShare(clusterRes[i], firstRes[i]);
       secondShares[i] = calculateShare(clusterRes[i], secondRes[i]);
     }
@@ -205,35 +208,27 @@ public class DominantResourceCalculator extends ResourceCalculator {
    * second resource, respectively
    * @throws NullPointerException if any parameter is null
    */
-  private int calculateSharesForMandatoryResources(
+  private int calculateSharesForTwoMandatoryResources(
       ResourceInformation[] clusterRes, Resource first, Resource second,
       double[] firstShares, double[] secondShares) {
     ResourceInformation[] firstRes = first.getResources();
     ResourceInformation[] secondRes = second.getResources();
+    firstShares[0] = calculateShare(clusterRes[0], firstRes[0]);
+    secondShares[0] = calculateShare(clusterRes[0], secondRes[0]);
+    firstShares[1] = calculateShare(clusterRes[1], firstRes[1]);
+    secondShares[1] = calculateShare(clusterRes[1], secondRes[1]);
+
     int firstDom = 0;
+    int firstSub = 1;
+    if (firstShares[1] > firstShares[0]) {
+      firstDom = 1;
+      firstSub = 0;
+    }
     int secondDom = 0;
-    int firstSub = 0;
-    int secondSub = 0;
-
-    for (int i = 0; i < clusterRes.length; i++) {
-      firstShares[i] = calculateShare(clusterRes[i], firstRes[i]);
-      secondShares[i] = calculateShare(clusterRes[i], secondRes[i]);
-
-      if (firstShares[i] > firstShares[firstDom]) {
-        firstDom = i;
-      }
-
-      if (firstShares[i] < firstShares[firstSub]) {
-        firstSub = i;
-      }
-
-      if (secondShares[i] > secondShares[secondDom]) {
-        secondDom = i;
-      }
-
-      if (secondShares[i] < secondShares[secondSub]) {
-        secondSub = i;
-      }
+    int secondSub = 1;
+    if (secondShares[1] > secondShares[0]) {
+      secondDom = 1;
+      secondSub = 0;
     }
 
     if (firstShares[firstDom] > secondShares[secondDom]) {
@@ -280,7 +275,8 @@ public class DominantResourceCalculator extends ResourceCalculator {
     max[0] = 0.0;
     max[1] = 0.0;
 
-    for (int i = 0; i < clusterRes.length; i++) {
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
+    for (int i = 0; i < maxLength; i++) {
       firstShares[i] = calculateShare(clusterRes[i], firstRes[i]);
       secondShares[i] = calculateShare(clusterRes[i], secondRes[i]);
 
@@ -339,7 +335,7 @@ public class DominantResourceCalculator extends ResourceCalculator {
   public long computeAvailableContainers(Resource available,
       Resource required) {
     long min = Long.MAX_VALUE;
-    int maxLength = ResourceUtils.getResourceTypesArray().length;
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
     for (int i = 0; i < maxLength; i++) {
       ResourceInformation availableResource = available
           .getResourceInformation(i);
@@ -358,11 +354,12 @@ public class DominantResourceCalculator extends ResourceCalculator {
   @Override
   public float divide(Resource clusterResource,
       Resource numerator, Resource denominator) {
+    int nKnownResourceTypes = ResourceUtils.getNumberOfKnownResourceTypes();
     ResourceInformation[] clusterRes = clusterResource.getResources();
     // We have to provide the calculateShares() method with somewhere to store
     // the shares. We don't actually need these shares afterwards.
-    double[] numeratorShares = new double[clusterRes.length];
-    double[] denominatorShares = new double[clusterRes.length];
+    double[] numeratorShares = new double[nKnownResourceTypes];
+    double[] denominatorShares = new double[nKnownResourceTypes];
     // We also have to provide a place for calculateShares() to store the max
     // shares so that we can use them.
     double[] max = new double[2];
@@ -386,7 +383,7 @@ public class DominantResourceCalculator extends ResourceCalculator {
   @Override
   public float ratio(Resource a, Resource b) {
     float ratio = 0.0f;
-    int maxLength = ResourceUtils.getResourceTypesArray().length;
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
     for (int i = 0; i < maxLength; i++) {
       ResourceInformation aResourceInformation = a.getResourceInformation(i);
       ResourceInformation bResourceInformation = b.getResourceInformation(i);
@@ -407,7 +404,7 @@ public class DominantResourceCalculator extends ResourceCalculator {
 
   public Resource divideAndCeil(Resource numerator, long denominator) {
     Resource ret = Resource.newInstance(numerator);
-    int maxLength = ResourceUtils.getResourceTypesArray().length;
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
     for (int i = 0; i < maxLength; i++) {
       ResourceInformation resourceInformation = ret.getResourceInformation(i);
       resourceInformation
@@ -428,7 +425,7 @@ public class DominantResourceCalculator extends ResourceCalculator {
   public Resource normalize(Resource r, Resource minimumResource,
       Resource maximumResource, Resource stepFactor) {
     Resource ret = Resource.newInstance(r);
-    int maxLength = ResourceUtils.getResourceTypesArray().length;
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
     for (int i = 0; i < maxLength; i++) {
       ResourceInformation rResourceInformation = r.getResourceInformation(i);
       ResourceInformation minimumResourceInformation = minimumResource
@@ -474,7 +471,7 @@ public class DominantResourceCalculator extends ResourceCalculator {
 
   private Resource rounding(Resource r, Resource stepFactor, boolean roundUp) {
     Resource ret = Resource.newInstance(r);
-    int maxLength = ResourceUtils.getResourceTypesArray().length;
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
     for (int i = 0; i < maxLength; i++) {
       ResourceInformation rResourceInformation = r.getResourceInformation(i);
       ResourceInformation stepFactorResourceInformation = stepFactor
@@ -513,7 +510,7 @@ public class DominantResourceCalculator extends ResourceCalculator {
   private Resource multiplyAndNormalize(Resource r, double by,
       Resource stepFactor, boolean roundUp) {
     Resource ret = Resource.newInstance(r);
-    int maxLength = ResourceUtils.getResourceTypesArray().length;
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
     for (int i = 0; i < maxLength; i++) {
       ResourceInformation rResourceInformation = r.getResourceInformation(i);
       ResourceInformation stepFactorResourceInformation = stepFactor
@@ -542,7 +539,7 @@ public class DominantResourceCalculator extends ResourceCalculator {
 
   @Override
   public boolean fitsIn(Resource cluster, Resource smaller, Resource bigger) {
-    int maxLength = ResourceUtils.getResourceTypesArray().length;
+    int maxLength = ResourceUtils.getNumberOfKnownResourceTypes();
     for (int i = 0; i < maxLength; i++) {
       ResourceInformation sResourceInformation = smaller
           .getResourceInformation(i);
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java
index 1e2ce15..325bce4 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java
@@ -24,12 +24,9 @@ import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability.Unstable;
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.api.records.ResourceInformation;
-import org.apache.hadoop.yarn.api.records.impl.BaseResource;
 import org.apache.hadoop.yarn.exceptions.ResourceNotFoundException;
 import org.apache.hadoop.yarn.util.UnitsConversionUtil;
 
-import java.util.Arrays;
-
 /**
  * Resources is a computation class which provides a set of apis to do
  * mathematical operations on Resource object.
@@ -45,9 +42,11 @@ public class Resources {
    * Helper class to create a resource with a fixed value for all resource
    * types. For example, a NONE resource which returns 0 for any resource type.
    */
-  static class FixedValueResource extends BaseResource {
+  @InterfaceAudience.Private
+  @Unstable
+  static class FixedValueResource extends Resource {
 
-    private long resourceValue;
+    private final long resourceValue;
     private String name;
 
     /**
@@ -101,6 +100,19 @@ public class Resources {
     }
 
     @Override
+    public void setResourceInformation(int index,
+        ResourceInformation resourceInformation)
+        throws ResourceNotFoundException {
+      throw new RuntimeException(name + " cannot be modified!");
+    }
+
+    @Override
+    public void setResourceValue(int index, long value)
+        throws ResourceNotFoundException {
+      throw new RuntimeException(name + " cannot be modified!");
+    }
+
+    @Override
     public void setResourceInformation(String resource,
         ResourceInformation resourceInformation)
         throws ResourceNotFoundException {
@@ -117,19 +129,11 @@ public class Resources {
       ResourceInformation[] types = ResourceUtils.getResourceTypesArray();
       if (types != null) {
         resources = new ResourceInformation[types.length];
-        readOnlyResources = new ResourceInformation[types.length];
         for (int index = 0; index < types.length; index++) {
           resources[index] = ResourceInformation.newInstance(types[index]);
           resources[index].setValue(resourceValue);
-
-          // this is a fix for getVirtualCores returning an int
-          if (resourceValue > Integer.MAX_VALUE && ResourceInformation.VCORES
-              .getName().equals(resources[index].getName())) {
-            resources[index].setValue((long) Integer.MAX_VALUE);
-          }
         }
       }
-      readOnlyResources = Arrays.copyOf(resources, resources.length);
     }
   }
 
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
index a5550a7..d6bab92 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceUtils.java
@@ -37,6 +37,8 @@ import java.util.Map;
  * Test class to verify all resource utility methods.
  */
 public class TestResourceUtils {
+  public static final String TEST_CONF_RESET_RESOURCE_TYPES =
+      "yarn.test.reset-resource-types";
 
   static class ResourceFileInformation {
     String filename;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
index fd6b1e9..d09be8b 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
@@ -1814,8 +1814,6 @@ public class RMAppAttemptImpl implements RMAppAttempt, Recoverable {
       if (newTrackingUrl != null &&
           !newTrackingUrl.equals(appAttempt.originalTrackingUrl)) {
         appAttempt.originalTrackingUrl = newTrackingUrl;
-        AggregateAppResourceUsage resUsage =
-            appAttempt.attemptMetrics.getAggregateAppResourceUsage();
         ApplicationAttemptStateData attemptState = ApplicationAttemptStateData
             .newInstance(appAttempt.applicationAttemptId,
                 appAttempt.getMasterContainer(),
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
index 2d7f3f6..0754584 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
@@ -105,6 +105,7 @@ import org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSe
 import org.apache.hadoop.yarn.util.Records;
 import org.apache.hadoop.yarn.util.YarnVersionInfo;
 import org.apache.hadoop.yarn.util.resource.ResourceUtils;
+import org.apache.hadoop.yarn.util.resource.TestResourceUtils;
 import org.apache.log4j.Level;
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
@@ -151,7 +152,10 @@ public class MockRM extends ResourceManager {
   public MockRM(Configuration conf, RMStateStore store,
       boolean useNullRMNodeLabelsManager, boolean useRealElector) {
     super();
-    ResourceUtils.resetResourceTypes(conf);
+    if (conf.getBoolean(TestResourceUtils.TEST_CONF_RESET_RESOURCE_TYPES,
+        true)) {
+      ResourceUtils.resetResourceTypes(conf);
+    }
     this.useNullRMNodeLabelsManager = useNullRMNodeLabelsManager;
     this.useRealElector = useRealElector;
     init(conf instanceof YarnConfiguration ? conf : new YarnConfiguration(conf));
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
index d23ef59..bb14e1e 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
@@ -4341,143 +4341,6 @@ public class TestCapacityScheduler {
     rm.stop();
   }
 
-  @Test (timeout = 300000)
-  public void testUserLimitThroughput() throws Exception {
-    // Since this is more of a performance unit test, only run if
-    // RunUserLimitThroughput is set (-DRunUserLimitThroughput=true)
-    Assume.assumeTrue(Boolean.valueOf(
-        System.getProperty("RunUserLimitThroughput")));
-
-    CapacitySchedulerConfiguration csconf =
-        new CapacitySchedulerConfiguration();
-    csconf.setMaximumApplicationMasterResourcePerQueuePercent("root", 100.0f);
-    csconf.setMaximumAMResourcePercentPerPartition("root", "", 100.0f);
-    csconf.setMaximumApplicationMasterResourcePerQueuePercent("root.default",
-        100.0f);
-    csconf.setMaximumAMResourcePercentPerPartition("root.default", "", 100.0f);
-    csconf.setResourceComparator(DominantResourceCalculator.class);
-
-    YarnConfiguration conf = new YarnConfiguration(csconf);
-      conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,
-          ResourceScheduler.class);
-
-    MockRM rm = new MockRM(conf);
-    rm.start();
-
-    CapacityScheduler cs = (CapacityScheduler) rm.getResourceScheduler();
-    LeafQueue qb = (LeafQueue)cs.getQueue("default");
-
-    // For now make user limit large so we can activate all applications
-    qb.setUserLimitFactor((float)100.0);
-    qb.setupConfigurableCapacities();
-
-    SchedulerEvent addAppEvent;
-    SchedulerEvent addAttemptEvent;
-    Container container = mock(Container.class);
-    ApplicationSubmissionContext submissionContext =
-        mock(ApplicationSubmissionContext.class);
-
-    final int appCount = 100;
-    ApplicationId[] appids = new ApplicationId[appCount];
-    RMAppAttemptImpl[] attempts = new RMAppAttemptImpl[appCount];
-    ApplicationAttemptId[] appAttemptIds = new ApplicationAttemptId[appCount];
-    RMAppImpl[] apps = new RMAppImpl[appCount];
-    RMAppAttemptMetrics[] attemptMetrics = new RMAppAttemptMetrics[appCount];
-    for (int i=0; i<appCount; i++) {
-      appids[i] = BuilderUtils.newApplicationId(100, i);
-      appAttemptIds[i] =
-      BuilderUtils.newApplicationAttemptId(appids[i], 1);
-
-      attemptMetrics[i] =
-          new RMAppAttemptMetrics(appAttemptIds[i], rm.getRMContext());
-      apps[i] = mock(RMAppImpl.class);
-      when(apps[i].getApplicationId()).thenReturn(appids[i]);
-      attempts[i] = mock(RMAppAttemptImpl.class);
-      when(attempts[i].getMasterContainer()).thenReturn(container);
-      when(attempts[i].getSubmissionContext()).thenReturn(submissionContext);
-      when(attempts[i].getAppAttemptId()).thenReturn(appAttemptIds[i]);
-      when(attempts[i].getRMAppAttemptMetrics()).thenReturn(attemptMetrics[i]);
-      when(apps[i].getCurrentAppAttempt()).thenReturn(attempts[i]);
-
-      rm.getRMContext().getRMApps().put(appids[i], apps[i]);
-      addAppEvent =
-          new AppAddedSchedulerEvent(appids[i], "default", "user1");
-      cs.handle(addAppEvent);
-      addAttemptEvent =
-          new AppAttemptAddedSchedulerEvent(appAttemptIds[i], false);
-      cs.handle(addAttemptEvent);
-    }
-
-    // add nodes  to cluster, so cluster has 20GB and 20 vcores
-    Resource newResource = Resource.newInstance(10 * GB, 10);
-    RMNode node = MockNodes.newNodeInfo(0, newResource, 1, "127.0.0.1");
-    cs.handle(new NodeAddedSchedulerEvent(node));
-
-    Resource newResource2 = Resource.newInstance(10 * GB, 10);
-    RMNode node2 = MockNodes.newNodeInfo(0, newResource2, 1, "127.0.0.2");
-    cs.handle(new NodeAddedSchedulerEvent(node2));
-
-    Priority u0Priority = TestUtils.createMockPriority(1);
-    RecordFactory recordFactory =
-        RecordFactoryProvider.getRecordFactory(null);
-
-    FiCaSchedulerApp[] fiCaApps = new FiCaSchedulerApp[appCount];
-    for (int i=0;i<appCount;i++) {
-      fiCaApps[i] =
-          cs.getSchedulerApplications().get(apps[i].getApplicationId())
-              .getCurrentAppAttempt();
-      // allocate container for app2 with 1GB memory and 1 vcore
-      fiCaApps[i].updateResourceRequests(Collections.singletonList(
-          TestUtils.createResourceRequest(ResourceRequest.ANY, 1*GB, 1, true,
-              u0Priority, recordFactory)));
-    }
-    // Now force everything to be over user limit
-    qb.setUserLimitFactor((float)0.0);
-
-    // Quiet the loggers while measuring throughput
-    for (Enumeration<?> loggers=LogManager.getCurrentLoggers();
-        loggers.hasMoreElements(); )  {
-      Logger logger = (Logger) loggers.nextElement();
-      logger.setLevel(Level.WARN);
-    }
-    final int topn = 20;
-    final int iterations = 2000000;
-    final int printInterval = 20000;
-    final float numerator = 1000.0f * printInterval;
-    PriorityQueue<Long> queue = new PriorityQueue<>(topn,
-        Collections.reverseOrder());
-
-    long n = Time.monotonicNow();
-    long timespent = 0;
-    for (int i = 0; i < iterations; i+=2) {
-      if (i > 0  && i % printInterval == 0){
-        long ts = (Time.monotonicNow() - n);
-        if (queue.size() < topn) {
-          queue.offer(ts);
-        } else {
-          Long last = queue.peek();
-          if (last > ts) {
-            queue.poll();
-            queue.offer(ts);
-          }
-        }
-        System.out.println(i + " " + (numerator / ts));
-        n= Time.monotonicNow();
-      }
-    cs.handle(new NodeUpdateSchedulerEvent(node));
-    cs.handle(new NodeUpdateSchedulerEvent(node2));
-    }
-    timespent=0;
-    int entries = queue.size();
-    while(queue.size() > 0){
-      long l = queue.poll();
-      timespent += l;
-    }
-    System.out.println("Avg of fastest " + entries + ": "
-        + numerator / (timespent / entries));
-    rm.stop();
-  }
-
   @Test
   public void testCSQueueBlocked() throws Exception {
     CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerPerf.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerPerf.java
new file mode 100644
index 0000000..0837fd7
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerPerf.java
@@ -0,0 +1,265 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity;
+
+import org.apache.hadoop.util.Time;
+import org.apache.hadoop.yarn.api.protocolrecords.ResourceTypes;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ApplicationId;
+import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
+import org.apache.hadoop.yarn.api.records.Container;
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.factories.RecordFactory;
+import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
+import org.apache.hadoop.yarn.server.resourcemanager.MockNodes;
+import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
+import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl;
+import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl;
+import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics;
+import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAddedSchedulerEvent;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeAddedSchedulerEvent;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent;
+import org.apache.hadoop.yarn.server.utils.BuilderUtils;
+import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
+import org.apache.hadoop.yarn.util.resource.ResourceUtils;
+import org.apache.log4j.Level;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.junit.Assume;
+import org.junit.Test;
+
+import java.util.Collections;
+import java.util.Enumeration;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.PriorityQueue;
+
+import static org.apache.hadoop.yarn.util.resource.TestResourceUtils.TEST_CONF_RESET_RESOURCE_TYPES;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+public class TestCapacitySchedulerPerf {
+  private final int GB = 1024;
+
+  private String getResourceName(int idx) {
+    return "resource-" + idx;
+  }
+
+  private void testUserLimitThroughputWithNumberOfResourceTypes(
+      int numOfResourceTypes)
+      throws Exception {
+    if (numOfResourceTypes > 2) {
+      // Initialize resource map
+      Map<String, ResourceInformation> riMap = new HashMap<>();
+
+      // Initialize mandatory resources
+      riMap.put(ResourceInformation.MEMORY_URI, ResourceInformation.MEMORY_MB);
+      riMap.put(ResourceInformation.VCORES_URI, ResourceInformation.VCORES);
+
+      for (int i = 2; i < numOfResourceTypes; i++) {
+        String resourceName = getResourceName(i);
+        riMap.put(resourceName, ResourceInformation
+            .newInstance(resourceName, "", 0, ResourceTypes.COUNTABLE, 0,
+                Integer.MAX_VALUE));
+      }
+
+      ResourceUtils.initializeResourcesFromResourceInformationMap(riMap);
+    }
+
+    // This is a performance test, so only run it if
+    // -DRunCapacitySchedulerPerfTests=true is set
+    Assume.assumeTrue(Boolean.valueOf(
+        System.getProperty("RunCapacitySchedulerPerfTests")));
+
+    CapacitySchedulerConfiguration csconf =
+        new CapacitySchedulerConfiguration();
+    csconf.setMaximumApplicationMasterResourcePerQueuePercent("root", 100.0f);
+    csconf.setMaximumAMResourcePercentPerPartition("root", "", 100.0f);
+    csconf.setMaximumApplicationMasterResourcePerQueuePercent("root.default",
+        100.0f);
+    csconf.setMaximumAMResourcePercentPerPartition("root.default", "", 100.0f);
+    csconf.setResourceComparator(DominantResourceCalculator.class);
+
+    YarnConfiguration conf = new YarnConfiguration(csconf);
+    // Don't reset resource types since we have already configured resource types
+    conf.setBoolean(TEST_CONF_RESET_RESOURCE_TYPES, false);
+    conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,
+        ResourceScheduler.class);
+
+    MockRM rm = new MockRM(conf);
+    rm.start();
+
+    CapacityScheduler cs = (CapacityScheduler) rm.getResourceScheduler();
+    LeafQueue qb = (LeafQueue)cs.getQueue("default");
+
+    // For now make user limit large so we can activate all applications
+    qb.setUserLimitFactor((float)100.0);
+    qb.setupConfigurableCapacities();
+
+    SchedulerEvent addAppEvent;
+    SchedulerEvent addAttemptEvent;
+    Container container = mock(Container.class);
+    ApplicationSubmissionContext submissionContext =
+        mock(ApplicationSubmissionContext.class);
+
+    final int appCount = 100;
+    ApplicationId[] appids = new ApplicationId[appCount];
+    RMAppAttemptImpl[] attempts = new RMAppAttemptImpl[appCount];
+    ApplicationAttemptId[] appAttemptIds = new ApplicationAttemptId[appCount];
+    RMAppImpl[] apps = new RMAppImpl[appCount];
+    RMAppAttemptMetrics[] attemptMetrics = new RMAppAttemptMetrics[appCount];
+    for (int i=0; i<appCount; i++) {
+      appids[i] = BuilderUtils.newApplicationId(100, i);
+      appAttemptIds[i] =
+          BuilderUtils.newApplicationAttemptId(appids[i], 1);
+
+      attemptMetrics[i] =
+          new RMAppAttemptMetrics(appAttemptIds[i], rm.getRMContext());
+      apps[i] = mock(RMAppImpl.class);
+      when(apps[i].getApplicationId()).thenReturn(appids[i]);
+      attempts[i] = mock(RMAppAttemptImpl.class);
+      when(attempts[i].getMasterContainer()).thenReturn(container);
+      when(attempts[i].getSubmissionContext()).thenReturn(submissionContext);
+      when(attempts[i].getAppAttemptId()).thenReturn(appAttemptIds[i]);
+      when(attempts[i].getRMAppAttemptMetrics()).thenReturn(attemptMetrics[i]);
+      when(apps[i].getCurrentAppAttempt()).thenReturn(attempts[i]);
+
+      rm.getRMContext().getRMApps().put(appids[i], apps[i]);
+      addAppEvent =
+          new AppAddedSchedulerEvent(appids[i], "default", "user1");
+      cs.handle(addAppEvent);
+      addAttemptEvent =
+          new AppAttemptAddedSchedulerEvent(appAttemptIds[i], false);
+      cs.handle(addAttemptEvent);
+    }
+
+    // add nodes  to cluster, so cluster has 20GB and 20 vcores
+    Resource nodeResource = Resource.newInstance(10 * GB, 10);
+    if (numOfResourceTypes > 2) {
+      for (int i = 2; i < numOfResourceTypes; i++) {
+        nodeResource.setResourceValue(getResourceName(i), 10);
+      }
+    }
+
+    RMNode node = MockNodes.newNodeInfo(0, nodeResource, 1, "127.0.0.1");
+    cs.handle(new NodeAddedSchedulerEvent(node));
+
+    RMNode node2 = MockNodes.newNodeInfo(0, nodeResource, 1, "127.0.0.2");
+    cs.handle(new NodeAddedSchedulerEvent(node2));
+
+    Priority u0Priority = TestUtils.createMockPriority(1);
+    RecordFactory recordFactory =
+        RecordFactoryProvider.getRecordFactory(null);
+
+    FiCaSchedulerApp[] fiCaApps = new FiCaSchedulerApp[appCount];
+    for (int i=0;i<appCount;i++) {
+      fiCaApps[i] =
+          cs.getSchedulerApplications().get(apps[i].getApplicationId())
+              .getCurrentAppAttempt();
+
+      ResourceRequest resourceRequest = TestUtils.createResourceRequest(
+          ResourceRequest.ANY, 1 * GB, 1, true, u0Priority, recordFactory);
+      if (numOfResourceTypes > 2) {
+        for (int j = 2; j < numOfResourceTypes; j++) {
+          resourceRequest.getCapability().setResourceValue(getResourceName(j),
+              10);
+        }
+      }
+
+      // request a container per app (1GB, 1 vcore, plus extra types)
+      fiCaApps[i].updateResourceRequests(
+          Collections.singletonList(resourceRequest));
+    }
+    // Now force everything to be over user limit
+    qb.setUserLimitFactor((float)0.0);
+
+    // Quiet the loggers while measuring throughput
+    for (Enumeration<?> loggers = LogManager.getCurrentLoggers();
+         loggers.hasMoreElements(); )  {
+      Logger logger = (Logger) loggers.nextElement();
+      logger.setLevel(Level.WARN);
+    }
+    final int topn = 20;
+    final int iterations = 2000000;
+    final int printInterval = 20000;
+    final float numerator = 1000.0f * printInterval;
+    PriorityQueue<Long> queue = new PriorityQueue<>(topn,
+        Collections.reverseOrder());
+
+    long n = Time.monotonicNow();
+    long timespent = 0;
+    for (int i = 0; i < iterations; i+=2) {
+      if (i > 0  && i % printInterval == 0){
+        long ts = (Time.monotonicNow() - n);
+        if (queue.size() < topn) {
+          queue.offer(ts);
+        } else {
+          Long last = queue.peek();
+          if (last > ts) {
+            queue.poll();
+            queue.offer(ts);
+          }
+        }
+        System.out.println(i + " " + (numerator / ts));
+        n= Time.monotonicNow();
+      }
+      cs.handle(new NodeUpdateSchedulerEvent(node));
+      cs.handle(new NodeUpdateSchedulerEvent(node2));
+    }
+    timespent=0;
+    int entries = queue.size();
+    while(queue.size() > 0){
+      long l = queue.poll();
+      timespent += l;
+    }
+    System.out.println(
+        "#ResourceTypes = " + numOfResourceTypes + ". Avg of fastest " + entries
+            + ": " + numerator / (timespent / entries));
+    rm.stop();
+  }
+
+  @Test(timeout = 300000)
+  public void testUserLimitThroughputForTwoResources() throws Exception {
+    testUserLimitThroughputWithNumberOfResourceTypes(2);
+  }
+
+  @Test(timeout = 300000)
+  public void testUserLimitThroughputForThreeResources() throws Exception {
+    testUserLimitThroughputWithNumberOfResourceTypes(3);
+  }
+
+  @Test(timeout = 300000)
+  public void testUserLimitThroughputForFourResources() throws Exception {
+    testUserLimitThroughputWithNumberOfResourceTypes(4);
+  }
+
+  @Test(timeout = 300000)
+  public void testUserLimitThroughputForFiveResources() throws Exception {
+    testUserLimitThroughputWithNumberOfResourceTypes(5);
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
index 75b7bff..611cfcc 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
@@ -102,6 +102,8 @@ import org.apache.hadoop.yarn.webapp.util.WebAppUtils;
 
 import com.google.common.annotations.VisibleForTesting;
 
+import static org.apache.hadoop.yarn.util.resource.TestResourceUtils.TEST_CONF_RESET_RESOURCE_TYPES;
+
 /**
  * <p>
  * Embedded Yarn minicluster for testcases that need to interact with a cluster.
@@ -261,7 +263,10 @@ public class MiniYARNCluster extends CompositeService {
         YarnConfiguration.DEFAULT_YARN_MINICLUSTER_USE_RPC);
     failoverTimeout = conf.getInt(YarnConfiguration.RM_ZK_TIMEOUT_MS,
         YarnConfiguration.DEFAULT_RM_ZK_TIMEOUT_MS);
-    ResourceUtils.resetResourceTypes(conf);
+
+    if (conf.getBoolean(TEST_CONF_RESET_RESOURCE_TYPES, true)) {
+      ResourceUtils.resetResourceTypes(conf);
+    }
 
     if (useRpc && !useFixedPorts) {
       throw new YarnRuntimeException("Invalid configuration!" +

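Note on the reset guard in the MockRM and MiniYARNCluster hunks above: both now
call ResourceUtils.resetResourceTypes(conf) only when
"yarn.test.reset-resource-types" is unset or true, so a test that has already
registered its own resource types can opt out. A minimal sketch of that
default-true guard follows. It uses a plain Map in place of a Hadoop
Configuration, and ResetGuardSketch, shouldReset, startCluster and resetCount
are hypothetical names for illustration, not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class ResetGuardSketch {
  // Same key string as TEST_CONF_RESET_RESOURCE_TYPES in the diff.
  static final String RESET_KEY = "yarn.test.reset-resource-types";

  // Counts how often the simulated global reset ran (illustration only).
  static int resetCount = 0;

  // Simplified stand-in for conf.getBoolean(key, true):
  // a missing or unrecognized value falls back to the default, true.
  static boolean shouldReset(Map<String, String> conf) {
    String raw = conf.get(RESET_KEY);
    if ("true".equalsIgnoreCase(raw)) {
      return true;
    }
    if ("false".equalsIgnoreCase(raw)) {
      return false;
    }
    return true;
  }

  // Stand-in for a test harness constructor such as MockRM's.
  static void startCluster(Map<String, String> conf) {
    if (shouldReset(conf)) {
      resetCount++; // the real code would reset the resource types here
    }
  }

  public static void main(String[] args) {
    Map<String, String> defaults = new HashMap<>();
    startCluster(defaults); // key absent, so the reset runs

    Map<String, String> optOut = new HashMap<>();
    optOut.put(RESET_KEY, "false");
    startCluster(optOut); // caller opted out, so the reset is skipped

    System.out.println("resets performed: " + resetCount);
  }
}
```

Defaulting to true keeps every existing test on the old behavior; only tests
that set the key to false, as TestCapacitySchedulerPerf does, skip the reset.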

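Note on the timing loop in TestCapacitySchedulerPerf above: it keeps the topn
shortest interval times in a PriorityQueue built with
Collections.reverseOrder(), so the head of the queue is always the slowest
sample still retained and can be evicted when a faster one arrives. A minimal
sketch of that bounded max-heap idea, assuming made-up sample values;
FastestIntervalsSketch, keepSmallest and average are hypothetical names.

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class FastestIntervalsSketch {
  // Keeps the n smallest samples using a max-heap capped at n entries.
  // n must be at least 1 (PriorityQueue rejects a smaller capacity).
  static PriorityQueue<Long> keepSmallest(long[] samples, int n) {
    PriorityQueue<Long> heap =
        new PriorityQueue<>(n, Collections.reverseOrder());
    for (long ts : samples) {
      if (heap.size() < n) {
        heap.offer(ts);
      } else if (heap.peek() > ts) {
        // The head is the largest retained sample; replace it.
        heap.poll();
        heap.offer(ts);
      }
    }
    return heap;
  }

  // Averages the retained samples in floating point.
  static double average(PriorityQueue<Long> heap) {
    long total = 0;
    for (long v : heap) {
      total += v;
    }
    return (double) total / heap.size();
  }

  public static void main(String[] args) {
    long[] samples = {50, 20, 40, 10, 30};
    PriorityQueue<Long> fastest = keepSmallest(samples, 3);
    System.out.println("average of fastest 3: " + average(fastest));
  }
}
```

The sketch averages in floating point. The test itself computes
timespent / entries with a long and an int, which truncates before the float
division, so its printed figure is slightly coarser.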


[hadoop] 09/20: YARN-7396. NPE when accessing container logs due to null dirsHandler. Contributed by Jonathan Hung

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 3d5a65211b8830b3c7d821612db35bf6f4409020
Author: Jian He <ji...@apache.org>
AuthorDate: Wed Nov 1 17:00:32 2017 -0700

    YARN-7396. NPE when accessing container logs due to null dirsHandler. Contributed by Jonathan Hung
---
 .../java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java    | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
index c74b54e..536ac3a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
@@ -371,6 +371,8 @@ public class NodeManager extends CompositeService
     
     this.aclsManager = new ApplicationACLsManager(conf);
 
+    this.dirsHandler = new LocalDirsHandlerService(metrics);
+
     boolean isDistSchedulingEnabled =
         conf.getBoolean(YarnConfiguration.DIST_SCHEDULING_ENABLED,
             YarnConfiguration.DEFAULT_DIST_SCHEDULING_ENABLED);
@@ -394,7 +396,6 @@ public class NodeManager extends CompositeService
     // NodeManager level dispatcher
     this.dispatcher = createNMDispatcher();
 
-    dirsHandler = new LocalDirsHandlerService(metrics);
     nodeHealthChecker =
         new NodeHealthCheckerService(
             getNodeHealthScriptRunner(conf), dirsHandler);


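Note on the NodeManager change above: the patch only moves the creation of
dirsHandler earlier in initialization. The likely reason, which this excerpt
does not show, is that some object built in between captured the still-null
field. A minimal sketch of that kind of ordering bug, assuming a hypothetical
Context that stores the handler reference at construction time; every class and
method name below is illustrative and not taken from NodeManager.

```java
public class InitOrderSketch {
  // Hypothetical stand-in for LocalDirsHandlerService.
  static class DirsHandler {
    String logDir() {
      return "/tmp/logs";
    }
  }

  // Hypothetical component that captures the handler when constructed.
  static class Context {
    private final DirsHandler handler;

    Context(DirsHandler handler) {
      this.handler = handler;
    }

    String logDir() {
      return handler.logDir(); // throws NPE if a null handler was captured
    }
  }

  static class Node {
    DirsHandler dirsHandler;
    Context context;

    // Old ordering: the context is built before the handler exists.
    void initLate() {
      context = new Context(dirsHandler);
      dirsHandler = new DirsHandler();
    }

    // Patched ordering: the handler is created first.
    void initEarly() {
      dirsHandler = new DirsHandler();
      context = new Context(dirsHandler);
    }
  }

  static boolean throwsNpe(Runnable r) {
    try {
      r.run();
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Node late = new Node();
    late.initLate();
    System.out.println("late init NPE: "
        + throwsNpe(() -> late.context.logDir()));

    Node early = new Node();
    early.initEarly();
    System.out.println("early init NPE: "
        + throwsNpe(() -> early.context.logDir()));
  }
}
```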


[hadoop] 20/20: YARN-9271. Backport YARN-6927 for resource type support in MapReduce

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 9c6dbd827f3bb7288322c4e7c1f422d0a0b13724
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Wed Mar 20 17:46:35 2019 -0700

    YARN-9271. Backport YARN-6927 for resource type support in MapReduce
---
 .../mapreduce/v2/app/job/impl/TaskAttemptImpl.java | 141 +++++++-
 .../mapreduce/TestMapreduceConfigFields.java       |  11 +
 .../mapreduce/v2/app/job/impl/TestTaskAttempt.java | 365 ++++++++++++++++++++-
 .../org/apache/hadoop/mapreduce/MRJobConfig.java   |  68 +++-
 .../java/org/apache/hadoop/mapred/YARNRunner.java  |  86 ++++-
 .../org/apache/hadoop/mapred/TestYARNRunner.java   | 167 ++++++++++
 .../hadoop/yarn/util/resource/ResourceUtils.java   |  44 +++
 7 files changed, 853 insertions(+), 29 deletions(-)

diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
index dfc3adb..3f37d4d 100644
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
@@ -18,6 +18,8 @@
 
 package org.apache.hadoop.mapreduce.v2.app.job.impl;
 
+import static org.apache.commons.lang.StringUtils.isEmpty;
+
 import java.io.IOException;
 import java.net.InetAddress;
 import java.net.InetSocketAddress;
@@ -126,6 +128,7 @@ import org.apache.hadoop.yarn.api.records.LocalResourceType;
 import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
 import org.apache.hadoop.yarn.api.records.NodeId;
 import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
 import org.apache.hadoop.yarn.api.records.URL;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
 import org.apache.hadoop.yarn.event.EventHandler;
@@ -139,6 +142,8 @@ import org.apache.hadoop.yarn.state.StateMachine;
 import org.apache.hadoop.yarn.state.StateMachineFactory;
 import org.apache.hadoop.yarn.util.Clock;
 import org.apache.hadoop.yarn.util.RackResolver;
+import org.apache.hadoop.yarn.util.UnitsConversionUtil;
+import org.apache.hadoop.yarn.util.resource.ResourceUtils;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Preconditions;
@@ -664,12 +669,8 @@ public abstract class TaskAttemptImpl implements
     this.jobFile = jobFile;
     this.partition = partition;
 
-    //TODO:create the resource reqt for this Task attempt
     this.resourceCapability = recordFactory.newRecordInstance(Resource.class);
-    this.resourceCapability.setMemorySize(
-        getMemoryRequired(conf, taskId.getTaskType()));
-    this.resourceCapability.setVirtualCores(
-        getCpuRequired(conf, taskId.getTaskType()));
+    populateResourceCapability(taskId.getTaskType());
 
     this.dataLocalHosts = resolveHosts(dataLocalHosts);
     RackResolver.init(conf);
@@ -701,21 +702,133 @@ public abstract class TaskAttemptImpl implements
     return memory;
   }
 
+  private void populateResourceCapability(TaskType taskType) {
+    String resourceTypePrefix =
+        getResourceTypePrefix(taskType);
+    boolean memorySet = false;
+    boolean cpuVcoresSet = false;
+    if (resourceTypePrefix != null) {
+      List<ResourceInformation> resourceRequests =
+          ResourceUtils.getRequestedResourcesFromConfig(conf,
+              resourceTypePrefix);
+      for (ResourceInformation resourceRequest : resourceRequests) {
+        String resourceName = resourceRequest.getName();
+        if (MRJobConfig.RESOURCE_TYPE_NAME_MEMORY.equals(resourceName) ||
+            MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY.equals(
+                resourceName)) {
+          if (memorySet) {
+            throw new IllegalArgumentException(
+                "Only one of the following keys " +
+                    "can be specified for a single job: " +
+                    MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY + ", " +
+                    MRJobConfig.RESOURCE_TYPE_NAME_MEMORY);
+          }
+          String units = isEmpty(resourceRequest.getUnits()) ?
+              ResourceUtils.getDefaultUnit(ResourceInformation.MEMORY_URI) :
+                resourceRequest.getUnits();
+          this.resourceCapability.setMemorySize(
+              UnitsConversionUtil.convert(units, "Mi",
+                  resourceRequest.getValue()));
+          memorySet = true;
+          String memoryKey = getMemoryKey(taskType);
+          if (memoryKey != null && conf.get(memoryKey) != null) {
+            LOG.warn("Configuration " + resourceTypePrefix + resourceName +
+                "=" + resourceRequest.getValue() + resourceRequest.getUnits() +
+                " is overriding the " + memoryKey + "=" + conf.get(memoryKey) +
+                " configuration");
+          }
+        } else if (MRJobConfig.RESOURCE_TYPE_NAME_VCORE.equals(
+            resourceName)) {
+          this.resourceCapability.setVirtualCores(
+              (int) UnitsConversionUtil.convert(resourceRequest.getUnits(), "",
+                  resourceRequest.getValue()));
+          cpuVcoresSet = true;
+          String cpuKey = getCpuVcoresKey(taskType);
+          if (cpuKey != null && conf.get(cpuKey) != null) {
+            LOG.warn("Configuration " + resourceTypePrefix +
+                MRJobConfig.RESOURCE_TYPE_NAME_VCORE + "=" +
+                resourceRequest.getValue() + resourceRequest.getUnits() +
+                " is overriding the " + cpuKey + "=" +
+                conf.get(cpuKey) + " configuration");
+          }
+        } else {
+          ResourceInformation resourceInformation =
+              this.resourceCapability.getResourceInformation(resourceName);
+          resourceInformation.setUnits(resourceRequest.getUnits());
+          resourceInformation.setValue(resourceRequest.getValue());
+          this.resourceCapability.setResourceInformation(resourceName,
+              resourceInformation);
+        }
+      }
+    }
+    if (!memorySet) {
+      this.resourceCapability.setMemorySize(getMemoryRequired(conf, taskType));
+    }
+    if (!cpuVcoresSet) {
+      this.resourceCapability.setVirtualCores(getCpuRequired(conf, taskType));
+    }
+  }
+
+  private String getCpuVcoresKey(TaskType taskType) {
+    switch (taskType) {
+    case MAP:
+      return MRJobConfig.MAP_CPU_VCORES;
+    case REDUCE:
+      return MRJobConfig.REDUCE_CPU_VCORES;
+    default:
+      return null;
+    }
+  }
+
+  private String getMemoryKey(TaskType taskType) {
+    switch (taskType) {
+    case MAP:
+      return MRJobConfig.MAP_MEMORY_MB;
+    case REDUCE:
+      return MRJobConfig.REDUCE_MEMORY_MB;
+    default:
+      return null;
+    }
+  }
+
+  private Integer getCpuVcoreDefault(TaskType taskType) {
+    switch (taskType) {
+    case MAP:
+      return MRJobConfig.DEFAULT_MAP_CPU_VCORES;
+    case REDUCE:
+      return MRJobConfig.DEFAULT_REDUCE_CPU_VCORES;
+    default:
+      return null;
+    }
+  }
+
   private int getCpuRequired(Configuration conf, TaskType taskType) {
     int vcores = 1;
-    if (taskType == TaskType.MAP)  {
-      vcores =
-          conf.getInt(MRJobConfig.MAP_CPU_VCORES,
-              MRJobConfig.DEFAULT_MAP_CPU_VCORES);
-    } else if (taskType == TaskType.REDUCE) {
-      vcores =
-          conf.getInt(MRJobConfig.REDUCE_CPU_VCORES,
-              MRJobConfig.DEFAULT_REDUCE_CPU_VCORES);
+    String cpuVcoreKey = getCpuVcoresKey(taskType);
+    if (cpuVcoreKey != null) {
+      Integer defaultCpuVcores = getCpuVcoreDefault(taskType);
+      if (null == defaultCpuVcores) {
+        defaultCpuVcores = vcores;
+      }
+      vcores = conf.getInt(cpuVcoreKey, defaultCpuVcores);
     }
-    
     return vcores;
   }
 
+  private String getResourceTypePrefix(TaskType taskType) {
+    switch (taskType) {
+    case MAP:
+      return MRJobConfig.MAP_RESOURCE_TYPE_PREFIX;
+    case REDUCE:
+      return MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX;
+    default:
+      LOG.info("TaskType " + taskType +
+          " does not support custom resource types - this support can be " +
+          "added in " + getClass().getSimpleName());
+      return null;
+    }
+  }
+
   /**
    * Create a {@link LocalResource} record with all the given parameters.
    * The NM that hosts AM container will upload resources to shared cache.
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/TestMapreduceConfigFields.java b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/TestMapreduceConfigFields.java
index 096cec9..f469aad 100644
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/TestMapreduceConfigFields.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/TestMapreduceConfigFields.java
@@ -78,6 +78,17 @@ public class TestMapreduceConfigFields extends TestConfigurationFieldsBase {
     xmlPropsToSkipCompare.add("mapreduce.local.clientfactory.class.name");
     xmlPropsToSkipCompare.add("mapreduce.jobtracker.system.dir");
     xmlPropsToSkipCompare.add("mapreduce.jobtracker.staging.root.dir");
+
+    // Resource-type-related properties are only prefixes;
+    // they must be suffixed with the resource name
+    // in order to take effect.
+    // There is nothing to be added to mapred-default.xml.
+    configurationPropsToSkipCompare.add(
+        MRJobConfig.MR_AM_RESOURCE_PREFIX);
+    configurationPropsToSkipCompare.add(
+        MRJobConfig.MAP_RESOURCE_TYPE_PREFIX);
+    configurationPropsToSkipCompare.add(
+        MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX);
   }
 
 }
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
index 60a2177..e055798 100644
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
@@ -28,14 +28,21 @@ import static org.mockito.Mockito.times;
 import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
 
+import java.io.ByteArrayInputStream;
 import java.io.IOException;
+import java.io.InputStream;
 import java.net.InetSocketAddress;
+import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.Iterator;
+import java.util.List;
 import java.util.Map;
+import java.util.concurrent.CopyOnWriteArrayList;
 
 import com.google.common.base.Supplier;
+import org.junit.After;
 import org.junit.Assert;
+import org.junit.BeforeClass;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
@@ -43,6 +50,7 @@ import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.RawLocalFileSystem;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapTaskAttemptImpl;
+import org.apache.hadoop.mapred.ReduceTaskAttemptImpl;
 import org.apache.hadoop.mapreduce.Counters;
 import org.apache.hadoop.mapreduce.JobCounter;
 import org.apache.hadoop.mapreduce.MRJobConfig;
@@ -83,24 +91,36 @@ import org.apache.hadoop.mapreduce.v2.app.rm.ContainerRequestEvent;
 import org.apache.hadoop.mapreduce.v2.util.MRBuilderUtils;
 import org.apache.hadoop.security.Credentials;
 import org.apache.hadoop.security.token.Token;
+import org.apache.hadoop.yarn.LocalConfigurationProvider;
 import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
 import org.apache.hadoop.yarn.api.records.ApplicationId;
 import org.apache.hadoop.yarn.api.records.Container;
 import org.apache.hadoop.yarn.api.records.ContainerId;
 import org.apache.hadoop.yarn.api.records.NodeId;
 import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
 import org.apache.hadoop.yarn.event.Event;
 import org.apache.hadoop.yarn.event.EventHandler;
+import org.apache.hadoop.yarn.exceptions.YarnException;
 import org.apache.hadoop.yarn.util.Clock;
 import org.apache.hadoop.yarn.util.ControlledClock;
 import org.apache.hadoop.yarn.util.SystemClock;
+import org.apache.hadoop.yarn.util.resource.ResourceUtils;
+import org.apache.log4j.AppenderSkeleton;
+import org.apache.log4j.Level;
+import org.apache.log4j.Logger;
+import org.apache.log4j.spi.LoggingEvent;
 import org.junit.Test;
 import org.mockito.ArgumentCaptor;
 
+import com.google.common.collect.ImmutableList;
+
 @SuppressWarnings({"unchecked", "rawtypes"})
 public class TestTaskAttempt{
-	
+
+  private static final String CUSTOM_RESOURCE_NAME = "a-custom-resource";
+
   static public class StubbedFS extends RawLocalFileSystem {
     @Override
     public FileStatus getFileStatus(Path f) throws IOException {
@@ -108,6 +128,63 @@ public class TestTaskAttempt{
     }
   }
 
+  private static class CustomResourceTypesConfigurationProvider
+      extends LocalConfigurationProvider {
+
+    @Override
+    public InputStream getConfigurationInputStream(Configuration bootstrapConf,
+        String name) throws YarnException, IOException {
+      if (YarnConfiguration.RESOURCE_TYPES_CONFIGURATION_FILE.equals(name)) {
+        return new ByteArrayInputStream(
+            ("<configuration>\n" +
+            " <property>\n" +
+            "   <name>yarn.resource-types</name>\n" +
+            "   <value>a-custom-resource</value>\n" +
+            " </property>\n" +
+            " <property>\n" +
+            "   <name>yarn.resource-types.a-custom-resource.units</name>\n" +
+            "   <value>G</value>\n" +
+            " </property>\n" +
+            "</configuration>\n").getBytes());
+      } else {
+        return super.getConfigurationInputStream(bootstrapConf, name);
+      }
+    }
+  }
+
+  private static class TestAppender extends AppenderSkeleton {
+
+    private final List<LoggingEvent> logEvents = new CopyOnWriteArrayList<>();
+
+    @Override
+    public boolean requiresLayout() {
+      return false;
+    }
+
+    @Override
+    public void close() {
+    }
+
+    @Override
+    protected void append(LoggingEvent arg0) {
+      logEvents.add(arg0);
+    }
+
+    private List<LoggingEvent> getLogEvents() {
+      return logEvents;
+    }
+  }
+
+  @BeforeClass
+  public static void setupBeforeClass() {
+    ResourceUtils.resetResourceTypes(new Configuration());
+  }
+
+  @After
+  public void tearDown() {
+    ResourceUtils.resetResourceTypes(new Configuration());
+  }
+
   @Test
   public void testMRAppHistoryForMap() throws Exception {
     MRApp app = new FailingAttemptsMRApp(1, 0);
@@ -329,17 +406,18 @@ public class TestTaskAttempt{
   private TaskAttemptImpl createMapTaskAttemptImplForTest(
       EventHandler eventHandler, TaskSplitMetaInfo taskSplitMetaInfo) {
     Clock clock = SystemClock.getInstance();
-    return createMapTaskAttemptImplForTest(eventHandler, taskSplitMetaInfo, clock);
+    return createMapTaskAttemptImplForTest(eventHandler, taskSplitMetaInfo,
+        clock, new JobConf());
   }
 
   private TaskAttemptImpl createMapTaskAttemptImplForTest(
-      EventHandler eventHandler, TaskSplitMetaInfo taskSplitMetaInfo, Clock clock) {
+      EventHandler eventHandler, TaskSplitMetaInfo taskSplitMetaInfo,
+      Clock clock, JobConf jobConf) {
     ApplicationId appId = ApplicationId.newInstance(1, 1);
     JobId jobId = MRBuilderUtils.newJobId(appId, 1);
     TaskId taskId = MRBuilderUtils.newTaskId(jobId, 1, TaskType.MAP);
     TaskAttemptListener taListener = mock(TaskAttemptListener.class);
     Path jobFile = mock(Path.class);
-    JobConf jobConf = new JobConf();
     TaskAttemptImpl taImpl =
         new MapTaskAttemptImpl(taskId, 1, eventHandler, jobFile, 1,
             taskSplitMetaInfo, jobConf, taListener, null,
@@ -347,6 +425,20 @@ public class TestTaskAttempt{
     return taImpl;
   }
 
+  private TaskAttemptImpl createReduceTaskAttemptImplForTest(
+      EventHandler eventHandler, Clock clock, JobConf jobConf) {
+    ApplicationId appId = ApplicationId.newInstance(1, 1);
+    JobId jobId = MRBuilderUtils.newJobId(appId, 1);
+    TaskId taskId = MRBuilderUtils.newTaskId(jobId, 1, TaskType.REDUCE);
+    TaskAttemptListener taListener = mock(TaskAttemptListener.class);
+    Path jobFile = mock(Path.class);
+    TaskAttemptImpl taImpl =
+        new ReduceTaskAttemptImpl(taskId, 1, eventHandler, jobFile, 1,
+            1, jobConf, taListener, null,
+            null, clock, null);
+    return taImpl;
+  }
+
   private void testMRAppHistory(MRApp app) throws Exception {
     Configuration conf = new Configuration();
     Job job = app.submit(conf);
@@ -1423,6 +1515,271 @@ public class TestTaskAttempt{
     assertFalse("InternalError occurred", eventHandler.internalError);
   }
 
+  @Test
+  public void testMapperCustomResourceTypes() {
+    initResourceTypes();
+    EventHandler eventHandler = mock(EventHandler.class);
+    TaskSplitMetaInfo taskSplitMetaInfo = new TaskSplitMetaInfo();
+    Clock clock = SystemClock.getInstance();
+    JobConf jobConf = new JobConf();
+    jobConf.setLong(MRJobConfig.MAP_RESOURCE_TYPE_PREFIX
+        + CUSTOM_RESOURCE_NAME, 7L);
+    TaskAttemptImpl taImpl = createMapTaskAttemptImplForTest(eventHandler,
+        taskSplitMetaInfo, clock, jobConf);
+    ResourceInformation resourceInfo =
+        getResourceInfoFromContainerRequest(taImpl, eventHandler).
+        getResourceInformation(CUSTOM_RESOURCE_NAME);
+    assertEquals("Expecting the default unit (G)",
+        "G", resourceInfo.getUnits());
+    assertEquals(7L, resourceInfo.getValue());
+  }
+
+  @Test
+  public void testReducerCustomResourceTypes() {
+    initResourceTypes();
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    JobConf jobConf = new JobConf();
+    jobConf.set(MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX
+        + CUSTOM_RESOURCE_NAME, "3m");
+    TaskAttemptImpl taImpl =
+        createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+    ResourceInformation resourceInfo =
+        getResourceInfoFromContainerRequest(taImpl, eventHandler).
+        getResourceInformation(CUSTOM_RESOURCE_NAME);
+    assertEquals("Expecting the specified unit (m)",
+        "m", resourceInfo.getUnits());
+    assertEquals(3L, resourceInfo.getValue());
+  }
+
+  @Test
+  public void testReducerMemoryRequestViaMapreduceReduceMemoryMb() {
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    JobConf jobConf = new JobConf();
+    jobConf.setInt(MRJobConfig.REDUCE_MEMORY_MB, 2048);
+    TaskAttemptImpl taImpl =
+        createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+    long memorySize =
+        getResourceInfoFromContainerRequest(taImpl, eventHandler).
+        getMemorySize();
+    assertEquals(2048, memorySize);
+  }
+
+  @Test
+  public void testReducerMemoryRequestViaMapreduceReduceResourceMemory() {
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    JobConf jobConf = new JobConf();
+    jobConf.set(MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX +
+        MRJobConfig.RESOURCE_TYPE_NAME_MEMORY, "2 Gi");
+    TaskAttemptImpl taImpl =
+        createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+    long memorySize =
+        getResourceInfoFromContainerRequest(taImpl, eventHandler).
+        getMemorySize();
+    assertEquals(2048, memorySize);
+  }
+
+  @Test
+  public void testReducerMemoryRequestDefaultMemory() {
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    TaskAttemptImpl taImpl =
+        createReduceTaskAttemptImplForTest(eventHandler, clock, new JobConf());
+    long memorySize =
+        getResourceInfoFromContainerRequest(taImpl, eventHandler).
+        getMemorySize();
+    assertEquals(MRJobConfig.DEFAULT_REDUCE_MEMORY_MB, memorySize);
+  }
+
+  @Test
+  public void testReducerMemoryRequestWithoutUnits() {
+    Clock clock = SystemClock.getInstance();
+    for (String memoryResourceName : ImmutableList.of(
+        MRJobConfig.RESOURCE_TYPE_NAME_MEMORY,
+        MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY)) {
+      EventHandler eventHandler = mock(EventHandler.class);
+      JobConf jobConf = new JobConf();
+      jobConf.setInt(MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX +
+          memoryResourceName, 2048);
+      TaskAttemptImpl taImpl =
+          createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+      long memorySize =
+          getResourceInfoFromContainerRequest(taImpl, eventHandler).
+          getMemorySize();
+      assertEquals(2048, memorySize);
+    }
+  }
+
+  @Test
+  public void testReducerMemoryRequestOverriding() {
+    for (String memoryName : ImmutableList.of(
+        MRJobConfig.RESOURCE_TYPE_NAME_MEMORY,
+        MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY)) {
+      TestAppender testAppender = new TestAppender();
+      final Logger logger = Logger.getLogger(TaskAttemptImpl.class);
+      try {
+        logger.addAppender(testAppender);
+        EventHandler eventHandler = mock(EventHandler.class);
+        Clock clock = SystemClock.getInstance();
+        JobConf jobConf = new JobConf();
+        jobConf.set(MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX + memoryName,
+            "3Gi");
+        jobConf.setInt(MRJobConfig.REDUCE_MEMORY_MB, 2048);
+        TaskAttemptImpl taImpl =
+            createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+        long memorySize =
+            getResourceInfoFromContainerRequest(taImpl, eventHandler).
+            getMemorySize();
+        assertEquals(3072, memorySize);
+        boolean foundLogWarning = false;
+        for (LoggingEvent e : testAppender.getLogEvents()) {
+          if (e.getLevel() == Level.WARN && ("Configuration " +
+                "mapreduce.reduce.resource." + memoryName + "=3Gi is " +
+                "overriding the mapreduce.reduce.memory.mb=2048 configuration")
+          .equals(e.getMessage())) {
+            foundLogWarning = true;
+            break;
+          }
+        }
+        assertTrue(foundLogWarning);
+      } finally {
+        logger.removeAppender(testAppender);
+      }
+    }
+  }
+
+  @Test(expected=IllegalArgumentException.class)
+  public void testReducerMemoryRequestMultipleName() {
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    JobConf jobConf = new JobConf();
+    for (String memoryName : ImmutableList.of(
+        MRJobConfig.RESOURCE_TYPE_NAME_MEMORY,
+        MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY)) {
+      jobConf.set(MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX + memoryName,
+          "3Gi");
+    }
+    createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+  }
+
+  @Test
+  public void testReducerCpuRequestViaMapreduceReduceCpuVcores() {
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    JobConf jobConf = new JobConf();
+    jobConf.setInt(MRJobConfig.REDUCE_CPU_VCORES, 3);
+    TaskAttemptImpl taImpl =
+        createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+    int vCores =
+        getResourceInfoFromContainerRequest(taImpl, eventHandler).
+        getVirtualCores();
+    assertEquals(3, vCores);
+  }
+
+  @Test
+  public void testReducerCpuRequestViaMapreduceReduceResourceVcores() {
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    JobConf jobConf = new JobConf();
+    jobConf.set(MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX +
+        MRJobConfig.RESOURCE_TYPE_NAME_VCORE, "5");
+    TaskAttemptImpl taImpl =
+        createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+    int vCores =
+        getResourceInfoFromContainerRequest(taImpl, eventHandler).
+        getVirtualCores();
+    assertEquals(5, vCores);
+  }
+
+  @Test
+  public void testReducerCpuRequestDefaultMemory() {
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    TaskAttemptImpl taImpl =
+        createReduceTaskAttemptImplForTest(eventHandler, clock, new JobConf());
+    int vCores =
+        getResourceInfoFromContainerRequest(taImpl, eventHandler).
+        getVirtualCores();
+    assertEquals(MRJobConfig.DEFAULT_REDUCE_CPU_VCORES, vCores);
+  }
+
+  @Test
+  public void testReducerCpuRequestOverriding() {
+    TestAppender testAppender = new TestAppender();
+    final Logger logger = Logger.getLogger(TaskAttemptImpl.class);
+    try {
+      logger.addAppender(testAppender);
+      EventHandler eventHandler = mock(EventHandler.class);
+      Clock clock = SystemClock.getInstance();
+      JobConf jobConf = new JobConf();
+      jobConf.set(MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX +
+          MRJobConfig.RESOURCE_TYPE_NAME_VCORE, "7");
+      jobConf.setInt(MRJobConfig.REDUCE_CPU_VCORES, 9);
+      TaskAttemptImpl taImpl =
+          createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+      long vCores =
+          getResourceInfoFromContainerRequest(taImpl, eventHandler).
+          getVirtualCores();
+      assertEquals(7, vCores);
+      boolean foundLogWarning = false;
+      for (LoggingEvent e : testAppender.getLogEvents()) {
+        if (e.getLevel() == Level.WARN && ("Configuration " +
+              "mapreduce.reduce.resource.vcores=7 is overriding the " +
+              "mapreduce.reduce.cpu.vcores=9 configuration"
+        ).equals(e.getMessage())) {
+          foundLogWarning = true;
+          break;
+        }
+      }
+      assertTrue(foundLogWarning);
+    } finally {
+      logger.removeAppender(testAppender);
+    }
+  }
+
+  private Resource getResourceInfoFromContainerRequest(
+      TaskAttemptImpl taImpl, EventHandler eventHandler) {
+    taImpl.handle(new TaskAttemptEvent(taImpl.getID(),
+        TaskAttemptEventType.TA_SCHEDULE));
+
+    assertEquals("Task attempt is not in STARTING state",
+        TaskAttemptState.STARTING, taImpl.getState());
+
+    ArgumentCaptor<Event> captor = ArgumentCaptor.forClass(Event.class);
+    verify(eventHandler, times(2)).handle(captor.capture());
+
+    List<ContainerRequestEvent> containerRequestEvents = new ArrayList<>();
+    for (Event e : captor.getAllValues()) {
+      if (e instanceof ContainerRequestEvent) {
+        containerRequestEvents.add((ContainerRequestEvent) e);
+      }
+    }
+    assertEquals("Expected one ContainerRequestEvent after scheduling "
+        + "task attempt", 1, containerRequestEvents.size());
+
+    return containerRequestEvents.get(0).getCapability();
+  }
+
+  @Test(expected=IllegalArgumentException.class)
+  public void testReducerCustomResourceTypeWithInvalidUnit() {
+    initResourceTypes();
+    EventHandler eventHandler = mock(EventHandler.class);
+    Clock clock = SystemClock.getInstance();
+    JobConf jobConf = new JobConf();
+    jobConf.set(MRJobConfig.REDUCE_RESOURCE_TYPE_PREFIX
+        + CUSTOM_RESOURCE_NAME, "3z");
+    createReduceTaskAttemptImplForTest(eventHandler, clock, jobConf);
+  }
+
+  private void initResourceTypes() {
+    Configuration conf = new Configuration();
+    conf.set(YarnConfiguration.RM_CONFIGURATION_PROVIDER_CLASS,
+        CustomResourceTypesConfigurationProvider.class.getName());
+    ResourceUtils.resetResourceTypes(conf);
+  }
+
   private void setupTaskAttemptFinishingMonitor(
       EventHandler eventHandler, JobConf jobConf, AppContext appCtx) {
     TaskAttemptFinishingMonitor taskAttemptFinishingMonitor =
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
index d666123..5a72def 100644
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
@@ -360,12 +360,47 @@ public interface MRJobConfig {
 
   public static final String MAP_INPUT_START = "mapreduce.map.input.start";
 
+  /**
+   * Configuration key for specifying memory requirement for the mapper.
+   * Kept for backward compatibility; mapreduce.map.resource.memory
+   * is now the preferred way to specify this.
+   */
   public static final String MAP_MEMORY_MB = "mapreduce.map.memory.mb";
   public static final int DEFAULT_MAP_MEMORY_MB = 1024;
 
+  /**
+   * Configuration key for specifying CPU requirement for the mapper.
+   * Kept for backward compatibility; mapreduce.map.resource.vcores
+   * is now the preferred way to specify this.
+   */
   public static final String MAP_CPU_VCORES = "mapreduce.map.cpu.vcores";
   public static final int DEFAULT_MAP_CPU_VCORES = 1;
 
+  /**
+   * Custom resource names required by the mapper should be
+   * appended to this prefix. The value's format is {amount}[ ][{unit}].
+   * If no unit is defined, the default unit will be used.
+   * Standard resource names: memory (default unit: Mi), vcores.
+   */
+  public static final String MAP_RESOURCE_TYPE_PREFIX =
+      "mapreduce.map.resource.";
+
+  /**
+   * Resource type name for CPU vcores.
+   */
+  public static final String RESOURCE_TYPE_NAME_VCORE = "vcores";
+
+  /**
+   * Resource type name for memory.
+   */
+  public static final String RESOURCE_TYPE_NAME_MEMORY = "memory";
+
+  /**
+   * Alternative resource type name for memory.
+   */
+  public static final String RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY =
+      "memory-mb";
+
   public static final String MAP_ENV = "mapreduce.map.env";
 
   public static final String MAP_JAVA_OPTS = "mapreduce.map.java.opts";
@@ -408,12 +443,31 @@ public interface MRJobConfig {
 
   public static final String REDUCE_MARKRESET_BUFFER_SIZE = "mapreduce.reduce.markreset.buffer.size";
 
+  /**
+   * Configuration key for specifying memory requirement for the reducer.
+   * Kept for backward compatibility; mapreduce.reduce.resource.memory
+   * is now the preferred way to specify this.
+   */
   public static final String REDUCE_MEMORY_MB = "mapreduce.reduce.memory.mb";
   public static final int DEFAULT_REDUCE_MEMORY_MB = 1024;
 
+  /**
+   * Configuration key for specifying CPU requirement for the reducer.
+   * Kept for backward compatibility; mapreduce.reduce.resource.vcores
+   * is now the preferred way to specify this.
+   */
   public static final String REDUCE_CPU_VCORES = "mapreduce.reduce.cpu.vcores";
   public static final int DEFAULT_REDUCE_CPU_VCORES = 1;
 
+  /**
+   * Resource names required by the reducer should be
+   * appended to this prefix. The value's format is {amount}[ ][{unit}].
+   * If no unit is defined, the default unit will be used.
+   * Standard resource names: memory (default unit: Mi), vcores.
+   */
+  public static final String REDUCE_RESOURCE_TYPE_PREFIX =
+      "mapreduce.reduce.resource.";
+
   public static final String REDUCE_MEMORY_TOTAL_BYTES = "mapreduce.reduce.memory.totalbytes";
 
   public static final String SHUFFLE_INPUT_BUFFER_PERCENT = "mapreduce.reduce.shuffle.input.buffer.percent";
@@ -599,7 +653,10 @@ public interface MRJobConfig {
   public static final String DEFAULT_MR_AM_STAGING_DIR = 
     "/tmp/hadoop-yarn/staging";
 
-  /** The amount of memory the MR app master needs.*/
+  /** The amount of memory the MR app master needs.
+   * Kept for backward compatibility; yarn.app.mapreduce.am.resource.memory is
+   * now the preferred way to specify this.
+   */
   public static final String MR_AM_VMEM_MB =
     MR_AM_PREFIX+"resource.mb";
   public static final int DEFAULT_MR_AM_VMEM_MB = 1536;
@@ -609,6 +666,15 @@ public interface MRJobConfig {
     MR_AM_PREFIX+"resource.cpu-vcores";
   public static final int DEFAULT_MR_AM_CPU_VCORES = 1;
 
+  /**
+   * Resource names required by the MR AM should be
+   * appended to this prefix. The value's format is {amount}[ ][{unit}].
+   * If no unit is defined, the default unit will be used.
+   * Standard resource names: memory (default unit: Mi), vcores.
+   */
+  public static final String MR_AM_RESOURCE_PREFIX =
+      MR_AM_PREFIX + "resource.";
+
   /** Command line arguments passed to the MR app master.*/
   public static final String MR_AM_COMMAND_OPTS =
     MR_AM_PREFIX+"command-opts";
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
index a23ff34..12a3079 100644
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
@@ -18,6 +18,9 @@
 
 package org.apache.hadoop.mapred;
 
+import static org.apache.commons.lang.StringUtils.isEmpty;
+import static org.apache.hadoop.mapreduce.MRJobConfig.MR_AM_RESOURCE_PREFIX;
+
 import java.io.IOException;
 import java.net.URI;
 import java.net.URISyntaxException;
@@ -84,6 +87,7 @@ import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
 import org.apache.hadoop.yarn.api.records.Priority;
 import org.apache.hadoop.yarn.api.records.ReservationId;
 import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
 import org.apache.hadoop.yarn.api.records.ResourceRequest;
 import org.apache.hadoop.yarn.api.records.URL;
 import org.apache.hadoop.yarn.api.records.YarnApplicationState;
@@ -93,6 +97,8 @@ import org.apache.hadoop.yarn.factories.RecordFactory;
 import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
 import org.apache.hadoop.yarn.security.client.RMDelegationTokenSelector;
 import org.apache.hadoop.yarn.util.ConverterUtils;
+import org.apache.hadoop.yarn.util.UnitsConversionUtil;
+import org.apache.hadoop.yarn.util.resource.ResourceUtils;
 
 import com.google.common.annotations.VisibleForTesting;
 
@@ -659,16 +665,76 @@ public class YARNRunner implements ClientProtocol {
 
   private List<ResourceRequest> generateResourceRequests() throws IOException {
     Resource capability = recordFactory.newRecordInstance(Resource.class);
-    capability.setMemorySize(
-        conf.getInt(
-            MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB
-        )
-    );
-    capability.setVirtualCores(
-        conf.getInt(
-            MRJobConfig.MR_AM_CPU_VCORES, MRJobConfig.DEFAULT_MR_AM_CPU_VCORES
-        )
-    );
+    boolean memorySet = false;
+    boolean cpuVcoresSet = false;
+    List<ResourceInformation> resourceRequests = ResourceUtils
+        .getRequestedResourcesFromConfig(conf, MR_AM_RESOURCE_PREFIX);
+    for (ResourceInformation resourceReq : resourceRequests) {
+      String resourceName = resourceReq.getName();
+      if (MRJobConfig.RESOURCE_TYPE_NAME_MEMORY.equals(resourceName) ||
+          MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY.equals(
+              resourceName)) {
+        if (memorySet) {
+          throw new IllegalArgumentException(
+              "Only one of the following keys " +
+                  "can be specified for a single job: " +
+                  MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY + ", " +
+                  MRJobConfig.RESOURCE_TYPE_NAME_MEMORY);
+        }
+        String units = isEmpty(resourceReq.getUnits()) ?
+            ResourceUtils.getDefaultUnit(ResourceInformation.MEMORY_URI) :
+              resourceReq.getUnits();
+        capability.setMemorySize(
+            UnitsConversionUtil.convert(units, "Mi", resourceReq.getValue()));
+        memorySet = true;
+        if (conf.get(MRJobConfig.MR_AM_VMEM_MB) != null) {
+          LOG.warn("Configuration " + MR_AM_RESOURCE_PREFIX +
+              resourceName + "=" + resourceReq.getValue() +
+              resourceReq.getUnits() + " is overriding the " +
+              MRJobConfig.MR_AM_VMEM_MB + "=" +
+              conf.get(MRJobConfig.MR_AM_VMEM_MB) + " configuration");
+        }
+      } else if (MRJobConfig.RESOURCE_TYPE_NAME_VCORE.equals(resourceName)) {
+        capability.setVirtualCores(
+            (int) UnitsConversionUtil.convert(resourceReq.getUnits(), "",
+                resourceReq.getValue()));
+        cpuVcoresSet = true;
+        if (conf.get(MRJobConfig.MR_AM_CPU_VCORES) != null) {
+          LOG.warn("Configuration " + MR_AM_RESOURCE_PREFIX +
+              resourceName + "=" + resourceReq.getValue() +
+              resourceReq.getUnits() + " is overriding the " +
+              MRJobConfig.MR_AM_CPU_VCORES + "=" +
+              conf.get(MRJobConfig.MR_AM_CPU_VCORES) + " configuration");
+        }
+      } else if (!MRJobConfig.MR_AM_VMEM_MB.equals(
+          MR_AM_RESOURCE_PREFIX + resourceName) &&
+          !MRJobConfig.MR_AM_CPU_VCORES.equals(
+              MR_AM_RESOURCE_PREFIX + resourceName)) {
+        // the "mb", "cpu-vcores" resource types are not processed here
+        // since the yarn.app.mapreduce.am.resource.mb,
+        // yarn.app.mapreduce.am.resource.cpu-vcores keys are used for
+        // backward-compatibility - which is handled after this loop
+        ResourceInformation resourceInformation = capability
+            .getResourceInformation(resourceName);
+        resourceInformation.setUnits(resourceReq.getUnits());
+        resourceInformation.setValue(resourceReq.getValue());
+        capability.setResourceInformation(resourceName, resourceInformation);
+      }
+    }
+    if (!memorySet) {
+      capability.setMemorySize(
+          conf.getInt(
+              MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB
+          )
+      );
+    }
+    if (!cpuVcoresSet) {
+      capability.setVirtualCores(
+          conf.getInt(
+              MRJobConfig.MR_AM_CPU_VCORES, MRJobConfig.DEFAULT_MR_AM_CPU_VCORES
+          )
+      );
+    }
     if (LOG.isDebugEnabled()) {
       LOG.debug("AppMaster capability = " + capability);
     }
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
index c79b08e..ecb396e 100644
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
@@ -32,10 +32,12 @@ import static org.mockito.Mockito.times;
 import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
 
+import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.File;
 import java.io.FileOutputStream;
 import java.io.IOException;
+import java.io.InputStream;
 import java.io.OutputStream;
 import java.net.InetSocketAddress;
 import java.nio.ByteBuffer;
@@ -43,6 +45,7 @@ import java.security.PrivilegedExceptionAction;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
+import java.util.concurrent.CopyOnWriteArrayList;
 
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
@@ -69,6 +72,7 @@ import org.apache.hadoop.security.SecurityUtil;
 import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hadoop.security.token.Token;
 import org.apache.hadoop.util.Shell;
+import org.apache.hadoop.yarn.LocalConfigurationProvider;
 import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
 import org.apache.hadoop.yarn.api.ApplicationConstants;
 import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
@@ -96,28 +100,37 @@ import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
 import org.apache.hadoop.yarn.api.records.Priority;
 import org.apache.hadoop.yarn.api.records.QueueInfo;
 import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceInformation;
 import org.apache.hadoop.yarn.api.records.ResourceRequest;
 import org.apache.hadoop.yarn.api.records.YarnApplicationState;
 import org.apache.hadoop.yarn.api.records.YarnClusterMetrics;
 import org.apache.hadoop.yarn.client.api.impl.YarnClientImpl;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.exceptions.YarnException;
 import org.apache.hadoop.yarn.factories.RecordFactory;
 import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
 import org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier;
 import org.apache.hadoop.yarn.server.utils.BuilderUtils;
 import org.apache.hadoop.yarn.util.Records;
+import org.apache.hadoop.yarn.util.resource.ResourceUtils;
 import org.apache.log4j.Appender;
+import org.apache.log4j.AppenderSkeleton;
 import org.apache.log4j.Layout;
+import org.apache.log4j.Level;
 import org.apache.log4j.Logger;
 import org.apache.log4j.SimpleLayout;
 import org.apache.log4j.WriterAppender;
+import org.apache.log4j.spi.LoggingEvent;
 import org.junit.After;
 import org.junit.Assert;
 import org.junit.Before;
+import org.junit.BeforeClass;
 import org.junit.Test;
 import org.mockito.invocation.InvocationOnMock;
 import org.mockito.stubbing.Answer;
 
+import com.google.common.collect.ImmutableList;
+
 /**
  * Test YarnRunner and make sure the client side plugin works
  * fine
@@ -131,6 +144,53 @@ public class TestYARNRunner {
       MRJobConfig.DEFAULT_TASK_PROFILE_PARAMS.substring(0,
           MRJobConfig.DEFAULT_TASK_PROFILE_PARAMS.lastIndexOf("%"));
 
+  private static class CustomResourceTypesConfigurationProvider
+      extends LocalConfigurationProvider {
+
+    @Override
+    public InputStream getConfigurationInputStream(Configuration bootstrapConf,
+        String name) throws YarnException, IOException {
+      if (YarnConfiguration.RESOURCE_TYPES_CONFIGURATION_FILE.equals(name)) {
+        return new ByteArrayInputStream(
+            ("<configuration>\n" +
+            " <property>\n" +
+            "   <name>yarn.resource-types</name>\n" +
+            "   <value>a-custom-resource</value>\n" +
+            " </property>\n" +
+            " <property>\n" +
+            "   <name>yarn.resource-types.a-custom-resource.units</name>\n" +
+            "   <value>G</value>\n" +
+            " </property>\n" +
+            "</configuration>\n").getBytes());
+      } else {
+        return super.getConfigurationInputStream(bootstrapConf, name);
+      }
+    }
+  }
+
+  private static class TestAppender extends AppenderSkeleton {
+
+    private final List<LoggingEvent> logEvents = new CopyOnWriteArrayList<>();
+
+    @Override
+    public boolean requiresLayout() {
+      return false;
+    }
+
+    @Override
+    public void close() {
+    }
+
+    @Override
+    protected void append(LoggingEvent arg0) {
+      logEvents.add(arg0);
+    }
+
+    private List<LoggingEvent> getLogEvents() {
+      return logEvents;
+    }
+  }
+
   private YARNRunner yarnRunner;
   private ResourceMgrDelegate resourceMgrDelegate;
   private YarnConfiguration conf;
@@ -143,6 +203,11 @@ public class TestYARNRunner {
   private  ClientServiceDelegate clientDelegate;
   private static final String failString = "Rejected job";
 
+  @BeforeClass
+  public static void setupBeforeClass() {
+    ResourceUtils.resetResourceTypes(new Configuration());
+  }
+
   @Before
   public void setUp() throws Exception {
     resourceMgrDelegate = mock(ResourceMgrDelegate.class);
@@ -175,6 +240,7 @@ public class TestYARNRunner {
   @After
   public void cleanup() {
     FileUtil.fullyDelete(testWorkDir);
+    ResourceUtils.resetResourceTypes(new Configuration());
   }
 
   @Test(timeout=20000)
@@ -884,4 +950,105 @@ public class TestYARNRunner {
         .get("hadoop.tmp.dir").equals("testconfdir"));
     UserGroupInformation.reset();
   }
+
+  @Test
+  public void testCustomAMRMResourceType() throws Exception {
+    initResourceTypes();
+    String customResourceName = "a-custom-resource";
+
+    JobConf jobConf = new JobConf();
+
+    jobConf.setInt(MRJobConfig.MR_AM_RESOURCE_PREFIX +
+        customResourceName, 5);
+    jobConf.setInt(MRJobConfig.MR_AM_CPU_VCORES, 3);
+
+    yarnRunner = new YARNRunner(jobConf);
+
+    submissionContext = buildSubmitContext(yarnRunner, jobConf);
+
+    List<ResourceRequest> resourceRequests =
+        submissionContext.getAMContainerResourceRequests();
+
+    Assert.assertEquals(1, resourceRequests.size());
+    ResourceRequest resourceRequest = resourceRequests.get(0);
+
+    ResourceInformation resourceInformation = resourceRequest.getCapability()
+        .getResourceInformation(customResourceName);
+    Assert.assertEquals("Expecting the default unit (G)",
+        "G", resourceInformation.getUnits());
+    Assert.assertEquals(5L, resourceInformation.getValue());
+    Assert.assertEquals(3, resourceRequest.getCapability().getVirtualCores());
+  }
+
+  @Test
+  public void testAMRMemoryRequest() throws Exception {
+    for (String memoryName : ImmutableList.of(
+        MRJobConfig.RESOURCE_TYPE_NAME_MEMORY,
+        MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY)) {
+      JobConf jobConf = new JobConf();
+      jobConf.set(MRJobConfig.MR_AM_RESOURCE_PREFIX + memoryName, "3 Gi");
+
+      yarnRunner = new YARNRunner(jobConf);
+
+      submissionContext = buildSubmitContext(yarnRunner, jobConf);
+
+      List<ResourceRequest> resourceRequests =
+          submissionContext.getAMContainerResourceRequests();
+
+      Assert.assertEquals(1, resourceRequests.size());
+      ResourceRequest resourceRequest = resourceRequests.get(0);
+
+      long memorySize = resourceRequest.getCapability().getMemorySize();
+      Assert.assertEquals(3072, memorySize);
+    }
+  }
+
+  @Test
+  public void testAMRMemoryRequestOverriding() throws Exception {
+    for (String memoryName : ImmutableList.of(
+        MRJobConfig.RESOURCE_TYPE_NAME_MEMORY,
+        MRJobConfig.RESOURCE_TYPE_ALTERNATIVE_NAME_MEMORY)) {
+      TestAppender testAppender = new TestAppender();
+      Logger logger = Logger.getLogger(YARNRunner.class);
+      logger.addAppender(testAppender);
+      try {
+        JobConf jobConf = new JobConf();
+        jobConf.set(MRJobConfig.MR_AM_RESOURCE_PREFIX + memoryName, "3 Gi");
+        jobConf.setInt(MRJobConfig.MR_AM_VMEM_MB, 2048);
+
+        yarnRunner = new YARNRunner(jobConf);
+
+        submissionContext = buildSubmitContext(yarnRunner, jobConf);
+
+        List<ResourceRequest> resourceRequests =
+            submissionContext.getAMContainerResourceRequests();
+
+        Assert.assertEquals(1, resourceRequests.size());
+        ResourceRequest resourceRequest = resourceRequests.get(0);
+
+        long memorySize = resourceRequest.getCapability().getMemorySize();
+        Assert.assertEquals(3072, memorySize);
+        boolean foundLogWarning = false;
+        for (LoggingEvent e : testAppender.getLogEvents()) {
+          if (e.getLevel() == Level.WARN && ("Configuration " +
+                "yarn.app.mapreduce.am.resource." + memoryName + "=3Gi is " +
+                "overriding the yarn.app.mapreduce.am.resource.mb=2048 " +
+                "configuration").equals(e.getMessage())) {
+            foundLogWarning = true;
+            break;
+          }
+        }
+        assertTrue(foundLogWarning);
+      } finally {
+        logger.removeAppender(testAppender);
+      }
+    }
+  }
+
+  private void initResourceTypes() {
+    Configuration configuration = new Configuration();
+    configuration.set(YarnConfiguration.RM_CONFIGURATION_PROVIDER_CLASS,
+        CustomResourceTypesConfigurationProvider.class.getName());
+    ResourceUtils.resetResourceTypes(configuration);
+  }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
index 65eb5a2..3806771 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceUtils.java
@@ -44,7 +44,10 @@ import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
+import java.util.Map.Entry;
 import java.util.concurrent.ConcurrentHashMap;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
 
 import static org.apache.hadoop.yarn.api.records.ResourceInformation.GPU_URI;
 
@@ -60,6 +63,8 @@ public class ResourceUtils {
 
   private static final String MEMORY = ResourceInformation.MEMORY_MB.getName();
   private static final String VCORES = ResourceInformation.VCORES.getName();
+  private static final Pattern RESOURCE_REQUEST_VALUE_PATTERN =
+      Pattern.compile("^([0-9]+) ?([a-zA-Z]*)$");
 
   private static volatile boolean initializedResources = false;
   private static final Map<String, Integer> RESOURCE_NAME_TO_INDEX =
@@ -564,4 +569,43 @@ public class ResourceUtils {
     }
     return array;
   }
+
+  /**
+   * From a given configuration get all entries representing requested
+   * resources: entries that match the {prefix}{resourceName}={value}[{units}]
+   * pattern.
+   * @param configuration The configuration
+   * @param prefix Keys with this prefix are considered from the configuration
+   * @return The list of requested resources as described by the configuration
+   */
+  public static List<ResourceInformation> getRequestedResourcesFromConfig(
+      Configuration configuration, String prefix) {
+    List<ResourceInformation> result = new ArrayList<>();
+    Map<String, String> customResourcesMap = configuration
+        .getValByRegex("^" + Pattern.quote(prefix) + "[^.]+$");
+    for (Entry<String, String> resource : customResourcesMap.entrySet()) {
+      String resourceName = resource.getKey().substring(prefix.length());
+      Matcher matcher =
+          RESOURCE_REQUEST_VALUE_PATTERN.matcher(resource.getValue());
+      if (!matcher.matches()) {
+        String errorMsg = "Invalid resource request specified for property "
+            + resource.getKey() + ": \"" + resource.getValue()
+            + "\", expected format is: value[ ][units]";
+        LOG.error(errorMsg);
+        throw new IllegalArgumentException(errorMsg);
+      }
+      long value = Long.parseLong(matcher.group(1));
+      String unit = matcher.group(2);
+      if (unit.isEmpty()) {
+        unit = ResourceUtils.getDefaultUnit(resourceName);
+      }
+      ResourceInformation resourceInformation = new ResourceInformation();
+      resourceInformation.setName(resourceName);
+      resourceInformation.setValue(value);
+      resourceInformation.setUnits(unit);
+      result.add(resourceInformation);
+    }
+    return result;
+  }
+
 }
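
The value format accepted by getRequestedResourcesFromConfig above, {amount}[ ][{unit}], can be tried out in isolation. The following is a minimal sketch, not the Hadoop implementation: the regular expression is copied from RESOURCE_REQUEST_VALUE_PATTERN in the patch, while the class ResourceRequestSketch, the Parsed holder and the toMebibytes helper are hypothetical names introduced only for illustration. In particular, toMebibytes is a simplified stand-in for UnitsConversionUtil.convert(units, "Mi", value) and handles binary units only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceRequestSketch {
  // Same expression as RESOURCE_REQUEST_VALUE_PATTERN in the patch:
  // digits, at most one space, then an optional alphabetic unit.
  private static final Pattern VALUE_PATTERN =
      Pattern.compile("^([0-9]+) ?([a-zA-Z]*)$");

  /** Hypothetical holder for a parsed request; not a Hadoop class. */
  static final class Parsed {
    final long amount;
    final String unit;

    Parsed(long amount, String unit) {
      this.amount = amount;
      this.unit = unit;
    }
  }

  /** Parses "{amount}[ ][{unit}]", using defaultUnit when no unit is given. */
  static Parsed parse(String raw, String defaultUnit) {
    Matcher m = VALUE_PATTERN.matcher(raw);
    if (!m.matches()) {
      throw new IllegalArgumentException("Invalid resource request: \""
          + raw + "\", expected format is: value[ ][units]");
    }
    long amount = Long.parseLong(m.group(1));
    String unit = m.group(2).isEmpty() ? defaultUnit : m.group(2);
    return new Parsed(amount, unit);
  }

  /**
   * Assumption: simplified replacement for the unit conversion done by
   * Hadoop. Only binary units are supported, and Ki is truncated by
   * integer division.
   */
  static long toMebibytes(Parsed p) {
    switch (p.unit) {
      case "Ki": return p.amount / 1024;
      case "Mi": return p.amount;
      case "Gi": return p.amount * 1024;
      case "Ti": return p.amount * 1024 * 1024;
      default:
        throw new IllegalArgumentException(
            "Unit not supported by this sketch: " + p.unit);
    }
  }

  public static void main(String[] args) {
    Parsed p = parse("3 Gi", "Mi");
    // Prints "3 Gi -> 3072 Mi", the value testAMRMemoryRequest expects.
    System.out.println(p.amount + " " + p.unit + " -> "
        + toMebibytes(p) + " Mi");
  }
}
```

With this pattern "3Gi" and "3 Gi" are both accepted, whereas "1.5 Gi" or a value with two spaces before the unit is rejected with an IllegalArgumentException.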


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org


[hadoop] 15/20: YARN-9291. Backport YARN-7637 to branch-2

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit ea259c493dedf2b7244afd6967848d09d51564c3
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Wed Mar 20 17:45:01 2019 -0700

    YARN-9291. Backport YARN-7637 to branch-2
---
 .../recovery/NMNullStateStoreService.java          |  1 +
 .../resources/gpu/TestGpuResourceHandler.java      | 30 ++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
index 7d1010f..95ec61a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
@@ -272,6 +272,7 @@ public class NMNullStateStoreService extends NMStateStoreService {
   public void storeAssignedResources(Container container,
       String resourceType, List<Serializable> assignedResources)
       throws IOException {
+    updateContainerResourceMapping(container, resourceType, assignedResources);
   }
 
   @Override
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
index b5796df..7a3bd02 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
@@ -38,6 +38,7 @@ import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resource
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerRuntimeConstants;
+import org.apache.hadoop.yarn.server.nodemanager.recovery.NMNullStateStoreService;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService;
 import org.apache.hadoop.yarn.util.resource.TestResourceUtils;
 import org.junit.Assert;
@@ -349,6 +350,35 @@ public class TestGpuResourceHandler {
   }
 
   @Test
+  public void testAllocationStoredWithNULLStateStore() throws Exception {
+    NMNullStateStoreService mockNMNULLStateStore = mock(NMNullStateStoreService.class);
+
+    Context nmnctx = mock(Context.class);
+    when(nmnctx.getNMStateStore()).thenReturn(mockNMNULLStateStore);
+
+    GpuResourceHandlerImpl gpuNULLStateResourceHandler =
+        new GpuResourceHandlerImpl(nmnctx, mockCGroupsHandler,
+        mockPrivilegedExecutor);
+
+    Configuration conf = new YarnConfiguration();
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1,2:3,3:4");
+    GpuDiscoverer.getInstance().initialize(conf);
+
+    gpuNULLStateResourceHandler.bootstrap(conf);
+    Assert.assertEquals(4,
+        gpuNULLStateResourceHandler.getGpuAllocator().getAvailableGpus());
+
+    /* Start container 1, which asks for 3 GPUs */
+    Container container = mockContainerWithGpuRequest(1, 3);
+    gpuNULLStateResourceHandler.preStart(container);
+
+    verify(nmnctx.getNMStateStore()).storeAssignedResources(container,
+        ResourceInformation.GPU_URI, Arrays
+            .<Serializable>asList(new GpuDevice(0, 0), new GpuDevice(1, 1),
+                new GpuDevice(2, 3)));
+  }
+
+  @Test
   public void testRecoverResourceAllocation() throws Exception {
     Configuration conf = new YarnConfiguration();
     conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1,2:3,3:4");




[hadoop] 12/20: YARN-7594. TestNMWebServices#testGetNMResourceInfo fails on trunk. Contributed by Gergely Novák.

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 618d0154ef289218c8f99f1d13bdbda70145b89f
Author: Sunil G <su...@apache.org>
AuthorDate: Mon Dec 4 10:45:07 2017 +0530

    YARN-7594. TestNMWebServices#testGetNMResourceInfo fails on trunk. Contributed by Gergely Novák.
---
 .../yarn/server/nodemanager/webapp/TestNMWebServices.java    | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
index 72071da..2f1577f 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
@@ -456,17 +456,17 @@ public class TestNMWebServices extends JerseyTestBase {
         ClientResponse.class);
     assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
 
-    // Access resource-2 should fail (null NMResourceInfo returned).
+    // Access resource-2 should fail (empty NMResourceInfo returned).
     JSONObject json = response.getEntity(JSONObject.class);
-    assertIncludesException(json);
+    assertEquals(0, json.length());
 
-    // Access resource-3 should fail (unkown plugin)
+    // Access resource-3 should fail (unknown plugin)
     response = r.path("ws").path("v1").path("node").path(
         "resources").path("resource-3").accept(MediaType.APPLICATION_JSON).get(
         ClientResponse.class);
     assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
     json = response.getEntity(JSONObject.class);
-    assertIncludesException(json);
+    assertEquals(0, json.length());
 
     // Access resource-1 should success
     response = r.path("ws").path("v1").path("node").path(
@@ -533,10 +533,6 @@ public class TestNMWebServices extends JerseyTestBase {
     assertEquals(2, json.getJSONArray("assignedGpuDevices").length());
   }
 
-  private void assertIncludesException(JSONObject json) {
-    assertTrue(json.has("RemoteException"));
-  }
-
   private void testContainerLogs(WebResource r, ContainerId containerId)
       throws IOException {
     final String containerIdStr = containerId.toString();




[hadoop] 10/20: YARN-9289. Backport YARN-7330 for GPU in UI to branch-2

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 9a61778525202baad70bd5fa0785d2d8d37c4fee
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Fri Feb 8 11:26:59 2019 -0800

    YARN-9289. Backport YARN-7330 for GPU in UI to branch-2
---
 .../hadoop-yarn/dev-support/findbugs-exclude.xml   |   8 +
 .../apache/hadoop/yarn/api/records/Resource.java   |  20 +++
 .../linux/resources/gpu/GpuResourceAllocator.java  |  19 ++-
 .../resources/gpu/GpuResourceHandlerImpl.java      |   1 -
 .../resourceplugin/ResourcePlugin.java             |  11 ++
 .../resourceplugin/gpu/AssignedGpuDevice.java      |  88 ++++++++++
 .../resourceplugin/gpu/GpuDevice.java              |  14 +-
 .../resourceplugin/gpu/GpuResourcePlugin.java      |  24 ++-
 .../server/nodemanager/webapp/NMWebServices.java   |  27 +++
 .../nodemanager/webapp/dao/NMResourceInfo.java}    |  16 +-
 .../webapp/dao/gpu/GpuDeviceInformation.java       |   2 +-
 .../webapp/dao/gpu/NMGpuResourceInfo.java          |  80 +++++++++
 .../webapp/dao/gpu/PerGpuDeviceInformation.java    |   2 +-
 .../webapp/dao/gpu/PerGpuMemoryUsage.java          |   2 +-
 .../resources/gpu/TestGpuResourceHandler.java      |   6 +-
 .../nodemanager/webapp/TestNMWebServices.java      | 188 +++++++++++++++++----
 .../dao/gpu/TestGpuDeviceInformationParser.java    |   2 +-
 .../app/{constants.js => adapters/yarn-nm-gpu.js}  |  21 ++-
 .../src/main/webapp/app/components/donut-chart.js  |  18 +-
 .../main/webapp/app/components/gpu-donut-chart.js  |  66 ++++++++
 .../src/main/webapp/app/constants.js               |  13 ++
 .../webapp/app/controllers/yarn-nodes/table.js     |   2 +-
 .../src/main/webapp/app/models/cluster-metric.js   |  69 ++++++++
 .../app/{constants.js => models/yarn-nm-gpu.js}    |  15 +-
 .../webapp/app/models/yarn-queue/capacity-queue.js |   3 +-
 .../src/main/webapp/app/models/yarn-rm-node.js     |  35 ++++
 .../hadoop-yarn-ui/src/main/webapp/app/router.js   |   5 +-
 .../src/main/webapp/app/routes/cluster-overview.js |   2 +-
 .../src/main/webapp/app/routes/yarn-node.js        |   2 +
 .../yarn-node/yarn-nm-gpu.js}                      |  10 +-
 .../yarn-node.js => serializers/yarn-nm-gpu.js}    |  34 ++--
 .../app/serializers/yarn-queue/capacity-queue.js   |   1 +
 .../main/webapp/app/serializers/yarn-rm-node.js    |   4 +-
 .../main/webapp/app/templates/cluster-overview.hbs |  88 ++++++----
 .../app/templates/components/node-menu-panel.hbs   |  10 +-
 .../app/templates/components/yarn-nm-gpu-info.hbs  |  69 ++++++++
 .../src/main/webapp/app/templates/yarn-node.hbs    | 125 --------------
 .../main/webapp/app/templates/yarn-node/info.hbs   | 154 +++++++++++++++++
 .../webapp/app/templates/yarn-node/yarn-nm-gpu.hbs |  53 ++++++
 .../src/main/webapp/app/utils/converter.js         |  51 ++++++
 40 files changed, 1115 insertions(+), 245 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml b/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
index 45aa868..e6dcefb 100644
--- a/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
+++ b/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
@@ -633,4 +633,12 @@
     <Method name="getResources" />
     <Bug pattern="EI_EXPOSE_REP" />
   </Match>
+
+  <!-- EQ_OVERRIDING_EQUALS_NOT_SYMMETRIC -->
+  <Match>
+    <Class name="org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.AssignedGpuDevice" />
+    <Method name="equals" />
+    <Bug pattern="EQ_OVERRIDING_EQUALS_NOT_SYMMETRIC" />
+  </Match>
+
 </FindBugsFilter>
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
index 7e8c01d..92137ad 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
@@ -18,8 +18,12 @@
 
 package org.apache.hadoop.yarn.api.records;
 
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.List;
 
+import com.google.common.collect.Lists;
+import org.apache.commons.lang.ArrayUtils;
 import org.apache.commons.lang.NotImplementedException;
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceAudience.Public;
@@ -213,6 +217,22 @@ public abstract class Resource implements Comparable<Resource> {
   }
 
   /**
+   * Get a copy of the resource information list; this is used by JAXB.
+   * @return a copy of the list of resources.
+   */
+  @InterfaceAudience.Private
+  @InterfaceStability.Unstable
+  public List<ResourceInformation> getAllResourcesListCopy() {
+    List<ResourceInformation> list = new ArrayList<>();
+    for (ResourceInformation i : resources) {
+      ResourceInformation ri = new ResourceInformation();
+      ResourceInformation.copy(i, ri);
+      list.add(ri);
+    }
+    return list;
+  }
+
+  /**
    * Get ResourceInformation for a specified resource.
    *
    * @param resource name of the resource
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
index f4a49f9..493aa7b 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
@@ -30,11 +30,13 @@ import org.apache.hadoop.yarn.exceptions.ResourceNotFoundException;
 import org.apache.hadoop.yarn.server.nodemanager.Context;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.AssignedGpuDevice;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice;
 
 import java.io.IOException;
 import java.io.Serializable;
 import java.util.ArrayList;
+import java.util.Collection;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.HashSet;
@@ -224,7 +226,20 @@ public class GpuResourceAllocator {
   }
 
   @VisibleForTesting
-  public synchronized Map<GpuDevice, ContainerId> getDeviceAllocationMapping() {
-     return new HashMap<>(usedDevices);
+  public synchronized Map<GpuDevice, ContainerId> getDeviceAllocationMappingCopy() {
+    return new HashMap<>(usedDevices);
+  }
+
+  public synchronized List<GpuDevice> getAllowedGpusCopy() {
+    return new ArrayList<>(allowedGpuDevices);
+  }
+
+  public synchronized List<AssignedGpuDevice> getAssignedGpusCopy() {
+    List<AssignedGpuDevice> assigns = new ArrayList<>();
+    for (Map.Entry<GpuDevice, ContainerId> entry : usedDevices.entrySet()) {
+      assigns.add(new AssignedGpuDevice(entry.getKey().getIndex(),
+          entry.getKey().getMinorNumber(), entry.getValue()));
+    }
+    return assigns;
   }
 }
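Note on the GpuResourceAllocator change above: the new getters build their result while holding the object's monitor and hand back a fresh collection, so a caller such as the web service can iterate over it without racing the allocator. A minimal sketch of that snapshot pattern follows. `AllocatorSketch`, its integer-keyed map and the `assign` method are hypothetical stand-ins for illustration, not the YARN classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an allocator that tracks device -> container.
class AllocatorSketch {
  // Guarded by "this"; TreeMap keeps iteration order deterministic.
  private final Map<Integer, String> usedDevices = new TreeMap<>();

  synchronized void assign(int minorNumber, String containerId) {
    usedDevices.put(minorNumber, containerId);
  }

  // Returns a snapshot built under the lock; later assignments
  // are not reflected in a list that was already returned.
  synchronized List<String> getAssignedCopy() {
    List<String> assigns = new ArrayList<>();
    for (Map.Entry<Integer, String> e : usedDevices.entrySet()) {
      assigns.add(e.getKey() + "=" + e.getValue());
    }
    return assigns;
  }

  public static void main(String[] args) {
    AllocatorSketch allocator = new AllocatorSketch();
    allocator.assign(1, "container_1");
    List<String> snapshot = allocator.getAssignedCopy();
    allocator.assign(0, "container_2");
    // The snapshot still holds only the first assignment.
    System.out.println(snapshot + " vs " + allocator.getAssignedCopy());
  }
}
```

Because the caller owns the returned list, it may also modify it freely without affecting the allocator's internal state.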
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
index 4a783d3..5003821 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
@@ -133,7 +133,6 @@ public class GpuResourceHandlerImpl implements ResourceHandler {
     return ret;
   }
 
-  @VisibleForTesting
   public GpuResourceAllocator getGpuAllocator() {
     return gpuAllocator;
   }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePlugin.java
index 6e134b3..78167c4 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePlugin.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePlugin.java
@@ -24,6 +24,7 @@ import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileg
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandler;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.NMResourceInfo;
 
 /**
  * {@link ResourcePlugin} is an interface for node manager to easier support
@@ -80,4 +81,14 @@ public interface ResourcePlugin {
    * @throws YarnException if any issue occurs
    */
   void cleanup() throws YarnException;
+
+  /**
+   * Get resource information from this plugin.
+   *
+   * @return NMResourceInfo, an example is
+   * {@link org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.GpuDeviceInformation}
+   *
+   * @throws YarnException when any issue occurs
+   */
+  NMResourceInfo getNMResourceInfo() throws YarnException;
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java
new file mode 100644
index 0000000..df4b905
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java
@@ -0,0 +1,88 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu;
+
+import javax.xml.bind.annotation.XmlAccessType;
+import javax.xml.bind.annotation.XmlAccessorType;
+import javax.xml.bind.annotation.XmlRootElement;
+import org.apache.hadoop.yarn.api.records.ContainerId;
+
+/**
+ * In addition to {@link GpuDevice}, this includes the container id and other
+ * runtime information about who is using the GPU device, if available.
+ */
+@XmlRootElement
+@XmlAccessorType(XmlAccessType.FIELD)
+public class AssignedGpuDevice extends GpuDevice {
+  private static final long serialVersionUID = -12983712986315L;
+
+  String containerId;
+
+  public AssignedGpuDevice() {
+
+  }
+
+  public AssignedGpuDevice(int index, int minorNumber,
+      ContainerId containerId) {
+    super(index, minorNumber);
+    this.containerId = containerId.toString();
+  }
+
+  public String getContainerId() {
+    return containerId;
+  }
+
+  public void setContainerId(String containerId) {
+    this.containerId = containerId;
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (obj == null || !(obj instanceof AssignedGpuDevice)) {
+      return false;
+    }
+    AssignedGpuDevice other = (AssignedGpuDevice) obj;
+    return index == other.index && minorNumber == other.minorNumber
+        && containerId.equals(other.containerId);
+  }
+
+  @Override
+  public int compareTo(Object obj) {
+    if (obj == null || (!(obj instanceof AssignedGpuDevice))) {
+      return -1;
+    }
+
+    AssignedGpuDevice other = (AssignedGpuDevice) obj;
+
+    int result = Integer.compare(index, other.index);
+    if (0 != result) {
+      return result;
+    }
+    result = Integer.compare(minorNumber, other.minorNumber);
+    if (0 != result) {
+      return result;
+    }
+    return containerId.compareTo(other.containerId);
+  }
+
+  @Override
+  public int hashCode() {
+    final int prime = 47;
+    return prime * (prime * index + minorNumber) + containerId.hashCode();
+  }
+}
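Note on the EQ_OVERRIDING_EQUALS_NOT_SYMMETRIC exclusion added to findbugs-exclude.xml in this change: a subclass whose equals() accepts only its own type can disagree with a parent whose equals() accepts any instance of the parent. The sketch below shows how that asymmetry arises. It assumes the parent compares only index and minor number behind an instanceof check; GpuDevice.equals is not part of this diff, so that is an assumption, and `Device` and `AssignedDevice` are hypothetical names.

```java
import java.util.Objects;

// Hypothetical parent: equality on (index, minorNumber) only.
class Device {
  protected final int index;
  protected final int minorNumber;

  Device(int index, int minorNumber) {
    this.index = index;
    this.minorNumber = minorNumber;
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof Device)) {
      return false;
    }
    Device other = (Device) obj;
    return index == other.index && minorNumber == other.minorNumber;
  }

  @Override
  public int hashCode() {
    return 47 * index + minorNumber;
  }
}

// Hypothetical subclass: additionally compares the container id.
class AssignedDevice extends Device {
  private final String containerId;

  AssignedDevice(int index, int minorNumber, String containerId) {
    super(index, minorNumber);
    this.containerId = containerId;
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof AssignedDevice)) {
      return false;
    }
    AssignedDevice other = (AssignedDevice) obj;
    // Objects.equals tolerates a null containerId on either side.
    return index == other.index && minorNumber == other.minorNumber
        && Objects.equals(containerId, other.containerId);
  }

  @Override
  public int hashCode() {
    return 47 * super.hashCode() + Objects.hashCode(containerId);
  }

  public static void main(String[] args) {
    Device base = new Device(1, 1);
    AssignedDevice assigned = new AssignedDevice(1, 1, "container_1");
    // The two directions disagree, which is what the findbugs pattern flags.
    System.out.println(base.equals(assigned) + " / " + assigned.equals(base));
  }
}
```

Under these assumptions, mixing both types in one hash-based or sorted collection is unsafe; keeping them in separate collections, as the allocator does with its map keys and its returned list, avoids the problem.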
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
index 8119924..6f084e6 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
@@ -19,15 +19,25 @@
 package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu;
 
 import java.io.Serializable;
+import javax.xml.bind.annotation.XmlAccessType;
+import javax.xml.bind.annotation.XmlAccessorType;
+import javax.xml.bind.annotation.XmlRootElement;
+
 
 /**
  * This class is used to represent GPU device while allocation.
  */
+@XmlRootElement
+@XmlAccessorType(XmlAccessType.FIELD)
 public class GpuDevice implements Serializable, Comparable {
-  private int index;
-  private int minorNumber;
+  protected int index;
+  protected int minorNumber;
   private static final long serialVersionUID = -6812314470754667710L;
 
+  public GpuDevice() {
+
+  }
+
   public GpuDevice(int index, int minorNumber) {
     this.index = index;
     this.minorNumber = minorNumber;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuResourcePlugin.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuResourcePlugin.java
index 9576ce7..d294503 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuResourcePlugin.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuResourcePlugin.java
@@ -18,17 +18,25 @@
 
 package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu;
 
+import org.apache.hadoop.yarn.api.records.ContainerId;
 import org.apache.hadoop.yarn.exceptions.YarnException;
 import org.apache.hadoop.yarn.server.nodemanager.Context;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandler;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.NodeResourceUpdaterPlugin;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePlugin;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.NMResourceInfo;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.GpuDeviceInformation;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.NMGpuResourceInfo;
+
+import java.util.List;
+import java.util.Map;
 
 public class GpuResourcePlugin implements ResourcePlugin {
-  private ResourceHandler gpuResourceHandler = null;
+  private GpuResourceHandlerImpl gpuResourceHandler = null;
   private GpuNodeResourceUpdateHandler resourceDiscoverHandler = null;
 
   @Override
@@ -58,4 +66,18 @@ public class GpuResourcePlugin implements ResourcePlugin {
   public void cleanup() throws YarnException {
     // Do nothing.
   }
+
+  @Override
+  public NMResourceInfo getNMResourceInfo() throws YarnException {
+    GpuDeviceInformation gpuDeviceInformation =
+        GpuDiscoverer.getInstance().getGpuDeviceInformation();
+    GpuResourceAllocator gpuResourceAllocator =
+        gpuResourceHandler.getGpuAllocator();
+    List<GpuDevice> totalGpus = gpuResourceAllocator.getAllowedGpusCopy();
+    List<AssignedGpuDevice> assignedGpuDevices =
+        gpuResourceAllocator.getAssignedGpusCopy();
+
+    return new NMGpuResourceInfo(gpuDeviceInformation, totalGpus,
+        assignedGpuDevices);
+  }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java
index 60905d7..7476d75 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java
@@ -27,6 +27,10 @@ import java.util.HashSet;
 import java.util.List;
 import java.util.Map.Entry;
 import java.util.Set;
+
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePlugin;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.NMResourceInfo;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -488,6 +492,29 @@ public class NMWebServices {
     }
   }
 
+  @GET
+  @Path("/resources/{resourcename}")
+  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
+  public Object getNMResourceInfo(
+      @PathParam("resourcename")
+          String resourceName) throws YarnException {
+    init();
+    ResourcePluginManager rpm = this.nmContext.getResourcePluginManager();
+    if (rpm != null && rpm.getNameToPlugins() != null) {
+      ResourcePlugin plugin = rpm.getNameToPlugins().get(resourceName);
+      if (plugin != null) {
+        NMResourceInfo nmResourceInfo = plugin.getNMResourceInfo();
+        if (nmResourceInfo != null) {
+          return nmResourceInfo;
+        }
+      }
+    }
+
+    throw new YarnException(
+        "Could not get detailed resource information for given resource-name="
+            + resourceName);
+  }
+
   private long parseLongParam(String bytes) {
     if (bytes == null || bytes.isEmpty()) {
       return Long.MAX_VALUE;
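Note on the new /resources/{resourcename} endpoint above (the test in this change reaches it under ws/v1/node/resources/): the handler falls through to a single exception whenever the plugin manager, the plugin, or the plugin's result is missing. A minimal sketch of that lookup-with-fallthrough shape follows. `PluginSketch` and `ResourceLookupSketch` are hypothetical, and the sketch throws IllegalStateException where the real handler throws YarnException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a resource plugin that may return null.
interface PluginSketch {
  Object getInfo();
}

class ResourceLookupSketch {
  // Returns the plugin's info, or throws if any step yields nothing.
  static Object lookup(Map<String, PluginSketch> nameToPlugins,
      String resourceName) {
    if (nameToPlugins != null) {
      PluginSketch plugin = nameToPlugins.get(resourceName);
      if (plugin != null) {
        Object info = plugin.getInfo();
        if (info != null) {
          return info;
        }
      }
    }
    // Single failure path for: no map, unknown name, or null info.
    throw new IllegalStateException(
        "Could not get detailed resource information for given resource-name="
            + resourceName);
  }

  public static void main(String[] args) {
    Map<String, PluginSketch> plugins = new HashMap<>();
    plugins.put("resource-1", () -> "info-1");
    plugins.put("resource-2", () -> null);
    System.out.println(lookup(plugins, "resource-1"));
    try {
      lookup(plugins, "resource-2");
    } catch (IllegalStateException e) {
      System.out.println("failed as expected: " + e.getMessage());
    }
  }
}
```

Collapsing all three failure cases into one message keeps the handler short, at the cost of not telling the client which step failed.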
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NMResourceInfo.java
similarity index 73%
copy from hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
copy to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NMResourceInfo.java
index d2937a0..18ce8ea 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/NMResourceInfo.java
@@ -16,9 +16,13 @@
  * limitations under the License.
  */
 
-/**
- * Application level global constants go here.
- */
-export default {
-  PARAM_SEPARATOR: '!',
-};
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao;
+
+import javax.xml.bind.annotation.XmlAccessType;
+import javax.xml.bind.annotation.XmlAccessorType;
+import javax.xml.bind.annotation.XmlRootElement;
+
+@XmlRootElement
+@XmlAccessorType(XmlAccessType.FIELD)
+public class NMResourceInfo {
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformation.java
index 977032a..837d5cc 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformation.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/GpuDeviceInformation.java
@@ -25,7 +25,7 @@ import javax.xml.bind.annotation.XmlRootElement;
 import java.util.List;
 
 /**
- * All GPU Device Information in the system.
+ * All GPU Device Information in the system, fetched from nvidia-smi.
  */
 @InterfaceAudience.Private
 @InterfaceStability.Unstable
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/NMGpuResourceInfo.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/NMGpuResourceInfo.java
new file mode 100644
index 0000000..e585537
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/NMGpuResourceInfo.java
@@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu;
+
+import javax.xml.bind.annotation.XmlAccessType;
+import javax.xml.bind.annotation.XmlAccessorType;
+import javax.xml.bind.annotation.XmlRootElement;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.AssignedGpuDevice;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.NMResourceInfo;
+
+import java.util.List;
+
+/**
+ * GPU device information returned to the client when
+ * {@link org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices#getNMResourceInfo(String)}
+ * is invoked.
+ */
+@XmlRootElement
+@XmlAccessorType(XmlAccessType.FIELD)
+public class NMGpuResourceInfo extends NMResourceInfo {
+  GpuDeviceInformation gpuDeviceInformation;
+
+  List<GpuDevice> totalGpuDevices;
+  List<AssignedGpuDevice> assignedGpuDevices;
+
+  public NMGpuResourceInfo() {
+
+  }
+
+  public NMGpuResourceInfo(GpuDeviceInformation gpuDeviceInformation,
+      List<GpuDevice> totalGpuDevices,
+      List<AssignedGpuDevice> assignedGpuDevices) {
+    this.gpuDeviceInformation = gpuDeviceInformation;
+    this.totalGpuDevices = totalGpuDevices;
+    this.assignedGpuDevices = assignedGpuDevices;
+  }
+
+  public GpuDeviceInformation getGpuDeviceInformation() {
+    return gpuDeviceInformation;
+  }
+
+  public void setGpuDeviceInformation(
+      GpuDeviceInformation gpuDeviceInformation) {
+    this.gpuDeviceInformation = gpuDeviceInformation;
+  }
+
+  public List<GpuDevice> getTotalGpuDevices() {
+    return totalGpuDevices;
+  }
+
+  public void setTotalGpuDevices(List<GpuDevice> totalGpuDevices) {
+    this.totalGpuDevices = totalGpuDevices;
+  }
+
+  public List<AssignedGpuDevice> getAssignedGpuDevices() {
+    return assignedGpuDevices;
+  }
+
+  public void setAssignedGpuDevices(
+      List<AssignedGpuDevice> assignedGpuDevices) {
+    this.assignedGpuDevices = assignedGpuDevices;
+  }
+}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java
index f315313..25c2e3a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuDeviceInformation.java
@@ -135,7 +135,7 @@ public class PerGpuDeviceInformation {
     this.gpuUtilizations = utilizations;
   }
 
-  @XmlElement(name = "bar1_memory_usage")
+  @XmlElement(name = "fb_memory_usage")
   public PerGpuMemoryUsage getGpuMemoryUsage() {
     return gpuMemoryUsage;
   }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMemoryUsage.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMemoryUsage.java
index 3964c4e..afc1a96 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMemoryUsage.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/PerGpuMemoryUsage.java
@@ -27,7 +27,7 @@ import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
 
 @InterfaceAudience.Private
 @InterfaceStability.Unstable
-@XmlRootElement(name = "bar1_memory_usage")
+@XmlRootElement(name = "fb_memory_usage")
 public class PerGpuMemoryUsage {
   long usedMemoryMiB = -1L;
   long availMemoryMiB = -1L;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
index d985b5b..b5796df 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
@@ -374,7 +374,7 @@ public class TestGpuResourceHandler {
     gpuResourceHandler.reacquireContainer(getContainerId(1));
 
     Map<GpuDevice, ContainerId> deviceAllocationMapping =
-        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
+        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMappingCopy();
     Assert.assertEquals(2, deviceAllocationMapping.size());
     Assert.assertTrue(
         deviceAllocationMapping.keySet().contains(new GpuDevice(1, 1)));
@@ -408,7 +408,7 @@ public class TestGpuResourceHandler {
 
     // Make sure internal state not changed.
     deviceAllocationMapping =
-        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
+        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMappingCopy();
     Assert.assertEquals(2, deviceAllocationMapping.size());
     Assert.assertTrue(deviceAllocationMapping.keySet()
         .containsAll(Arrays.asList(new GpuDevice(1, 1), new GpuDevice(2, 3))));
@@ -440,7 +440,7 @@ public class TestGpuResourceHandler {
 
     // Make sure internal state not changed.
     deviceAllocationMapping =
-        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
+        gpuResourceHandler.getGpuAllocator().getDeviceAllocationMappingCopy();
     Assert.assertEquals(2, deviceAllocationMapping.size());
     Assert.assertTrue(deviceAllocationMapping.keySet()
         .containsAll(Arrays.asList(new GpuDevice(1, 1), new GpuDevice(2, 3))));
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
index 4586a7b..72071da 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
@@ -18,25 +18,20 @@
 
 package org.apache.hadoop.yarn.server.nodemanager.webapp;
 
-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertTrue;
-import static org.junit.Assert.fail;
-
-import java.io.File;
-import java.io.IOException;
-import java.io.PrintWriter;
-import java.io.StringReader;
-import java.net.HttpURLConnection;
-import java.net.URI;
-import java.net.URL;
-import java.util.List;
-import javax.servlet.http.HttpServletResponse;
-import javax.ws.rs.core.MediaType;
-import javax.xml.parsers.DocumentBuilder;
-import javax.xml.parsers.DocumentBuilderFactory;
-
-import org.junit.Assert;
+import com.google.inject.Guice;
+import com.google.inject.Injector;
+import com.google.inject.servlet.GuiceServletContextListener;
+import com.google.inject.servlet.ServletModule;
+import com.sun.jersey.api.client.ClientResponse;
+import com.sun.jersey.api.client.ClientResponse.Status;
+import com.sun.jersey.api.client.GenericType;
+import com.sun.jersey.api.client.UniformInterfaceException;
+import com.sun.jersey.api.client.WebResource;
+import com.sun.jersey.guice.spi.container.servlet.GuiceContainer;
+import com.sun.jersey.test.framework.WebAppDescriptor;
+import javax.xml.bind.annotation.XmlAccessType;
+import javax.xml.bind.annotation.XmlAccessorType;
+import javax.xml.bind.annotation.XmlRootElement;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.FileUtil;
@@ -48,6 +43,7 @@ import org.apache.hadoop.yarn.api.records.ApplicationId;
 import org.apache.hadoop.yarn.api.records.ContainerId;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
 import org.apache.hadoop.yarn.event.AsyncDispatcher;
+import org.apache.hadoop.yarn.exceptions.YarnException;
 import org.apache.hadoop.yarn.logaggregation.ContainerLogAggregationType;
 import org.apache.hadoop.yarn.logaggregation.ContainerLogFileInfo;
 import org.apache.hadoop.yarn.logaggregation.TestContainerLogsUtils;
@@ -59,7 +55,15 @@ import org.apache.hadoop.yarn.server.nodemanager.ResourceView;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerState;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePlugin;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.AssignedGpuDevice;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice;
 import org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer.NMWebApp;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.NMResourceInfo;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.GpuDeviceInformation;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.NMGpuResourceInfo;
+import org.apache.hadoop.yarn.server.nodemanager.webapp.dao.gpu.PerGpuDeviceInformation;
 import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;
 import org.apache.hadoop.yarn.server.utils.BuilderUtils;
 import org.apache.hadoop.yarn.server.webapp.YarnWebServiceParams;
@@ -73,6 +77,7 @@ import org.apache.hadoop.yarn.webapp.util.WebAppUtils;
 import org.codehaus.jettison.json.JSONException;
 import org.codehaus.jettison.json.JSONObject;
 import org.junit.AfterClass;
+import org.junit.Assert;
 import org.junit.Before;
 import org.junit.Test;
 import org.w3c.dom.Document;
@@ -80,24 +85,35 @@ import org.w3c.dom.Element;
 import org.w3c.dom.NodeList;
 import org.xml.sax.InputSource;
 
-import com.google.inject.Guice;
-import com.google.inject.Injector;
-import com.google.inject.servlet.GuiceServletContextListener;
-import com.google.inject.servlet.ServletModule;
-import com.sun.jersey.api.client.ClientResponse;
-import com.sun.jersey.api.client.ClientResponse.Status;
-import com.sun.jersey.api.client.GenericType;
-import com.sun.jersey.api.client.UniformInterfaceException;
-import com.sun.jersey.api.client.WebResource;
-import com.sun.jersey.guice.spi.container.servlet.GuiceContainer;
-import com.sun.jersey.test.framework.WebAppDescriptor;
+import javax.servlet.http.HttpServletResponse;
+import javax.ws.rs.core.MediaType;
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
+import java.io.File;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.io.StringReader;
+import java.net.HttpURLConnection;
+import java.net.URI;
+import java.net.URL;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
 
 /**
  * Test the nodemanager node info web services api's
  */
 public class TestNMWebServices extends JerseyTestBase {
 
-  private static Context nmContext;
+  private static NodeManager.NMContext nmContext;
   private static ResourceView resourceView;
   private static ApplicationACLsManager aclsManager;
   private static LocalDirsHandlerService dirsHandler;
@@ -411,6 +427,116 @@ public class TestNMWebServices extends JerseyTestBase {
     assertFalse(redirectURL.contains(YarnWebServiceParams.NM_ID));
   }
 
+  @XmlRootElement
+  @XmlAccessorType(XmlAccessType.FIELD)
+  private static class MockNMResourceInfo extends NMResourceInfo {
+    public long a = 1000L;
+    public MockNMResourceInfo() { }
+  }
+
+  @Test
+  public void testGetNMResourceInfo()
+      throws YarnException, InterruptedException, JSONException {
+    ResourcePluginManager rpm = mock(ResourcePluginManager.class);
+    Map<String, ResourcePlugin> namesToPlugins = new HashMap<>();
+    ResourcePlugin mockPlugin1 = mock(ResourcePlugin.class);
+    NMResourceInfo nmResourceInfo1 = new MockNMResourceInfo();
+    when(mockPlugin1.getNMResourceInfo()).thenReturn(nmResourceInfo1);
+    namesToPlugins.put("resource-1", mockPlugin1);
+    namesToPlugins.put("yarn.io/resource-1", mockPlugin1);
+    ResourcePlugin mockPlugin2 = mock(ResourcePlugin.class);
+    namesToPlugins.put("resource-2", mockPlugin2);
+    when(rpm.getNameToPlugins()).thenReturn(namesToPlugins);
+
+    nmContext.setResourcePluginManager(rpm);
+
+    WebResource r = resource();
+    ClientResponse response = r.path("ws").path("v1").path("node").path(
+        "resources").path("resource-2").accept(MediaType.APPLICATION_JSON).get(
+        ClientResponse.class);
+    assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
+
+    // Access resource-2 should fail (null NMResourceInfo returned).
+    JSONObject json = response.getEntity(JSONObject.class);
+    assertIncludesException(json);
+
+    // Access resource-3 should fail (unknown plugin)
+    response = r.path("ws").path("v1").path("node").path(
+        "resources").path("resource-3").accept(MediaType.APPLICATION_JSON).get(
+        ClientResponse.class);
+    assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
+    json = response.getEntity(JSONObject.class);
+    assertIncludesException(json);
+
+    // Access resource-1 should succeed
+    response = r.path("ws").path("v1").path("node").path(
+        "resources").path("resource-1").accept(MediaType.APPLICATION_JSON).get(
+        ClientResponse.class);
+    assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
+    json = response.getEntity(JSONObject.class);
+    assertEquals(1000, Long.parseLong(json.get("a").toString()));
+
+    // Access resource-1 should succeed (URL-encoded as yarn.io%2Fresource-1).
+    response = r.path("ws").path("v1").path("node").path("resources").path(
+        "yarn.io%2Fresource-1").accept(MediaType.APPLICATION_JSON).get(
+        ClientResponse.class);
+    assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
+    json = response.getEntity(JSONObject.class);
+    assertEquals(1000, Long.parseLong(json.get("a").toString()));
+  }
+
+  private ContainerId createContainerId(int id) {
+    ApplicationId appId = ApplicationId.newInstance(0, 0);
+    ApplicationAttemptId appAttemptId =
+        ApplicationAttemptId.newInstance(appId, 1);
+    ContainerId containerId = ContainerId.newContainerId(appAttemptId, id);
+    return containerId;
+  }
+
+  @Test
+  public void testGetYarnGpuResourceInfo()
+      throws YarnException, InterruptedException, JSONException {
+    ResourcePluginManager rpm = mock(ResourcePluginManager.class);
+    Map<String, ResourcePlugin> namesToPlugins = new HashMap<>();
+    ResourcePlugin mockPlugin1 = mock(ResourcePlugin.class);
+    GpuDeviceInformation gpuDeviceInformation = new GpuDeviceInformation();
+    gpuDeviceInformation.setDriverVersion("1.2.3");
+    gpuDeviceInformation.setGpus(Arrays.asList(new PerGpuDeviceInformation()));
+    NMResourceInfo nmResourceInfo1 = new NMGpuResourceInfo(gpuDeviceInformation,
+        Arrays.asList(new GpuDevice(1, 1), new GpuDevice(2, 2),
+            new GpuDevice(3, 3)), Arrays
+        .asList(new AssignedGpuDevice(2, 2, createContainerId(1)),
+            new AssignedGpuDevice(3, 3, createContainerId(2))));
+    when(mockPlugin1.getNMResourceInfo()).thenReturn(nmResourceInfo1);
+    namesToPlugins.put("resource-1", mockPlugin1);
+    namesToPlugins.put("yarn.io/resource-1", mockPlugin1);
+    ResourcePlugin mockPlugin2 = mock(ResourcePlugin.class);
+    namesToPlugins.put("resource-2", mockPlugin2);
+    when(rpm.getNameToPlugins()).thenReturn(namesToPlugins);
+
+    nmContext.setResourcePluginManager(rpm);
+
+    WebResource r = resource();
+    ClientResponse response;
+    JSONObject json;
+
+    // Access resource-1 should succeed
+    response = r.path("ws").path("v1").path("node").path(
+        "resources").path("resource-1").accept(MediaType.APPLICATION_JSON).get(
+        ClientResponse.class);
+    assertEquals(MediaType.APPLICATION_JSON, response.getType().toString());
+    json = response.getEntity(JSONObject.class);
+    assertEquals("1.2.3",
+        json.getJSONObject("gpuDeviceInformation").get("driver_version"));
+    assertEquals(3, json.getJSONArray("totalGpuDevices").length());
+    assertEquals(2, json.getJSONArray("assignedGpuDevices").length());
+    assertEquals(2, json.getJSONArray("assignedGpuDevices").length());
+  }
+
+  private void assertIncludesException(JSONObject json) {
+    assertTrue(json.has("RemoteException"));
+  }
+
   private void testContainerLogs(WebResource r, ContainerId containerId)
       throws IOException {
     final String containerIdStr = containerId.toString();
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/TestGpuDeviceInformationParser.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/TestGpuDeviceInformationParser.java
index e22597d..dc96746 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/TestGpuDeviceInformationParser.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/gpu/TestGpuDeviceInformationParser.java
@@ -39,7 +39,7 @@ public class TestGpuDeviceInformationParser {
     Assert.assertEquals(2, info.getGpus().size());
     PerGpuDeviceInformation gpu1 = info.getGpus().get(1);
     Assert.assertEquals("Tesla P100-PCIE-12GB", gpu1.getProductName());
-    Assert.assertEquals(16384, gpu1.getGpuMemoryUsage().getTotalMemoryMiB());
+    Assert.assertEquals(12193, gpu1.getGpuMemoryUsage().getTotalMemoryMiB());
     Assert.assertEquals(10.3f,
         gpu1.getGpuUtilizations().getOverallGpuUtilization(), 1e-6);
     Assert.assertEquals(34f, gpu1.getTemperature().getCurrentGpuTemp(), 1e-6);
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/adapters/yarn-nm-gpu.js
similarity index 70%
copy from hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
copy to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/adapters/yarn-nm-gpu.js
index d2937a0..bf6307a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/adapters/yarn-nm-gpu.js
@@ -16,9 +16,18 @@
  * limitations under the License.
  */
 
-/**
- * Application level global constants go here.
- */
-export default {
-  PARAM_SEPARATOR: '!',
-};
+import AbstractAdapter from './abstract';
+
+export default AbstractAdapter.extend({
+
+  address: "localBaseAddress",
+  restNameSpace: "node",
+  serverName: "NM",
+
+  urlForFindRecord(id/*, modelName, snapshot*/) {
+    var url = this._buildURL();
+    url = url.replace("{nodeAddress}", id) + "/resources/yarn.io%2Fgpu";
+    return url;
+  }
+
+});
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/donut-chart.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/donut-chart.js
index ce26811..5236ca0 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/donut-chart.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/donut-chart.js
@@ -20,6 +20,7 @@ import Ember from 'ember';
 import BaseChartComponent from 'yarn-ui/components/base-chart-component';
 import ColorUtils from 'yarn-ui/utils/color-utils';
 import Converter from 'yarn-ui/utils/converter';
+import {Entities} from 'yarn-ui/constants';
 
 export default BaseChartComponent.extend({
   /*
@@ -41,8 +42,10 @@ export default BaseChartComponent.extend({
     }
 
     if (!middleValue) {
-      if (this.get("type") === "memory") {
+      if (this.get(Entities.Type) === Entities.Memory) {
         middleValue = Converter.memoryToSimpliedUnit(total);
+      } else if (this.get(Entities.Type) === Entities.Resource) {
+        middleValue = Converter.resourceToSimplifiedUnit(total, this.get(Entities.Unit));
       } else {
         middleValue = total;
       }
@@ -151,7 +154,10 @@ export default BaseChartComponent.extend({
           var value = d.value;
           if (this.get("type") === "memory") {
             value = Converter.memoryToSimpliedUnit(value);
+          } else if (this.get("type") === "resource") {
+            value = Converter.resourceToSimplifiedUnit(value, this.get(Entities.Unit));
           }
+
           return d.label + ' = ' + value + suffix;
         }.bind(this));
     }
@@ -185,10 +191,18 @@ export default BaseChartComponent.extend({
     }
 
     this.renderDonutChart(this.get("data"), this.get("title"), this.get("showLabels"),
-                          this.get("middleLabel"), this.get("middleValue"));
+                          this.get("middleLabel"), this.get("middleValue"), this.get("suffix"));
   },
 
   didInsertElement: function() {
+    // When parentIdPrefix is specified, use parentIdPrefix + id as the new
+    // parent id
+    if (this.get("parentIdPrefix")) {
+      var newParentId = this.get("parentIdPrefix") + this.get("id");
+      this.set("parentId", newParentId);
+      console.log(newParentId);
+    }
+
     this.initChart();
     this.draw();
   },
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/gpu-donut-chart.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/gpu-donut-chart.js
new file mode 100644
index 0000000..fa5ca8a
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/gpu-donut-chart.js
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import DonutChart from 'yarn-ui/components/donut-chart';
+import ColorUtils from 'yarn-ui/utils/color-utils';
+
+export default DonutChart.extend({
+  draw: function() {
+    // Construct data
+    var data = [];
+    if (this.get("gpu-render-type") === "gpu-memory") {
+      data.push({
+        label: "Used",
+        value: parseFloat(this.get("gpuInfo").gpuMemoryUsage.usedMemoryMiB),
+      });
+      data.push({
+        label: "Available",
+        value: parseFloat(this.get("gpuInfo").gpuMemoryUsage.availMemoryMiB)
+      });
+    } else if (this.get("gpu-render-type") === "gpu-utilization") {
+      var utilization = parseFloat(this.get("gpuInfo").gpuUtilizations.overallGpuUtilization);
+      data.push({
+        label: "Utilized",
+        value: utilization,
+      });
+      data.push({
+        label: "Available",
+        value: 100 - utilization
+      });
+    }
+
+    var colorTargets = this.get("colorTargets");
+    if (colorTargets) {
+      var colorTargetReverse = Boolean(this.get("colorTargetReverse"));
+      var targets = colorTargets.split(" ");
+      this.colors = ColorUtils.getColors(data.length, targets, colorTargetReverse);
+    }
+
+    this.renderDonutChart(data, this.get("title"), this.get("showLabels"),
+      this.get("middleLabel"), this.get("middleValue"), this.get("suffix"));
+  },
+
+  didInsertElement: function() {
+    // ParentId includes minorNumber
+    var newParentId = this.get("parentId") + this.get("gpuInfo").minorNumber;
+    this.set("parentId", newParentId);
+
+    this.initChart();
+    this.draw();
+  },
+});
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
index d2937a0..29ad4bc 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
@@ -22,3 +22,16 @@
 export default {
   PARAM_SEPARATOR: '!',
 };
+
+const BASE_UNIT = 1024
+
+export const Type = 'type';
+export const Memory = 'memory';
+export const Resource = 'resource';
+export const Unit = 'unit';
+export const Entities = {
+  Type: 'type',
+  Memory:'memory',
+  Resource: 'resource',
+  Unit: 'unit'
+}
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-nodes/table.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-nodes/table.js
index 3fae596..f4bd578 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-nodes/table.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-nodes/table.js
@@ -60,7 +60,7 @@ export default Ember.Controller.extend({
             getCellContent: function(row) {
               var node_id = row.get("id"),
                   node_addr = row.get("nodeHTTPAddress"),
-                  href = `#/yarn-node/${node_id}/${node_addr}`;
+                  href = `#/yarn-node/${node_id}/${node_addr}/info`;
                 switch(row.get("nodeState")) {
                 case "SHUTDOWN":
                 case "LOST":
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/cluster-metric.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/cluster-metric.js
index dcc0c29..d9a5eef 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/cluster-metric.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/cluster-metric.js
@@ -43,6 +43,8 @@ export default DS.Model.extend({
   decommissionedNodes: DS.attr('number'),
   rebootedNodes: DS.attr('number'),
   activeNodes: DS.attr('number'),
+  totalUsedResourcesAcrossPartition: DS.attr('object'),
+  totalClusterResourcesAcrossPartition: DS.attr('object'),
 
   getFinishedAppsDataForDonutChart: function() {
     var arr = [];
@@ -135,4 +137,71 @@ export default DS.Model.extend({
 
     return arr;
   }.property("allocatedVirtualCores", "reservedVirtualCores", "availableVirtualCores"),
+
+  getResourceTypes: function() {
+    var types = [];
+    if (this.get("totalClusterResourcesAcrossPartition")) {
+
+      console.log(types);
+    }
+  }.property("totalClusterResourcesAcrossPartition"),
+
+  /*
+   * Returned format
+   * [
+   *     {
+   *         name: <resource-name>
+   *         unit: <resource-unit>
+   *         data: [
+   *            {
+   *               label: <label>
+   *               value: <value>
+   *            },
+   *            {
+   *            }
+   *            ...
+   *         ],
+   *     }
+   * ]
+   */
+  getAllResourceTypesDonutChart: function() {
+    if (this.get("totalClusterResourcesAcrossPartition")
+      && this.get("totalUsedResourcesAcrossPartition")) {
+      var usages = [];
+
+      var clusterResourceInformations = this.get("totalClusterResourcesAcrossPartition").resourcesInformations;
+      var usedResourceInformations = this.get("totalUsedResourcesAcrossPartition").resourcesInformations;
+
+      clusterResourceInformations.forEach(function(cluster) {
+        var perResourceTypeUsage = {
+          name: cluster.name,
+          unit: cluster.units,
+          data: []
+        };
+
+        usedResourceInformations.forEach(function (used) {
+          if (used.name === perResourceTypeUsage.name) {
+            var usedValue = used.value;
+            perResourceTypeUsage.data.push({
+              label: "Used",
+              value: usedValue
+            }, {
+              label: "Available",
+              value: cluster.value - usedValue
+            });
+          }
+        });
+
+        usages.push(perResourceTypeUsage);
+
+        // Make sure id is a valid w3c ID
+        perResourceTypeUsage.id = perResourceTypeUsage.name.replace('/', '-');
+        perResourceTypeUsage.id = perResourceTypeUsage.id.replace('.', '-');
+      });
+
+      console.log(usages);
+      return usages;
+    }
+    return null;
+  }.property()
 });
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-nm-gpu.js
similarity index 81%
copy from hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
copy to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-nm-gpu.js
index d2937a0..b3e9c2a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-nm-gpu.js
@@ -16,9 +16,12 @@
  * limitations under the License.
  */
 
-/**
- * Application level global constants go here.
- */
-export default {
-  PARAM_SEPARATOR: '!',
-};
+import DS from 'ember-data';
+
+export default DS.Model.extend({
+  info: DS.attr('object'),
+
+  jsonString: function() {
+    return JSON.stringify(this.get("info"));
+  }.property(),
+});
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-queue/capacity-queue.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-queue/capacity-queue.js
index 1cb07bb..9b0f9ac 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-queue/capacity-queue.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-queue/capacity-queue.js
@@ -36,6 +36,7 @@ export default DS.Model.extend({
   numActiveApplications: DS.attr('number'),
   users: DS.hasMany('YarnUser'),
   type: DS.attr('string'),
+  resources: DS.attr('object'),
 
   isLeafQueue: function() {
     var len = this.get("children.length");
@@ -91,5 +92,5 @@ export default DS.Model.extend({
         value: this.get("numActiveApplications") || 0
       }
     ];
-  }.property("numPendingApplications", "numActiveApplications")
+  }.property("numPendingApplications", "numActiveApplications"),
 });
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js
index 20b6f5b..b1b1518 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js
@@ -32,6 +32,8 @@ export default DS.Model.extend({
   availableVirtualCores: DS.attr('number'),
   version: DS.attr('string'),
   nodeLabels: DS.attr('array'),
+  availableResource: DS.attr('object'),
+  usedResource: DS.attr('object'),
 
   nodeLabelsAsString: function() {
     var labels = this.get("nodeLabels");
@@ -90,6 +92,39 @@ export default DS.Model.extend({
     return arr;
   }.property("availableVirtualCores", "usedVirtualCores"),
 
+  getGpuDataForDonutChart: function() {
+    var arr = [];
+    var used = 0;
+    var ri;
+
+    var resourceInformations = this.get("usedResource").resourcesInformations;
+    for (var i = 0; i < resourceInformations.length; i++) {
+      ri = resourceInformations[i];
+      if (ri.name === "yarn.io/gpu") {
+        used = ri.value;
+      }
+    }
+
+    var available = 0;
+    resourceInformations = this.get("availableResource").resourcesInformations;
+    for (i = 0; i < resourceInformations.length; i++) {
+      ri = resourceInformations[i];
+      if (ri.name === "yarn.io/gpu") {
+        available = ri.value;
+      }
+    }
+
+    arr.push({
+      label: "Used",
+      value: used
+    });
+    arr.push({
+      label: "Available",
+      value: available
+    });
+    return arr;
+  }.property("availableResource", "usedResource"),
+
   toolTipText: function() {
     return "<p>Rack: " + this.get("rack") + '</p>' +
            "<p>Host: " + this.get("nodeHostName") + '</p>';
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/router.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/router.js
index 9013142..1a01b86 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/router.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/router.js
@@ -37,7 +37,10 @@ Router.map(function() {
     this.route('apps');
   });
   this.route('yarn-nodes-heatmap');
-  this.route('yarn-node', { path: '/yarn-node/:node_id/:node_addr' });
+  this.route('yarn-node', { path: '/yarn-node/:node_id/:node_addr' }, function() {
+    this.route("info");
+    this.route("yarn-nm-gpu");
+  });
   this.route('yarn-node-apps', { path: '/yarn-node-apps/:node_id/:node_addr' });
   this.route('yarn-node-app',
       { path: '/yarn-node-app/:node_id/:node_addr/:app_id' });
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/cluster-overview.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/cluster-overview.js
index d03ea0d..254ece4 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/cluster-overview.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/cluster-overview.js
@@ -31,7 +31,7 @@ export default AbstractRoute.extend({
       queues: this.store.query("yarn-queue.yarn-queue", {}).then((model) => {
         let type = model.get('firstObject').get('type');
         return this.store.query("yarn-queue." + type + "-queue", {});
-      }),
+      })
     });
   },
 
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node.js
index 3d54846..7ce615c 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node.js
@@ -25,6 +25,7 @@ export default AbstractRoute.extend({
     // Fetches data from both NM and RM. RM is queried to get node usage info.
     return Ember.RSVP.hash({
       nodeInfo: { id: param.node_id, addr: param.node_addr },
+      nmGpuInfo: this.store.findRecord('yarn-nm-gpu', param.node_addr, {reload:true}),
       node: this.store.findRecord('yarn-node', param.node_addr, {reload: true}),
       rmNode: this.store.findRecord('yarn-rm-node', param.node_id, {reload: true})
     });
@@ -33,5 +34,6 @@ export default AbstractRoute.extend({
   unloadAll() {
     this.store.unloadAll('yarn-node');
     this.store.unloadAll('yarn-rm-node');
+    this.store.unloadAll('yarn-nm-gpu');
   }
 });
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node/yarn-nm-gpu.js
similarity index 89%
copy from hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
copy to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node/yarn-nm-gpu.js
index d2937a0..38ae5d1 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/constants.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node/yarn-nm-gpu.js
@@ -16,9 +16,7 @@
  * limitations under the License.
  */
 
-/**
- * Application level global constants go here.
- */
-export default {
-  PARAM_SEPARATOR: '!',
-};
+import Ember from 'ember';
+
+export default Ember.Route.extend({
+});
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-nm-gpu.js
similarity index 56%
copy from hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node.js
copy to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-nm-gpu.js
index 3d54846..3567c68 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/yarn-node.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-nm-gpu.js
@@ -16,22 +16,28 @@
  * limitations under the License.
  */
 
-import Ember from 'ember';
+import DS from 'ember-data';
 
-import AbstractRoute from './abstract';
+export default DS.JSONAPISerializer.extend({
+  internalNormalizeSingleResponse(store, primaryModelClass, payload, id) {
+    if (payload.nodeInfo) {
+      payload = payload.nodeInfo;
+    }
 
-export default AbstractRoute.extend({
-  model(param) {
-    // Fetches data from both NM and RM. RM is queried to get node usage info.
-    return Ember.RSVP.hash({
-      nodeInfo: { id: param.node_id, addr: param.node_addr },
-      node: this.store.findRecord('yarn-node', param.node_addr, {reload: true}),
-      rmNode: this.store.findRecord('yarn-rm-node', param.node_id, {reload: true})
-    });
+    var fixedPayload = {
+      id: id,
+      type: primaryModelClass.modelName,
+      attributes: {
+        info: payload
+      }
+    };
+    return fixedPayload;
   },
 
-  unloadAll() {
-    this.store.unloadAll('yarn-node');
-    this.store.unloadAll('yarn-rm-node');
-  }
+  normalizeSingleResponse(store, primaryModelClass, payload, id/*, requestType*/) {
+    // payload is of the form {"nodeInfo":{}}
+    var p = this.internalNormalizeSingleResponse(store,
+      primaryModelClass, payload, id);
+    return { data: p };
+  },
 });
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-queue/capacity-queue.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-queue/capacity-queue.js
index c7350ef..7626598 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-queue/capacity-queue.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-queue/capacity-queue.js
@@ -72,6 +72,7 @@ export default DS.JSONAPISerializer.extend({
           preemptionDisabled: payload.preemptionDisabled,
           numPendingApplications: payload.numPendingApplications,
           numActiveApplications: payload.numActiveApplications,
+          resources: payload.resources,
           type: "capacity",
         },
         // Relationships
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-rm-node.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-rm-node.js
index 1c6d1be..a3a1d59 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-rm-node.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-rm-node.js
@@ -41,7 +41,9 @@ export default DS.JSONAPISerializer.extend({
         usedVirtualCores: payload.usedVirtualCores,
         availableVirtualCores: payload.availableVirtualCores,
         version: payload.version,
-        nodeLabels: payload.nodeLabels
+        nodeLabels: payload.nodeLabels,
+        usedResource: payload.used,
+        availableResource: payload.avail
       }
     };
     return fixedPayload;
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/cluster-overview.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/cluster-overview.hbs
index e549ce5..ff4682a 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/cluster-overview.hbs
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/cluster-overview.hbs
@@ -90,41 +90,71 @@
 
   <hr>
   <div class="row">
-
-    <div class="col-lg-4 container-fluid">
-      <div class="panel panel-default">
-        <div class="panel-heading">
-          Resource - Memory
+    <!-- When getAllResourceTypesDonutChart is not null, use it to show per-resource-type usages. Otherwise only show
+         vcore/memory usage from metrics -->
+    {{#if model.clusterMetrics.firstObject.getAllResourceTypesDonutChart}}
+      {{#each
+        model.clusterMetrics.firstObject.getAllResourceTypesDonutChart as |perTypeUsage|}}
+        <div class="col-lg-4 container-fluid">
+          <div class="panel panel-default">
+            <div class="panel-heading">
+              {{perTypeUsage.name}} - Usages
+            </div>
+            <div class="container-fluid" id="resource-type-{{perTypeUsage.id}}">
+              {{donut-chart
+                data=perTypeUsage.data
+                showLabels=true
+                parentIdPrefix="resource-type-"
+                id=perTypeUsage.id
+                ratio=0.6
+                unit=perTypeUsage.unit
+                type="resource"
+                maxHeight=350
+                colorTargets="good"
+                colorTargetReverse=true}}
+            </div>
+          </div>
         </div>
-        <div class="container-fluid" id="mem-donut-chart">
-          {{donut-chart data=model.clusterMetrics.firstObject.getMemoryDataForDonutChart
-          showLabels=true
-          parentId="mem-donut-chart"
-          ratio=0.6
-          maxHeight=350
-          colorTargets="good"
-          colorTargetReverse=true
-          type="memory"}}
+      {{/each}}
+    {{else}}
+      <div class="col-lg-4 container-fluid">
+        <div class="panel panel-default">
+          <div class="panel-heading">
+            Resource - Memory
+          </div>
+          <div class="container-fluid" id="mem-donut-chart">
+            {{donut-chart
+              data=model.clusterMetrics.firstObject.getMemoryDataForDonutChart
+              showLabels=true
+              parentId="mem-donut-chart"
+              ratio=0.6
+              maxHeight=350
+              colorTargets="good"
+              colorTargetReverse=true
+              type="memory"}}
+          </div>
         </div>
       </div>
-    </div>
 
-    <div class="col-lg-4 container-fluid">
-      <div class="panel panel-default">
-        <div class="panel-heading">
-          Resource - VCores
-        </div>
-        <div class="container-fluid" id="vcore-donut-chart">
-          {{donut-chart data=model.clusterMetrics.firstObject.getVCoreDataForDonutChart
-          showLabels=true
-          parentId="vcore-donut-chart"
-          ratio=0.6
-          maxHeight=350
-          colorTargets="good"
-          colorTargetReverse=true}}
+      <div class="col-lg-4 container-fluid">
+        <div class="panel panel-default">
+          <div class="panel-heading">
+            Resource - VCores
+          </div>
+          <div class="container-fluid" id="vcore-donut-chart">
+            {{donut-chart
+              data=model.clusterMetrics.firstObject.getVCoreDataForDonutChart
+              showLabels=true
+              parentId="vcore-donut-chart"
+              ratio=0.6
+              maxHeight=350
+              colorTargets="good"
+              colorTargetReverse=true}}
+          </div>
         </div>
       </div>
-    </div>
+    {{/if}}
+
   </div>
   <div class="row">
     <div class="col-lg-6 container-fluid">
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs
index d2486c9..fffae30 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs
@@ -24,8 +24,8 @@
       <div class="panel-body">
         <ul class="nav nav-pills nav-stacked" id="stacked-menu">
           <ul class="nav nav-pills nav-stacked collapse in">
-            {{#link-to 'yarn-node' tagName="li"}}
-              {{#link-to 'yarn-node' nodeId nodeAddr}}Node Information
+            {{#link-to 'yarn-node.info' tagName="li"}}
+              {{#link-to 'yarn-node.info' nodeId nodeAddr}}Node Information
               {{/link-to}}
             {{/link-to}}
             {{#link-to 'yarn-node-apps' tagName="li"}}
@@ -36,6 +36,12 @@
               {{#link-to 'yarn-node-containers' nodeId nodeAddr}}List of Containers
               {{/link-to}}
             {{/link-to}}
+            {{#if nmGpuInfo}}
+              {{#link-to 'yarn-node.yarn-nm-gpu' tagName="li"}}
+                {{#link-to 'yarn-node.yarn-nm-gpu' nodeId nodeAddr }}GPU Information
+                {{/link-to}}
+              {{/link-to}}
+            {{/if}}
           </ul>
         </ul>
       </div>
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-nm-gpu-info.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-nm-gpu-info.hbs
new file mode 100644
index 0000000..4118b1e
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-nm-gpu-info.hbs
@@ -0,0 +1,69 @@
+{{!
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+}}
+
+<div class="panel panel-default">
+  <div class="panel-heading">Gpu Information - (Minor
+    Number {{gpu.minorNumber}})
+  </div>
+  <table class="table">
+    <tbody>
+    <tr>
+      <td>Product Name</td>
+      <td>{{gpu.productName}}</td>
+    </tr>
+    <tr>
+      <td>UUID</td>
+      <td>{{gpu.uuid}}</td>
+    </tr>
+    <tr>
+      <td>Current Temperature</td>
+      <td>{{gpu.temperature.currentGpuTemp}}</td>
+    </tr>
+    <tr>
+      <td>Max Temperature</td>
+      <td>{{gpu.temperature.maxGpuTemp}}</td>
+    </tr>
+    </tbody>
+  </table>
+
+  <div class="col-md-5 container-fluid" id="mem-donut-chart{{gpu.minorNumber}}">
+    {{gpu-donut-chart gpuInfo=gpu
+                      showLabels=true
+                      parentId="mem-donut-chart"
+                      middleLabel = "Gpu Memory"
+                      ratio=0.6
+                      type="memory"
+                      gpu-render-type = "gpu-memory"
+                      colorTargets="good"
+                      colorTargetReverse=true
+                      maxHeight=350}}
+  </div>
+
+  <div class="col-md-5 container-fluid" id="utilization-donut-chart{{gpu.minorNumber}}">
+    {{gpu-donut-chart gpuInfo=gpu
+                      showLabels=true
+                      parentId="utilization-donut-chart"
+                      middleLabel = "Gpu Utilization"
+                      ratio=0.6
+                      gpu-render-type = "gpu-utilization"
+                      colorTargets="good"
+                      colorTargetReverse=true
+                      suffix="%"
+                      maxHeight=350}}
+  </div>
+</div>
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node.hbs
deleted file mode 100644
index 1e8549b..0000000
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node.hbs
+++ /dev/null
@@ -1,125 +0,0 @@
-{{!--
-  Licensed to the Apache Software Foundation (ASF) under one
-  or more contributor license agreements.  See the NOTICE file
-  distributed with this work for additional information
-  regarding copyright ownership.  The ASF licenses this file
-  to you under the Apache License, Version 2.0 (the
-  "License"); you may not use this file except in compliance
-  with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
---}}
-
-{{breadcrumb-bar breadcrumbs=breadcrumbs}}
-
-<div class="col-md-12 container-fluid">
-  <div class="row">
-
-    {{node-menu-panel path="yarn-node" nodeId=model.rmNode.id nodeAddr=model.node.id}}
-
-    <div class="col-md-10 container-fluid">
-
-      <div class="row">
-        <div class="col-md-12 container-fluid">
-          <div class="panel panel-default">
-          <div class="panel-heading">Node Information: {{model.rmNode.id}}</div>
-          <table class="table">
-            <tbody>
-            <tr>
-              <td>Total Vmem allocated for Containers</td>
-              <td>{{divide num=model.node.totalVmemAllocatedContainersMB den=1024}} GB</td>
-            </tr>
-            <tr>
-              <td>Vmem enforcement enabled</td>
-              <td>{{model.node.vmemCheckEnabled}}</td>
-            </tr>
-            <tr>
-              <td>Total Pmem allocated for Containers</td>
-              <td>{{divide num=model.node.totalPmemAllocatedContainersMB den=1024}} GB</td>
-            </tr>
-            <tr>
-              <td>Pmem enforcement enabled</td>
-              <td>{{model.node.pmemCheckEnabled}}</td>
-            </tr>
-            <tr>
-              <td>Total VCores allocated for Containers</td>
-              <td>{{model.node.totalVCoresAllocatedContainers}}</td>
-            </tr>
-            <tr>
-              <td>Node Healthy Status</td>
-              <td>{{model.node.nodeHealthy}}</td>
-            </tr>
-            <tr>
-              <td>Last Node Health Report Time</td>
-              <td>{{model.node.lastNodeUpdateTime}}</td>
-            </tr>
-            <tr>
-              <td>Node Health Report</td>
-              <td>{{model.node.healthReport}}</td>
-            </tr>
-            {{#if model.node.nmStartupTime}}
-              <tr>
-                <td>Node Manager Start Time</td>
-                <td>{{model.node.nmStartupTime}}</td>
-              </tr>
-            {{/if}}
-            <tr>
-              <td>Node Manager Version</td>
-              <td>{{model.node.nodeManagerBuildVersion}}</td>
-            </tr>
-            <tr>
-              <td>Hadoop Version</td>
-              <td>{{model.node.hadoopBuildVersion}}</td>
-            </tr>
-            </tbody>
-          </table>
-        </div>
-        </div>
-      </div>
-
-      <div class="row">
-        <div class="col-lg-6 container-fluid">
-          <div class="panel panel-default">
-            <div class="panel-heading">
-              Resource - Memory
-            </div>
-            <div class="container-fluid" id="mem-donut-chart">
-              {{donut-chart data=model.rmNode.getMemoryDataForDonutChart
-              showLabels=true
-              parentId="mem-donut-chart"
-              ratio=0.6
-              type="memory"
-              colorTargets="good"
-              colorTargetReverse=true
-              maxHeight=350}}
-            </div>
-          </div>
-        </div>
-
-        <div class="col-lg-6 container-fluid">
-          <div class="panel panel-default">
-            <div class="panel-heading">
-              Resource - VCores
-            </div>
-            <div class="container-fluid" id="vcore-donut-chart">
-              {{donut-chart data=model.rmNode.getVCoreDataForDonutChart
-              showLabels=true
-              parentId="vcore-donut-chart"
-              ratio=0.6
-              colorTargets="good"
-              colorTargetReverse=true
-              maxHeight=350}}
-            </div>
-          </div>
-        </div>
-      </div>
-    </div>
-  </div>
-</div>
-{{outlet}}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/info.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/info.hbs
new file mode 100644
index 0000000..ad411c0
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/info.hbs
@@ -0,0 +1,154 @@
+{{!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+--}}
+
+{{breadcrumb-bar breadcrumbs=breadcrumbs}}
+
+<div class="col-md-12 container-fluid">
+  <div class="row">
+
+    {{node-menu-panel path="yarn-node" nodeId=model.rmNode.id
+                      nodeAddr=model.node.id nmGpuInfo=model.nmGpuInfo}}
+
+    <div class="col-md-10 container-fluid">
+
+      <div class="row">
+        <div class="col-md-12 container-fluid">
+          <div class="panel panel-default">
+            <div class="panel-heading">Node
+              Information: {{model.rmNode.id}}</div>
+            <table class="table">
+              <tbody>
+              <tr>
+                <td>Total Vmem allocated for Containers</td>
+                <td>{{divide num=model.node.totalVmemAllocatedContainersMB
+                             den=1024}} GB
+                </td>
+              </tr>
+              <tr>
+                <td>Vmem enforcement enabled</td>
+                <td>{{model.node.vmemCheckEnabled}}</td>
+              </tr>
+              <tr>
+                <td>Total Pmem allocated for Containers</td>
+                <td>{{divide num=model.node.totalPmemAllocatedContainersMB
+                             den=1024}} GB
+                </td>
+              </tr>
+              <tr>
+                <td>Pmem enforcement enabled</td>
+                <td>{{model.node.pmemCheckEnabled}}</td>
+              </tr>
+              <tr>
+                <td>Total VCores allocated for Containers</td>
+                <td>{{model.node.totalVCoresAllocatedContainers}}</td>
+              </tr>
+              <tr>
+                <td>Node Healthy Status</td>
+                <td>{{model.node.nodeHealthy}}</td>
+              </tr>
+              <tr>
+                <td>Last Node Health Report Time</td>
+                <td>{{model.node.lastNodeUpdateTime}}</td>
+              </tr>
+              <tr>
+                <td>Node Health Report</td>
+                <td>{{model.node.healthReport}}</td>
+              </tr>
+              {{#if model.node.nmStartupTime}}
+                <tr>
+                  <td>Node Manager Start Time</td>
+                  <td>{{model.node.nmStartupTime}}</td>
+                </tr>
+              {{/if}}
+              <tr>
+                <td>Node Manager Version</td>
+                <td>{{model.node.nodeManagerBuildVersion}}</td>
+              </tr>
+              <tr>
+                <td>Hadoop Version</td>
+                <td>{{model.node.hadoopBuildVersion}}</td>
+              </tr>
+              </tbody>
+            </table>
+          </div>
+        </div>
+      </div>
+
+      <div class="row">
+        <div class="col-lg-6 container-fluid">
+          <div class="panel panel-default">
+            <div class="panel-heading">
+              Resource - Memory
+            </div>
+            <div class="container-fluid" id="mem-donut-chart">
+              {{donut-chart data=model.rmNode.getMemoryDataForDonutChart
+                            showLabels=true
+                            parentId="mem-donut-chart"
+                            ratio=0.6
+                            type="memory"
+                            colorTargets="good"
+                            colorTargetReverse=true
+                            maxHeight=350}}
+            </div>
+          </div>
+        </div>
+
+        <div class="col-lg-6 container-fluid">
+          <div class="panel panel-default">
+            <div class="panel-heading">
+              Resource - VCores
+            </div>
+            <div class="container-fluid" id="vcore-donut-chart">
+              {{donut-chart data=model.rmNode.getVCoreDataForDonutChart
+                            showLabels=true
+                            parentId="vcore-donut-chart"
+                            ratio=0.6
+                            colorTargets="good"
+                            colorTargetReverse=true
+                            maxHeight=350}}
+            </div>
+          </div>
+        </div>
+      </div>
+
+      {{#if model.nmGpuInfo}}
+        <div class="row">
+          <div class="col-lg-6 container-fluid">
+            <div class="panel panel-default">
+              <div class="panel-heading">
+                <li>
+                  Resources - yarn.io/gpu
+                </li>
+              </div>
+              <div class="container-fluid" id="gpu-donut-chart">
+                {{donut-chart data=model.rmNode.getGpuDataForDonutChart
+                              showLabels=true
+                              parentId="gpu-donut-chart"
+                              ratio=0.6
+                              colorTargets="good"
+                              colorTargetReverse=true
+                              maxHeight=350}}
+              </div>
+            </div>
+          </div>
+        </div>
+      {{/if}}
+    </div>
+  </div>
+</div>
+{{outlet}}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/yarn-nm-gpu.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/yarn-nm-gpu.hbs
new file mode 100644
index 0000000..55840ad
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/yarn-nm-gpu.hbs
@@ -0,0 +1,53 @@
+{{!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+--}}
+
+{{breadcrumb-bar breadcrumbs=breadcrumbs}}
+
+<div class="col-md-12 container-fluid">
+  <div class="row">
+
+    {{node-menu-panel path="yarn-node" nodeId=model.rmNode.id
+                      nodeAddr=model.node.id nmGpuInfo=model.nmGpuInfo}}
+
+    <div class="col-md-10 container-fluid">
+      <div class="panel panel-default">
+        <div class="panel-heading">Gpu Information</div>
+        <table class="table">
+          <tbody>
+          <tr>
+            <td>Vendor</td>
+            <td>NVIDIA</td>
+          </tr>
+          <tr>
+            <td>Driver Version</td>
+            <td>{{model.nmGpuInfo.info.gpuDeviceInformation.driverVersion}}</td>
+          </tr>
+          <tr>
+            <td>Total Number Of Gpus</td>
+            <td>{{model.nmGpuInfo.info.totalGpuDevices.length}}</td>
+          </tr>
+          </tbody>
+        </table>
+      </div>
+
+      {{#each model.nmGpuInfo.info.gpuDeviceInformation.gpus as |gpu|}}
+        {{yarn-nm-gpu-info gpu=gpu}}
+      {{/each}}
+    </div>
+  </div>
+</div>
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/utils/converter.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/utils/converter.js
index 7c9a1f8..e47edad 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/utils/converter.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/utils/converter.js
@@ -130,6 +130,57 @@ export default {
     }
     return value.toFixed(1) + " " + unit;
   },
+  resourceToSimplifiedUnit: function (value, unit) {
+    // First convert unit to base unit ("").
+    var normalizedValue = value;
+    if (unit === "Ki") {
+      normalizedValue = normalizedValue * 1024;
+    } else if (unit === "Mi") {
+      normalizedValue = normalizedValue * 1024 * 1024;
+    } else if (unit === "Gi") {
+      normalizedValue = normalizedValue * 1024 * 1024 * 1024;
+    } else if (unit === "Ti") {
+      normalizedValue = normalizedValue * 1024 * 1024 * 1024 * 1024;
+    } else if (unit === "Pi") {
+      normalizedValue = normalizedValue * 1024 * 1024 * 1024 * 1024 * 1024;
+    } else if (unit === "K" || unit === "k") {
+      normalizedValue = normalizedValue * 1000;
+    } else if (unit === "M" || unit === "m") {
+      normalizedValue = normalizedValue * 1000 * 1000;
+    } else if (unit === "G" || unit === "g") {
+      normalizedValue = normalizedValue * 1000 * 1000 * 1000;
+    } else if (unit === "T" || unit === "t") {
+      normalizedValue = normalizedValue * 1000 * 1000 * 1000 * 1000;
+    } else if (unit === "P" || unit === "p") {
+      normalizedValue = normalizedValue * 1000 * 1000 * 1000 * 1000 * 1000;
+    }
+
+    // From baseunit ("") convert to most human readable unit
+    // (which value < 1024 * 0.9).
+    var finalUnit = "";
+    if (normalizedValue / 1024 >= 0.9) {
+      normalizedValue = normalizedValue / 1024;
+      finalUnit = "Ki";
+    }
+    if (normalizedValue / 1024 >= 0.9) {
+      normalizedValue = normalizedValue / 1024;
+      finalUnit = "Mi";
+    }
+    if (normalizedValue / 1024 >= 0.9) {
+      normalizedValue = normalizedValue / 1024;
+      finalUnit = "Gi";
+    }
+    if (normalizedValue / 1024 >= 0.9) {
+      normalizedValue = normalizedValue / 1024;
+      finalUnit = "Ti";
+    }
+    if (normalizedValue / 1024 >= 0.9) {
+      normalizedValue = normalizedValue / 1024;
+      finalUnit = "Pi";
+    }
+
+    return normalizedValue.toFixed(1) + " " + finalUnit;
+  },
   msToElapsedTimeUnit: function(millisecs, short) {
     var seconds = Math.floor(millisecs / 1000);
     var days = Math.floor(seconds / (3600 * 24));
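
The resourceToSimplifiedUnit function added to converter.js in the hunk above works in two steps: it normalizes the value to the base unit, then steps up through the binary units while the value is still at least 0.9 of the next one. Below is a minimal standalone sketch of that logic, assuming a table-driven rewrite. The MULTIPLIERS and BINARY_UNITS names are illustrative and not part of the commit. For typical inputs it should give the same strings as the patched function.

```javascript
// Hypothetical standalone rewrite of resourceToSimplifiedUnit.
// MULTIPLIERS and BINARY_UNITS are illustrative names, not from the commit.
const MULTIPLIERS = new Map([
  ["Ki", 1024], ["Mi", 1024 ** 2], ["Gi", 1024 ** 3],
  ["Ti", 1024 ** 4], ["Pi", 1024 ** 5],
  ["K", 1e3], ["k", 1e3], ["M", 1e6], ["m", 1e6],
  ["G", 1e9], ["g", 1e9], ["T", 1e12], ["t", 1e12],
  ["P", 1e15], ["p", 1e15],
]);
const BINARY_UNITS = ["Ki", "Mi", "Gi", "Ti", "Pi"];

function resourceToSimplifiedUnit(value, unit) {
  // Step 1: normalize to the base unit (""). Unknown units are left
  // unscaled, as in the patch, where no branch matches them.
  let normalized = value * (MULTIPLIERS.has(unit) ? MULTIPLIERS.get(unit) : 1);

  // Step 2: move to the next binary unit while the value is at least
  // 0.9 of that unit. Once one step fails, all later ones would too.
  let finalUnit = "";
  for (const next of BINARY_UNITS) {
    if (normalized / 1024 < 0.9) {
      break;
    }
    normalized /= 1024;
    finalUnit = next;
  }
  return normalized.toFixed(1) + " " + finalUnit;
}

console.log(resourceToSimplifiedUnit(2048, "Mi")); // 2.0 Gi
console.log(resourceToSimplifiedUnit(1000, ""));   // 1.0 Ki
```

Two details follow from the 0.9 threshold. A decimal-prefixed input is always reported in a binary unit, so 1 "G" comes out as "0.9 Gi". A value that stays in the base unit keeps a trailing space, because the unit string is empty.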




[hadoop] 08/20: YARN-9174. Backport YARN-7224 for refactoring of GpuDevice class

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit faf0b36e2f9a17f1aa50154beae996cd6a2695b3
Author: Jonathan Hung <jh...@linkedin.com>
AuthorDate: Wed Feb 6 16:44:26 2019 -0800

    YARN-9174. Backport YARN-7224 for refactoring of GpuDevice class
---
 .../linux/resources/gpu/GpuResourceAllocator.java  | 102 ++++++-------
 .../resources/gpu/GpuResourceHandlerImpl.java      |  26 ++--
 .../resourceplugin/gpu/GpuDevice.java              |  78 ++++++++++
 .../resourceplugin/gpu/GpuDiscoverer.java          |  30 ++--
 .../gpu/GpuNodeResourceUpdateHandler.java          |  10 +-
 .../recovery/NMLeveldbStateStoreService.java       |  66 +++++----
 .../recovery/NMNullStateStoreService.java          |   3 +-
 .../nodemanager/recovery/NMStateStoreService.java  |  15 +-
 .../TestContainerManagerRecovery.java              |   9 +-
 .../resources/gpu/TestGpuResourceHandler.java      | 161 +++++++++++++++------
 .../resourceplugin/gpu/TestGpuDiscoverer.java      |  34 ++++-
 .../recovery/NMMemoryStateStoreService.java        |   8 +-
 .../recovery/TestNMLeveldbStateStoreService.java   |  22 ++-
 13 files changed, 385 insertions(+), 179 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
index d6bae09..f4a49f9 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
@@ -26,12 +26,11 @@ import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.util.StringUtils;
 import org.apache.hadoop.yarn.api.records.ContainerId;
 import org.apache.hadoop.yarn.api.records.Resource;
-import org.apache.hadoop.yarn.api.records.ResourceInformation;
 import org.apache.hadoop.yarn.exceptions.ResourceNotFoundException;
 import org.apache.hadoop.yarn.server.nodemanager.Context;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
-import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice;
 
 import java.io.IOException;
 import java.io.Serializable;
@@ -54,8 +53,8 @@ import static org.apache.hadoop.yarn.api.records.ResourceInformation.GPU_URI;
 public class GpuResourceAllocator {
   final static Log LOG = LogFactory.getLog(GpuResourceAllocator.class);
 
-  private Set<Integer> allowedGpuDevices = new TreeSet<>();
-  private Map<Integer, ContainerId> usedDevices = new TreeMap<>();
+  private Set<GpuDevice> allowedGpuDevices = new TreeSet<>();
+  private Map<GpuDevice, ContainerId> usedDevices = new TreeMap<>();
   private Context nmContext;
 
   public GpuResourceAllocator(Context ctx) {
@@ -63,14 +62,14 @@ public class GpuResourceAllocator {
   }
 
   /**
-   * Contains allowed and denied devices with minor number.
+   * Contains allowed and denied devices
    * Denied devices will be useful for cgroups devices module to do blacklisting
    */
   static class GpuAllocation {
-    private Set<Integer> allowed = Collections.emptySet();
-    private Set<Integer> denied = Collections.emptySet();
+    private Set<GpuDevice> allowed = Collections.emptySet();
+    private Set<GpuDevice> denied = Collections.emptySet();
 
-    GpuAllocation(Set<Integer> allowed, Set<Integer> denied) {
+    GpuAllocation(Set<GpuDevice> allowed, Set<GpuDevice> denied) {
       if (allowed != null) {
         this.allowed = ImmutableSet.copyOf(allowed);
       }
@@ -79,21 +78,21 @@ public class GpuResourceAllocator {
       }
     }
 
-    public Set<Integer> getAllowedGPUs() {
+    public Set<GpuDevice> getAllowedGPUs() {
       return allowed;
     }
 
-    public Set<Integer> getDeniedGPUs() {
+    public Set<GpuDevice> getDeniedGPUs() {
       return denied;
     }
   }
 
   /**
    * Add GPU to allowed list
-   * @param minorNumber minor number of the GPU device.
+   * @param gpuDevice gpu device
    */
-  public synchronized void addGpu(int minorNumber) {
-    allowedGpuDevices.add(minorNumber);
+  public synchronized void addGpu(GpuDevice gpuDevice) {
+    allowedGpuDevices.add(gpuDevice);
   }
 
   private String getResourceHandlerExceptionMessage(int numRequestedGpuDevices,
@@ -117,42 +116,42 @@ public class GpuResourceAllocator {
               + containerId);
     }
 
-    for (Serializable deviceId : c.getResourceMappings().getAssignedResources(
-        GPU_URI)){
-      if (!(deviceId instanceof String)) {
+    for (Serializable gpuDeviceSerializable : c.getResourceMappings()
+        .getAssignedResources(GPU_URI)) {
+      if (!(gpuDeviceSerializable instanceof GpuDevice)) {
         throw new ResourceHandlerException(
             "Trying to recover device id, however it"
-                + " is not String, this shouldn't happen");
+                + " is not GpuDevice, this shouldn't happen");
       }
 
-
-      int devId;
-      try {
-        devId = Integer.parseInt((String)deviceId);
-      } catch (NumberFormatException e) {
-        throw new ResourceHandlerException("Failed to recover device id because"
-            + "it is not a valid integer, devId:" + deviceId);
-      }
+      GpuDevice gpuDevice = (GpuDevice) gpuDeviceSerializable;
 
       // Make sure it is in allowed GPU device.
-      if (!allowedGpuDevices.contains(devId)) {
-        throw new ResourceHandlerException("Try to recover device id = " + devId
-            + " however it is not in allowed device list:" + StringUtils
-            .join(",", allowedGpuDevices));
+      if (!allowedGpuDevices.contains(gpuDevice)) {
+        throw new ResourceHandlerException(
+            "Try to recover device = " + gpuDevice
+                + " however it is not in allowed device list:" + StringUtils
+                .join(",", allowedGpuDevices));
       }
 
       // Make sure it is not occupied by anybody else
-      if (usedDevices.containsKey(devId)) {
-        throw new ResourceHandlerException("Try to recover device id = " + devId
-            + " however it is already assigned to container=" + usedDevices
-            .get(devId) + ", please double check what happened.");
+      if (usedDevices.containsKey(gpuDevice)) {
+        throw new ResourceHandlerException(
+            "Try to recover device id = " + gpuDevice
+                + " however it is already assigned to container=" + usedDevices
+                .get(gpuDevice) + ", please double check what happened.");
       }
 
-      usedDevices.put(devId, containerId);
+      usedDevices.put(gpuDevice, containerId);
     }
   }
 
-  private int getRequestedGpus(Resource requestedResource) {
+  /**
+   * Get number of requested GPUs from resource.
+   * @param requestedResource requested resource
+   * @return #gpus.
+   */
+  public static int getRequestedGpus(Resource requestedResource) {
     try {
       return Long.valueOf(requestedResource.getResourceValue(
           GPU_URI)).intValue();
@@ -164,8 +163,8 @@ public class GpuResourceAllocator {
   /**
    * Assign GPU to requestor
    * @param container container to allocate
-   * @return List of denied Gpus with minor numbers
-   * @throws ResourceHandlerException When failed to
+   * @return allocation results.
+   * @throws ResourceHandlerException When failed to assign GPUs.
    */
   public synchronized GpuAllocation assignGpus(Container container)
       throws ResourceHandlerException {
@@ -180,12 +179,12 @@ public class GpuResourceAllocator {
                 containerId));
       }
 
-      Set<Integer> assignedGpus = new HashSet<>();
+      Set<GpuDevice> assignedGpus = new TreeSet<>();
 
-      for (int deviceNum : allowedGpuDevices) {
-        if (!usedDevices.containsKey(deviceNum)) {
-          usedDevices.put(deviceNum, containerId);
-          assignedGpus.add(deviceNum);
+      for (GpuDevice gpu : allowedGpuDevices) {
+        if (!usedDevices.containsKey(gpu)) {
+          usedDevices.put(gpu, containerId);
+          assignedGpus.add(gpu);
           if (assignedGpus.size() == numRequestedGpuDevices) {
             break;
           }
@@ -194,21 +193,10 @@ public class GpuResourceAllocator {
 
       // Record in state store if we allocated anything
       if (!assignedGpus.isEmpty()) {
-        List<Serializable> allocatedDevices = new ArrayList<>();
-        for (int gpu : assignedGpus) {
-          allocatedDevices.add(String.valueOf(gpu));
-        }
         try {
-          // Update Container#getResourceMapping.
-          ResourceMappings.AssignedResources assignedResources =
-              new ResourceMappings.AssignedResources();
-          assignedResources.updateAssignedResources(allocatedDevices);
-          container.getResourceMappings().addAssignedResources(GPU_URI,
-              assignedResources);
-
           // Update state store.
-          nmContext.getNMStateStore().storeAssignedResources(containerId,
-              GPU_URI, allocatedDevices);
+          nmContext.getNMStateStore().storeAssignedResources(container, GPU_URI,
+              new ArrayList<Serializable>(assignedGpus));
         } catch (IOException e) {
           cleanupAssignGpus(containerId);
           throw new ResourceHandlerException(e);
@@ -226,7 +214,7 @@ public class GpuResourceAllocator {
    * @param containerId containerId
    */
   public synchronized void cleanupAssignGpus(ContainerId containerId) {
-    Iterator<Map.Entry<Integer, ContainerId>> iter =
+    Iterator<Map.Entry<GpuDevice, ContainerId>> iter =
         usedDevices.entrySet().iterator();
     while (iter.hasNext()) {
       if (iter.next().getValue().equals(containerId)) {
@@ -236,7 +224,7 @@ public class GpuResourceAllocator {
   }
 
   @VisibleForTesting
-  public synchronized Map<Integer, ContainerId> getDeviceAllocationMapping() {
+  public synchronized Map<GpuDevice, ContainerId> getDeviceAllocationMapping() {
      return new HashMap<>(usedDevices);
   }
 }
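The allocation loop in the hunks above can be summarised as "hand out the first free devices, then deny everything the container does not own". The following is only a minimal sketch of that idea, not the Hadoop implementation: the class name FirstFitGpuAllocatorSketch is hypothetical, devices are reduced to plain minor numbers, containers to plain strings, and the up-front capacity check is an assumption about code outside the shown hunks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the allocator touched by the diff.
// Devices are plain minor numbers and containers are plain strings.
class FirstFitGpuAllocatorSketch {
  private final List<Integer> allowed = new ArrayList<>();
  private final Map<Integer, String> used = new LinkedHashMap<>();

  synchronized void addGpu(int minorNumber) {
    allowed.add(minorNumber);
  }

  // Assign the first `requested` free devices to the container. Fails
  // before touching any state when too few devices are free (assumption).
  synchronized List<Integer> assign(String containerId, int requested) {
    if (requested > allowed.size() - used.size()) {
      throw new IllegalStateException(
          "Not enough free GPUs for " + containerId);
    }
    List<Integer> assigned = new ArrayList<>();
    for (int device : allowed) {
      if (assigned.size() == requested) {
        break;
      }
      if (!used.containsKey(device)) {
        used.put(device, containerId);
        assigned.add(device);
      }
    }
    return assigned;
  }

  // Release every device held by the given container.
  synchronized void cleanup(String containerId) {
    used.values().removeIf(containerId::equals);
  }

  // Devices the container must not see: all allowed devices minus its own.
  synchronized List<Integer> denied(List<Integer> assigned) {
    List<Integer> result = new ArrayList<>(allowed);
    result.removeAll(assigned);
    return result;
  }

  public static void main(String[] args) {
    FirstFitGpuAllocatorSketch allocator = new FirstFitGpuAllocatorSketch();
    for (int minor : new int[] {0, 1, 3, 4}) {
      allocator.addGpu(minor);
    }
    List<Integer> assigned = allocator.assign("container_1", 3);
    System.out.println("assigned=" + assigned
        + " denied=" + allocator.denied(assigned));
  }
}
```

The minor numbers used in main mirror the 0, 1, 3, 4 layout exercised by TestGpuResourceHandler in this change.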
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
index 7144bb2..4a783d3 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
@@ -24,8 +24,6 @@ import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.util.StringUtils;
 import org.apache.hadoop.yarn.api.records.ContainerId;
-import org.apache.hadoop.yarn.api.records.ResourceInformation;
-import org.apache.hadoop.yarn.exceptions.ResourceNotFoundException;
 import org.apache.hadoop.yarn.exceptions.YarnException;
 import org.apache.hadoop.yarn.server.nodemanager.Context;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
@@ -35,6 +33,7 @@ import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileg
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandler;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer;
 
 import java.util.ArrayList;
@@ -64,17 +63,23 @@ public class GpuResourceHandlerImpl implements ResourceHandler {
   @Override
   public List<PrivilegedOperation> bootstrap(Configuration configuration)
       throws ResourceHandlerException {
-    List<Integer> minorNumbersOfUsableGpus;
+    List<GpuDevice> usableGpus;
     try {
-      minorNumbersOfUsableGpus = GpuDiscoverer.getInstance()
-          .getMinorNumbersOfGpusUsableByYarn();
+      usableGpus = GpuDiscoverer.getInstance()
+          .getGpusUsableByYarn();
+      if (usableGpus == null || usableGpus.isEmpty()) {
+        String message = "GPU is enabled on the NodeManager, but couldn't find "
+            + "any usable GPU devices. Please double-check the configuration.";
+        LOG.error(message);
+        throw new ResourceHandlerException(message);
+      }
     } catch (YarnException e) {
       LOG.error("Exception when trying to get usable GPU device", e);
       throw new ResourceHandlerException(e);
     }
 
-    for (int minorNumber : minorNumbersOfUsableGpus) {
-      gpuAllocator.addGpu(minorNumber);
+    for (GpuDevice gpu : usableGpus) {
+      gpuAllocator.addGpu(gpu);
     }
 
     // And initialize cgroups
@@ -102,10 +107,13 @@ public class GpuResourceHandlerImpl implements ResourceHandler {
           PrivilegedOperation.OperationType.GPU, Arrays
           .asList(CONTAINER_ID_CLI_OPTION, containerIdStr));
       if (!allocation.getDeniedGPUs().isEmpty()) {
+        List<Integer> minorNumbers = new ArrayList<>();
+        for (GpuDevice deniedGpu : allocation.getDeniedGPUs()) {
+          minorNumbers.add(deniedGpu.getMinorNumber());
+        }
         privilegedOperation.appendArgs(Arrays.asList(EXCLUDED_GPUS_CLI_OPTION,
-            StringUtils.join(",", allocation.getDeniedGPUs())));
+            StringUtils.join(",", minorNumbers)));
       }
-
       privilegedOperationExecutor.executePrivilegedOperation(
           privilegedOperation, true);
     } catch (PrivilegedOperationException e) {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
new file mode 100644
index 0000000..8119924
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
@@ -0,0 +1,78 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu;
+
+import java.io.Serializable;
+
+/**
+ * Represents a GPU device (index and minor number) during allocation.
+ */
+public class GpuDevice implements Serializable, Comparable {
+  private int index;
+  private int minorNumber;
+  private static final long serialVersionUID = -6812314470754667710L;
+
+  public GpuDevice(int index, int minorNumber) {
+    this.index = index;
+    this.minorNumber = minorNumber;
+  }
+
+  public int getIndex() {
+    return index;
+  }
+
+  public int getMinorNumber() {
+    return minorNumber;
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (obj == null || !(obj instanceof GpuDevice)) {
+      return false;
+    }
+    GpuDevice other = (GpuDevice) obj;
+    return index == other.index && minorNumber == other.minorNumber;
+  }
+
+  @Override
+  public int compareTo(Object obj) {
+    if (obj == null || (!(obj instanceof GpuDevice))) {
+      return -1;
+    }
+
+    GpuDevice other = (GpuDevice) obj;
+
+    int result = Integer.compare(index, other.index);
+    if (0 != result) {
+      return result;
+    }
+    return Integer.compare(minorNumber, other.minorNumber);
+  }
+
+  @Override
+  public int hashCode() {
+    final int prime = 47;
+    return prime * index + minorNumber;
+  }
+
+  @Override
+  public String toString() {
+    return "(index=" + index + ",minor_number=" + minorNumber + ")";
+  }
+}
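GpuDevice above is used both as a HashMap key and as a TreeSet element, so equals, hashCode and compareTo have to agree with each other. The sketch below shows the same (index, minor number) ordering using the generic Comparable<T> form instead of the raw type. DeviceKeySketch is a hypothetical name introduced only for illustration; it is not part of the patch.

```java
import java.util.TreeSet;

// Hypothetical value class mirroring the (index, minorNumber) pair above,
// written against the generic Comparable<T> interface.
final class DeviceKeySketch implements Comparable<DeviceKeySketch> {
  final int index;
  final int minorNumber;

  DeviceKeySketch(int index, int minorNumber) {
    this.index = index;
    this.minorNumber = minorNumber;
  }

  // Order by index first, then by minor number.
  @Override
  public int compareTo(DeviceKeySketch other) {
    int byIndex = Integer.compare(index, other.index);
    if (byIndex != 0) {
      return byIndex;
    }
    return Integer.compare(minorNumber, other.minorNumber);
  }

  // Equal exactly when compareTo returns 0, so TreeSet and HashMap agree.
  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof DeviceKeySketch)) {
      return false;
    }
    DeviceKeySketch other = (DeviceKeySketch) obj;
    return index == other.index && minorNumber == other.minorNumber;
  }

  @Override
  public int hashCode() {
    return 31 * index + minorNumber;
  }

  @Override
  public String toString() {
    return "(index=" + index + ",minor_number=" + minorNumber + ")";
  }

  public static void main(String[] args) {
    TreeSet<DeviceKeySketch> devices = new TreeSet<>();
    devices.add(new DeviceKeySketch(2, 3));
    devices.add(new DeviceKeySketch(0, 0));
    devices.add(new DeviceKeySketch(0, 0)); // duplicate, ignored by the set
    devices.add(new DeviceKeySketch(1, 1));
    System.out.println(devices);
  }
}
```

With the generic form the compiler rejects comparisons against unrelated types, so the instanceof check inside compareTo is no longer needed.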
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java
index 61b8ce5..6e3cf13 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDiscoverer.java
@@ -136,12 +136,12 @@ public class GpuDiscoverer {
   }
 
   /**
-   * Get list of minor device numbers of Gpu devices usable by YARN.
+   * Get list of GPU devices usable by YARN.
    *
-   * @return List of minor device numbers of Gpu devices.
+   * @return List of GPU devices
    * @throws YarnException when any issue happens
    */
-  public synchronized List<Integer> getMinorNumbersOfGpusUsableByYarn()
+  public synchronized List<GpuDevice> getGpusUsableByYarn()
       throws YarnException {
     validateConfOrThrowException();
 
@@ -149,7 +149,7 @@ public class GpuDiscoverer {
         YarnConfiguration.NM_GPU_ALLOWED_DEVICES,
         YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES);
 
-    List<Integer> minorNumbers = new ArrayList<>();
+    List<GpuDevice> gpuDevices = new ArrayList<>();
 
     if (allowedDevicesStr.equals(
         YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
@@ -167,21 +167,31 @@ public class GpuDiscoverer {
       }
 
       if (lastDiscoveredGpuInformation.getGpus() != null) {
-        for (PerGpuDeviceInformation gpu : lastDiscoveredGpuInformation
-            .getGpus()) {
-          minorNumbers.add(gpu.getMinorNumber());
+        for (int i = 0; i < lastDiscoveredGpuInformation.getGpus().size();
+             i++) {
+          List<PerGpuDeviceInformation> gpuInfos =
+              lastDiscoveredGpuInformation.getGpus();
+          gpuDevices.add(new GpuDevice(i, gpuInfos.get(i).getMinorNumber()));
         }
       }
     } else{
       for (String s : allowedDevicesStr.split(",")) {
         if (s.trim().length() > 0) {
-          minorNumbers.add(Integer.valueOf(s.trim()));
+          String[] kv = s.trim().split(":");
+          if (kv.length != 2) {
+            throw new YarnException(
+                "Illegal format, expected index:minor_number, but got: "
+                    + s);
+          }
+
+          gpuDevices.add(
+              new GpuDevice(Integer.parseInt(kv[0]), Integer.parseInt(kv[1])));
         }
       }
-      LOG.info("Allowed GPU devices with minor numbers:" + allowedDevicesStr);
+      LOG.info("Allowed GPU devices:" + gpuDevices);
     }
 
-    return minorNumbers;
+    return gpuDevices;
   }
 
   public synchronized void initialize(Configuration conf) throws YarnException {
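The GpuDiscoverer change above switches the allowed-devices setting from a list of minor numbers ("0,1,3,4") to a list of index:minor_number pairs ("0:0,1:1,2:3,3:4"). The following sketch illustrates that format in isolation. AllowedGpuSpecSketch and its parse method are hypothetical names, the result is returned as int pairs rather than GpuDevice objects, and wrapping NumberFormatException is a choice made for this sketch only (the hunk as shown calls Integer.parseInt without catching it).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "index:minor_number" list format.
class AllowedGpuSpecSketch {

  // Parses a comma-separated list such as "0:0,1:1,2:3" into
  // {index, minorNumber} pairs. Blank entries are skipped.
  static List<int[]> parse(String spec) {
    List<int[]> devices = new ArrayList<>();
    for (String entry : spec.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      String[] kv = trimmed.split(":");
      if (kv.length != 2) {
        throw new IllegalArgumentException(
            "Expected index:minor_number, but got: " + entry);
      }
      try {
        devices.add(new int[] {
            Integer.parseInt(kv[0].trim()), Integer.parseInt(kv[1].trim())});
      } catch (NumberFormatException e) {
        // Sketch-only choice: report bad numbers with the same exception.
        throw new IllegalArgumentException(
            "Expected index:minor_number, but got: " + entry, e);
      }
    }
    return devices;
  }

  public static void main(String[] args) {
    for (int[] device : parse("0:0,1:1,2:3,3:4")) {
      System.out.println("index=" + device[0] + " minor=" + device[1]);
    }
  }
}
```

An old-style value such as "0" has no colon, so it is rejected rather than silently reinterpreted, which matches the updated test configuration in this change ("0" becomes "0:0").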
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuNodeResourceUpdateHandler.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuNodeResourceUpdateHandler.java
index f6bf506..796eb25 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuNodeResourceUpdateHandler.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuNodeResourceUpdateHandler.java
@@ -40,12 +40,14 @@ public class GpuNodeResourceUpdateHandler extends NodeResourceUpdaterPlugin {
   public void updateConfiguredResource(Resource res) throws YarnException {
     LOG.info("Initializing configured GPU resources for the NodeManager.");
 
-    List<Integer> usableGpus =
-        GpuDiscoverer.getInstance().getMinorNumbersOfGpusUsableByYarn();
+    List<GpuDevice> usableGpus =
+        GpuDiscoverer.getInstance().getGpusUsableByYarn();
     if (null == usableGpus || usableGpus.isEmpty()) {
-      LOG.info("Didn't find any usable GPUs on the NodeManager.");
+      String message = "GPU is enabled, but couldn't find any usable GPUs on the "
+          + "NodeManager.";
+      LOG.error(message);
       // No gpu can be used by YARN.
-      return;
+      throw new YarnException(message);
     }
 
     long nUsableGpus = usableGpus.size();
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
index 6aec1be..0cbf078 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
@@ -18,28 +18,9 @@
 
 package org.apache.hadoop.yarn.server.nodemanager.recovery;
 
-import static org.fusesource.leveldbjni.JniDBFactory.asString;
-import static org.fusesource.leveldbjni.JniDBFactory.bytes;
-
-import org.slf4j.Logger;
-import org.apache.hadoop.yarn.api.records.Token;
-import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
-import org.slf4j.LoggerFactory;
-
-import java.io.File;
-import java.io.IOException;
-import java.io.Serializable;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Map.Entry;
-import java.util.Timer;
-import java.util.TimerTask;
-import java.util.Set;
-
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.ListMultimap;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
@@ -51,9 +32,11 @@ import org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestP
 import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
 import org.apache.hadoop.yarn.api.records.ApplicationId;
 import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.Token;
 import org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
 import org.apache.hadoop.yarn.proto.YarnProtos.LocalResourceProto;
+import org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.ContainerTokenIdentifierProto;
 import org.apache.hadoop.yarn.proto.YarnServerCommonProtos.MasterKeyProto;
 import org.apache.hadoop.yarn.proto.YarnServerCommonProtos.VersionProto;
 import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.ContainerManagerApplicationProto;
@@ -61,9 +44,10 @@ import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.Deletion
 import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.LocalizedResourceProto;
 import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.LogDeleterProto;
 import org.apache.hadoop.yarn.proto.YarnServiceProtos.StartContainerRequestProto;
-import org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.ContainerTokenIdentifierProto;
+import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
 import org.apache.hadoop.yarn.server.api.records.MasterKey;
 import org.apache.hadoop.yarn.server.api.records.impl.pb.MasterKeyPBImpl;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 import org.apache.hadoop.yarn.server.records.Version;
 import org.apache.hadoop.yarn.server.records.impl.pb.VersionPBImpl;
@@ -76,10 +60,26 @@ import org.iq80.leveldb.DB;
 import org.iq80.leveldb.DBException;
 import org.iq80.leveldb.Options;
 import org.iq80.leveldb.WriteBatch;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+import java.util.Timer;
+import java.util.TimerTask;
+
+import static org.fusesource.leveldbjni.JniDBFactory.asString;
+import static org.fusesource.leveldbjni.JniDBFactory.bytes;
 
-import com.google.common.annotations.VisibleForTesting;
-import com.google.common.collect.ArrayListMultimap;
-import com.google.common.collect.ListMultimap;
 
 public class NMLeveldbStateStoreService extends NMStateStoreService {
 
@@ -1180,15 +1180,18 @@ public class NMLeveldbStateStoreService extends NMStateStoreService {
   }
 
   @Override
-  public void storeAssignedResources(ContainerId containerId,
+  public void storeAssignedResources(Container container,
       String resourceType, List<Serializable> assignedResources)
       throws IOException {
     if (LOG.isDebugEnabled()) {
-      LOG.debug("storeAssignedResources: containerId=" + containerId
-          + ", assignedResources=" + StringUtils.join(",", assignedResources));
+      LOG.debug(
+          "storeAssignedResources: containerId=" + container.getContainerId()
+              + ", assignedResources=" + StringUtils
+              .join(",", assignedResources));
+
     }
 
-    String keyResChng = CONTAINERS_KEY_PREFIX + containerId.toString()
+    String keyResChng = CONTAINERS_KEY_PREFIX + container.getContainerId().toString()
         + CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX + resourceType;
     try {
       WriteBatch batch = db.createWriteBatch();
@@ -1206,6 +1209,9 @@ public class NMLeveldbStateStoreService extends NMStateStoreService {
     } catch (DBException e) {
       throw new IOException(e);
     }
+
+    // Update the container's resource mapping.
+    updateContainerResourceMapping(container, resourceType, assignedResources);
   }
 
   @SuppressWarnings("deprecation")
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
index 6e3707b..7d1010f 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
@@ -35,6 +35,7 @@ import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.Localize
 import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.LogDeleterProto;
 import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
 import org.apache.hadoop.yarn.server.api.records.MasterKey;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
 
 // The state store to use when state isn't being stored
 public class NMNullStateStoreService extends NMStateStoreService {
@@ -268,7 +269,7 @@ public class NMNullStateStoreService extends NMStateStoreService {
   }
 
   @Override
-  public void storeAssignedResources(ContainerId containerId,
+  public void storeAssignedResources(Container container,
       String resourceType, List<Serializable> assignedResources)
       throws IOException {
   }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
index a929fe2..350f242 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
@@ -44,6 +44,7 @@ import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.Localize
 import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.LogDeleterProto;
 import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
 import org.apache.hadoop.yarn.server.api.records.MasterKey;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 
 @Private
@@ -732,12 +733,12 @@ public abstract class NMStateStoreService extends AbstractService {
   /**
    * Store the assigned resources to a container.
    *
-   * @param containerId Container Id
+   * @param container NMContainer
    * @param resourceType Resource Type
    * @param assignedResources Assigned resources
    * @throws IOException if fails
    */
-  public abstract void storeAssignedResources(ContainerId containerId,
+  public abstract void storeAssignedResources(Container container,
       String resourceType, List<Serializable> assignedResources)
       throws IOException;
 
@@ -746,4 +747,14 @@ public abstract class NMStateStoreService extends AbstractService {
   protected abstract void startStorage() throws IOException;
 
   protected abstract void closeStorage() throws IOException;
+
+  protected void updateContainerResourceMapping(Container container,
+      String resourceType, List<Serializable> assignedResources) {
+    // Update Container#getResourceMapping.
+    ResourceMappings.AssignedResources newAssigned =
+        new ResourceMappings.AssignedResources();
+    newAssigned.updateAssignedResources(assignedResources);
+    container.getResourceMappings().addAssignedResources(resourceType,
+        newAssigned);
+  }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
index 6241055..52fa9f3 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
@@ -519,18 +519,20 @@ public class TestContainerManagerRecovery extends BaseContainerManagerTest {
 
     commonLaunchContainer(appId, cid, cm);
 
+    Container nmContainer = context.getContainers().get(cid);
+
     Application app = context.getApplications().get(appId);
     assertNotNull(app);
 
     // store resource mapping of the container
     List<Serializable> gpuResources =
         Arrays.<Serializable>asList("1", "2", "3");
-    stateStore.storeAssignedResources(cid, "gpu", gpuResources);
+    stateStore.storeAssignedResources(nmContainer, "gpu", gpuResources);
     List<Serializable> numaResources = Arrays.<Serializable>asList("numa1");
-    stateStore.storeAssignedResources(cid, "numa", numaResources);
+    stateStore.storeAssignedResources(nmContainer, "numa", numaResources);
     List<Serializable> fpgaResources =
         Arrays.<Serializable>asList("fpga1", "fpga2");
-    stateStore.storeAssignedResources(cid, "fpga", fpgaResources);
+    stateStore.storeAssignedResources(nmContainer, "fpga", fpgaResources);
 
     cm.stop();
     context = createContext(conf, stateStore);
@@ -542,7 +544,6 @@ public class TestContainerManagerRecovery extends BaseContainerManagerTest {
     app = context.getApplications().get(appId);
     assertNotNull(app);
 
-    Container nmContainer = context.getContainers().get(cid);
     Assert.assertNotNull(nmContainer);
     ResourceMappings resourceMappings = nmContainer.getResourceMappings();
     List<Serializable> assignedResource = resourceMappings
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
index 1c4313c..d985b5b 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
@@ -20,7 +20,6 @@ package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resourc
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.util.StringUtils;
-import org.apache.hadoop.yarn.api.protocolrecords.ResourceTypes;
 import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
 import org.apache.hadoop.yarn.api.records.ApplicationId;
 import org.apache.hadoop.yarn.api.records.ContainerId;
@@ -36,9 +35,10 @@ import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileg
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandler;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerRuntimeConstants;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService;
-import org.apache.hadoop.yarn.util.resource.ResourceUtils;
 import org.apache.hadoop.yarn.util.resource.TestResourceUtils;
 import org.junit.Assert;
 import org.junit.Before;
@@ -46,6 +46,7 @@ import org.junit.Test;
 
 import java.io.IOException;
 import java.io.Serializable;
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.HashMap;
@@ -92,7 +93,7 @@ public class TestGpuResourceHandler {
   @Test
   public void testBootStrap() throws Exception {
     Configuration conf = new YarnConfiguration();
-    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0");
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0");
 
     GpuDiscoverer.getInstance().initialize(conf);
 
@@ -106,8 +107,8 @@ public class TestGpuResourceHandler {
         .newInstance(ApplicationId.newInstance(1234L, 1), 1), id);
   }
 
-  private static Container mockContainerWithGpuRequest(int id,
-      int numGpuRequest) {
+  private static Container mockContainerWithGpuRequest(int id, int numGpuRequest,
+      boolean dockerContainerEnabled) {
     Container c = mock(Container.class);
     when(c.getContainerId()).thenReturn(getContainerId(id));
 
@@ -117,29 +118,46 @@ public class TestGpuResourceHandler {
     res.setResourceValue(ResourceInformation.GPU_URI, numGpuRequest);
     when(c.getResource()).thenReturn(res);
     when(c.getResourceMappings()).thenReturn(resMapping);
+
+    ContainerLaunchContext clc = mock(ContainerLaunchContext.class);
+    Map<String, String> env = new HashMap<>();
+    if (dockerContainerEnabled) {
+      env.put(ContainerRuntimeConstants.ENV_CONTAINER_TYPE, "docker");
+    }
+    when(clc.getEnvironment()).thenReturn(env);
+    when(c.getLaunchContext()).thenReturn(clc);
     return c;
   }
 
+  private static Container mockContainerWithGpuRequest(int id,
+      int numGpuRequest) {
+    return mockContainerWithGpuRequest(id, numGpuRequest, false);
+  }
+
   private void verifyDeniedDevices(ContainerId containerId,
-      List<Integer> deniedDevices)
+      List<GpuDevice> deniedDevices)
       throws ResourceHandlerException, PrivilegedOperationException {
     verify(mockCGroupsHandler, times(1)).createCGroup(
         CGroupsHandler.CGroupController.DEVICES, containerId.toString());
 
     if (null != deniedDevices && !deniedDevices.isEmpty()) {
+      List<Integer> deniedDevicesMinorNumber = new ArrayList<>();
+      for (GpuDevice deniedDevice : deniedDevices) {
+        deniedDevicesMinorNumber.add(deniedDevice.getMinorNumber());
+      }
       verify(mockPrivilegedExecutor, times(1)).executePrivilegedOperation(
           new PrivilegedOperation(PrivilegedOperation.OperationType.GPU, Arrays
               .asList(GpuResourceHandlerImpl.CONTAINER_ID_CLI_OPTION,
                   containerId.toString(),
                   GpuResourceHandlerImpl.EXCLUDED_GPUS_CLI_OPTION,
-                  StringUtils.join(",", deniedDevices))), true);
+                  StringUtils.join(",", deniedDevicesMinorNumber))), true);
     }
   }
 
-  @Test
-  public void testAllocation() throws Exception {
+  private void commonTestAllocation(boolean dockerContainerEnabled)
+      throws Exception {
     Configuration conf = new YarnConfiguration();
-    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,3,4");
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1,2:3,3:4");
     GpuDiscoverer.getInstance().initialize(conf);
 
     gpuResourceHandler.bootstrap(conf);
@@ -147,31 +165,55 @@ public class TestGpuResourceHandler {
         gpuResourceHandler.getGpuAllocator().getAvailableGpus());
 
     /* Start container 1, asks 3 containers */
-    gpuResourceHandler.preStart(mockContainerWithGpuRequest(1, 3));
+    gpuResourceHandler.preStart(
+        mockContainerWithGpuRequest(1, 3, dockerContainerEnabled));
 
     // Only device=4 will be blocked.
-    verifyDeniedDevices(getContainerId(1), Arrays.asList(4));
+    if (dockerContainerEnabled) {
+      verifyDeniedDevices(getContainerId(1),
+          Collections.<GpuDevice>emptyList());
+    } else {
+      verifyDeniedDevices(getContainerId(1), Arrays.asList(new GpuDevice(3,4)));
+    }
 
     /* Start container 2, asks 2 containers. Excepted to fail */
     boolean failedToAllocate = false;
     try {
-      gpuResourceHandler.preStart(mockContainerWithGpuRequest(2, 2));
+      gpuResourceHandler.preStart(
+          mockContainerWithGpuRequest(2, 2, dockerContainerEnabled));
     } catch (ResourceHandlerException e) {
       failedToAllocate = true;
     }
     Assert.assertTrue(failedToAllocate);
 
     /* Start container 3, ask 1 container, succeeded */
-    gpuResourceHandler.preStart(mockContainerWithGpuRequest(3, 1));
+    gpuResourceHandler.preStart(
+        mockContainerWithGpuRequest(3, 1, dockerContainerEnabled));
 
     // devices = 0/1/3 will be blocked
-    verifyDeniedDevices(getContainerId(3), Arrays.asList(0, 1, 3));
+    if (dockerContainerEnabled) {
+      verifyDeniedDevices(getContainerId(3),
+          Collections.<GpuDevice>emptyList());
+    } else {
+      verifyDeniedDevices(getContainerId(3), Arrays
+          .asList(new GpuDevice(0, 0), new GpuDevice(1, 1),
+              new GpuDevice(2, 3)));
+    }
 
-    /* Start container 4, ask 0 container, succeeded */
-    gpuResourceHandler.preStart(mockContainerWithGpuRequest(4, 0));
 
-    // All devices will be blocked
-    verifyDeniedDevices(getContainerId(4), Arrays.asList(0, 1, 3, 4));
+    /* Start container 4, ask 0 container, succeeded */
+    gpuResourceHandler.preStart(
+        mockContainerWithGpuRequest(4, 0, dockerContainerEnabled));
+
+    if (dockerContainerEnabled) {
+      verifyDeniedDevices(getContainerId(4),
+          Collections.<GpuDevice>emptyList());
+    } else{
+      // All devices will be blocked
+      verifyDeniedDevices(getContainerId(4), Arrays
+          .asList(new GpuDevice(0, 0), new GpuDevice(1, 1), new GpuDevice(2, 3),
+              new GpuDevice(3, 4)));
+    }
 
     /* Release container-1, expect cgroups deleted */
     gpuResourceHandler.postComplete(getContainerId(1));
@@ -190,12 +232,24 @@ public class TestGpuResourceHandler {
         gpuResourceHandler.getGpuAllocator().getAvailableGpus());
   }
 
+  @Test
+  public void testAllocationWhenDockerContainerEnabled() throws Exception {
+    // When docker container is enabled, no devices should be written to
+    // devices.deny.
+    commonTestAllocation(true);
+  }
+
+  @Test
+  public void testAllocation() throws Exception {
+    commonTestAllocation(false);
+  }
+
   @SuppressWarnings("unchecked")
   @Test
   public void testAssignedGpuWillBeCleanedupWhenStoreOpFails()
       throws Exception {
     Configuration conf = new YarnConfiguration();
-    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,3,4");
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1,2:3,3:4");
     GpuDiscoverer.getInstance().initialize(conf);
 
     gpuResourceHandler.bootstrap(conf);
@@ -204,7 +258,7 @@ public class TestGpuResourceHandler {
 
     doThrow(new IOException("Exception ...")).when(mockNMStateStore)
         .storeAssignedResources(
-        any(ContainerId.class), anyString(), anyList());
+        any(Container.class), anyString(), anyList());
 
     boolean exception = false;
     /* Start container 1, asks 3 containers */
@@ -227,13 +281,16 @@ public class TestGpuResourceHandler {
     conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, " ");
     GpuDiscoverer.getInstance().initialize(conf);
 
-    gpuResourceHandler.bootstrap(conf);
-    Assert.assertEquals(0,
-        gpuResourceHandler.getGpuAllocator().getAvailableGpus());
+    try {
+      gpuResourceHandler.bootstrap(conf);
+      Assert.fail("Should fail because no GPU available");
+    } catch (ResourceHandlerException e) {
+      // Expected because of no resource available
+    }
 
     /* Start container 1, asks 0 containers */
     gpuResourceHandler.preStart(mockContainerWithGpuRequest(1, 0));
-    verifyDeniedDevices(getContainerId(1), Collections.<Integer>emptyList());
+    verifyDeniedDevices(getContainerId(1), Collections.<GpuDevice>emptyList());
 
     /* Start container 2, asks 1 containers. Excepted to fail */
     boolean failedToAllocate = false;
@@ -256,7 +313,7 @@ public class TestGpuResourceHandler {
   @Test
   public void testAllocationStored() throws Exception {
     Configuration conf = new YarnConfiguration();
-    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,3,4");
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1,2:3,3:4");
     GpuDiscoverer.getInstance().initialize(conf);
 
     gpuResourceHandler.bootstrap(conf);
@@ -267,34 +324,34 @@ public class TestGpuResourceHandler {
     Container container = mockContainerWithGpuRequest(1, 3);
     gpuResourceHandler.preStart(container);
 
-    verify(mockNMStateStore).storeAssignedResources(getContainerId(1),
-        ResourceInformation.GPU_URI,
-        Arrays.<Serializable>asList("0", "1", "3"));
-
-    Assert.assertEquals(3, container.getResourceMappings()
-        .getAssignedResources(ResourceInformation.GPU_URI).size());
+    verify(mockNMStateStore).storeAssignedResources(container,
+        ResourceInformation.GPU_URI, Arrays
+            .<Serializable>asList(new GpuDevice(0, 0), new GpuDevice(1, 1),
+                new GpuDevice(2, 3)));
 
     // Only device=4 will be blocked.
-    verifyDeniedDevices(getContainerId(1), Arrays.asList(4));
+    verifyDeniedDevices(getContainerId(1), Arrays.asList(new GpuDevice(3, 4)));
 
     /* Start container 2, ask 0 container, succeeded */
     container = mockContainerWithGpuRequest(2, 0);
     gpuResourceHandler.preStart(container);
 
-    verifyDeniedDevices(getContainerId(2), Arrays.asList(0, 1, 3, 4));
+    verifyDeniedDevices(getContainerId(2), Arrays
+        .asList(new GpuDevice(0, 0), new GpuDevice(1, 1), new GpuDevice(2, 3),
+            new GpuDevice(3, 4)));
     Assert.assertEquals(0, container.getResourceMappings()
         .getAssignedResources(ResourceInformation.GPU_URI).size());
 
     // Store assigned resource will not be invoked.
     verify(mockNMStateStore, never()).storeAssignedResources(
-        eq(getContainerId(2)), eq(ResourceInformation.GPU_URI),
+        eq(container), eq(ResourceInformation.GPU_URI),
         anyListOf(Serializable.class));
   }
 
   @Test
   public void testRecoverResourceAllocation() throws Exception {
     Configuration conf = new YarnConfiguration();
-    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,3,4");
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1,2:3,3:4");
     GpuDiscoverer.getInstance().initialize(conf);
 
     gpuResourceHandler.bootstrap(conf);
@@ -305,7 +362,8 @@ public class TestGpuResourceHandler {
     ResourceMappings rmap = new ResourceMappings();
     ResourceMappings.AssignedResources ar =
         new ResourceMappings.AssignedResources();
-    ar.updateAssignedResources(Arrays.<Serializable>asList("1", "3"));
+    ar.updateAssignedResources(
+        Arrays.<Serializable>asList(new GpuDevice(1, 1), new GpuDevice(2, 3)));
     rmap.addAssignedResources(ResourceInformation.GPU_URI, ar);
     when(nmContainer.getResourceMappings()).thenReturn(rmap);
 
@@ -315,12 +373,15 @@ public class TestGpuResourceHandler {
     // Reacquire container restore state of GPU Resource Allocator.
     gpuResourceHandler.reacquireContainer(getContainerId(1));
 
-    Map<Integer, ContainerId> deviceAllocationMapping =
+    Map<GpuDevice, ContainerId> deviceAllocationMapping =
         gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
     Assert.assertEquals(2, deviceAllocationMapping.size());
     Assert.assertTrue(
-        deviceAllocationMapping.keySet().containsAll(Arrays.asList(1, 3)));
-    Assert.assertEquals(deviceAllocationMapping.get(1), getContainerId(1));
+        deviceAllocationMapping.keySet().contains(new GpuDevice(1, 1)));
+    Assert.assertTrue(
+        deviceAllocationMapping.keySet().contains(new GpuDevice(2, 3)));
+    Assert.assertEquals(deviceAllocationMapping.get(new GpuDevice(1, 1)),
+        getContainerId(1));
 
     // TEST CASE
     // Try to reacquire a container but requested device is not in allowed list.
@@ -328,7 +389,8 @@ public class TestGpuResourceHandler {
     rmap = new ResourceMappings();
     ar = new ResourceMappings.AssignedResources();
     // id=5 is not in allowed list.
-    ar.updateAssignedResources(Arrays.<Serializable>asList("4", "5"));
+    ar.updateAssignedResources(
+        Arrays.<Serializable>asList(new GpuDevice(3, 4), new GpuDevice(4, 5)));
     rmap.addAssignedResources(ResourceInformation.GPU_URI, ar);
     when(nmContainer.getResourceMappings()).thenReturn(rmap);
 
@@ -348,9 +410,10 @@ public class TestGpuResourceHandler {
     deviceAllocationMapping =
         gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
     Assert.assertEquals(2, deviceAllocationMapping.size());
-    Assert.assertTrue(
-        deviceAllocationMapping.keySet().containsAll(Arrays.asList(1, 3)));
-    Assert.assertEquals(deviceAllocationMapping.get(1), getContainerId(1));
+    Assert.assertTrue(deviceAllocationMapping.keySet()
+        .containsAll(Arrays.asList(new GpuDevice(1, 1), new GpuDevice(2, 3))));
+    Assert.assertEquals(deviceAllocationMapping.get(new GpuDevice(1, 1)),
+        getContainerId(1));
 
     // TEST CASE
     // Try to reacquire a container but requested device is already assigned.
@@ -358,7 +421,8 @@ public class TestGpuResourceHandler {
     rmap = new ResourceMappings();
     ar = new ResourceMappings.AssignedResources();
     // id=3 is already assigned
-    ar.updateAssignedResources(Arrays.<Serializable>asList("4", "3"));
+    ar.updateAssignedResources(
+        Arrays.<Serializable>asList(new GpuDevice(3, 4), new GpuDevice(2, 3)));
     rmap.addAssignedResources("gpu", ar);
     when(nmContainer.getResourceMappings()).thenReturn(rmap);
 
@@ -378,8 +442,9 @@ public class TestGpuResourceHandler {
     deviceAllocationMapping =
         gpuResourceHandler.getGpuAllocator().getDeviceAllocationMapping();
     Assert.assertEquals(2, deviceAllocationMapping.size());
-    Assert.assertTrue(
-        deviceAllocationMapping.keySet().containsAll(Arrays.asList(1, 3)));
-    Assert.assertEquals(deviceAllocationMapping.get(1), getContainerId(1));
+    Assert.assertTrue(deviceAllocationMapping.keySet()
+        .containsAll(Arrays.asList(new GpuDevice(1, 1), new GpuDevice(2, 3))));
+    Assert.assertEquals(deviceAllocationMapping.get(new GpuDevice(1, 1)),
+        getContainerId(1));
   }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java
index 83bace2..4abb633 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestGpuDiscoverer.java
@@ -101,23 +101,41 @@ public class TestGpuDiscoverer {
     GpuDeviceInformation info = plugin.getGpuDeviceInformation();
 
     Assert.assertTrue(info.getGpus().size() > 0);
-    Assert.assertEquals(plugin.getMinorNumbersOfGpusUsableByYarn().size(),
+    Assert.assertEquals(plugin.getGpusUsableByYarn().size(),
         info.getGpus().size());
   }
 
   @Test
   public void getNumberOfUsableGpusFromConfig() throws YarnException {
     Configuration conf = new Configuration(false);
-    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0,1,2,4");
+
+    // Illegal format
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1,2:2,3");
     GpuDiscoverer plugin = new GpuDiscoverer();
+    try {
+      plugin.initialize(conf);
+      plugin.getGpusUsableByYarn();
+      Assert.fail("Illegal format, should fail.");
+    } catch (YarnException e) {
+      // Expected
+    }
+
+    // Valid format
+    conf.set(YarnConfiguration.NM_GPU_ALLOWED_DEVICES, "0:0,1:1,2:2,3:4");
+    plugin = new GpuDiscoverer();
     plugin.initialize(conf);
 
-    List<Integer> minorNumbers = plugin.getMinorNumbersOfGpusUsableByYarn();
-    Assert.assertEquals(4, minorNumbers.size());
+    List<GpuDevice> usableGpuDevices = plugin.getGpusUsableByYarn();
+    Assert.assertEquals(4, usableGpuDevices.size());
+
+    Assert.assertTrue(0 == usableGpuDevices.get(0).getIndex());
+    Assert.assertTrue(1 == usableGpuDevices.get(1).getIndex());
+    Assert.assertTrue(2 == usableGpuDevices.get(2).getIndex());
+    Assert.assertTrue(3 == usableGpuDevices.get(3).getIndex());
 
-    Assert.assertTrue(0 == minorNumbers.get(0));
-    Assert.assertTrue(1 == minorNumbers.get(1));
-    Assert.assertTrue(2 == minorNumbers.get(2));
-    Assert.assertTrue(4 == minorNumbers.get(3));
+    Assert.assertTrue(0 == usableGpuDevices.get(0).getMinorNumber());
+    Assert.assertTrue(1 == usableGpuDevices.get(1).getMinorNumber());
+    Assert.assertTrue(2 == usableGpuDevices.get(2).getMinorNumber());
+    Assert.assertTrue(4 == usableGpuDevices.get(3).getMinorNumber());
   }
 }
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
index 5d424ad..4364709 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
@@ -43,6 +43,7 @@ import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.LogDelet
 import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
 import org.apache.hadoop.yarn.server.api.records.MasterKey;
 import org.apache.hadoop.yarn.server.api.records.impl.pb.MasterKeyPBImpl;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
 import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 
 
@@ -515,14 +516,17 @@ public class NMMemoryStateStoreService extends NMStateStoreService {
   }
 
   @Override
-  public void storeAssignedResources(ContainerId containerId,
+  public void storeAssignedResources(Container container,
       String resourceType, List<Serializable> assignedResources)
       throws IOException {
     ResourceMappings.AssignedResources ar =
         new ResourceMappings.AssignedResources();
     ar.updateAssignedResources(assignedResources);
-    containerStates.get(containerId).getResourceMappings()
+    containerStates.get(container.getContainerId()).getResourceMappings()
         .addAssignedResources(resourceType, ar);
+
+    // update container resource mapping.
+    updateContainerResourceMapping(container, resourceType, assignedResources);
   }
 
   private static class TrackerState {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
index 270b8af..20c5240 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
@@ -29,6 +29,7 @@ import static org.mockito.Mockito.isNull;
 import static org.mockito.Mockito.mock;
 import static org.mockito.Mockito.timeout;
 import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
 
 import java.io.File;
 import java.io.IOException;
@@ -69,6 +70,8 @@ import org.apache.hadoop.yarn.proto.YarnServerNodemanagerRecoveryProtos.LogDelet
 import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
 import org.apache.hadoop.yarn.server.api.records.MasterKey;
 import org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyTokenSecretManager;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ResourceMappings;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.LocalResourceTrackerState;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.RecoveredAMRMProxyState;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.RecoveredApplicationsState;
@@ -1143,17 +1146,22 @@ public class TestNMLeveldbStateStoreService {
     ContainerId containerId = ContainerId.newContainerId(appAttemptId, 5);
     storeMockContainer(containerId);
 
+    Container container = mock(Container.class);
+    when(container.getContainerId()).thenReturn(containerId);
+    ResourceMappings resourceMappings = new ResourceMappings();
+    when(container.getResourceMappings()).thenReturn(resourceMappings);
+
     // Store ResourceMapping
-    stateStore.storeAssignedResources(containerId, "gpu",
+    stateStore.storeAssignedResources(container, "gpu",
         Arrays.<Serializable>asList("1", "2", "3"));
     // This will overwrite above
     List<Serializable> gpuRes1 = Arrays.<Serializable>asList("1", "2", "4");
-    stateStore.storeAssignedResources(containerId, "gpu", gpuRes1);
+    stateStore.storeAssignedResources(container, "gpu", gpuRes1);
     List<Serializable> fpgaRes =
         Arrays.<Serializable>asList("3", "4", "5", "6");
-    stateStore.storeAssignedResources(containerId, "fpga", fpgaRes);
+    stateStore.storeAssignedResources(container, "fpga", fpgaRes);
     List<Serializable> numaRes = Arrays.<Serializable>asList("numa1");
-    stateStore.storeAssignedResources(containerId, "numa", numaRes);
+    stateStore.storeAssignedResources(container, "numa", numaRes);
 
     // add a invalid key
     restartStateStore();
@@ -1163,12 +1171,18 @@ public class TestNMLeveldbStateStoreService {
     List<Serializable> res = rcs.getResourceMappings()
         .getAssignedResources("gpu");
     Assert.assertTrue(res.equals(gpuRes1));
+    Assert.assertTrue(
+        resourceMappings.getAssignedResources("gpu").equals(gpuRes1));
 
     res = rcs.getResourceMappings().getAssignedResources("fpga");
     Assert.assertTrue(res.equals(fpgaRes));
+    Assert.assertTrue(
+        resourceMappings.getAssignedResources("fpga").equals(fpgaRes));
 
     res = rcs.getResourceMappings().getAssignedResources("numa");
     Assert.assertTrue(res.equals(numaRes));
+    Assert.assertTrue(
+        resourceMappings.getAssignedResources("numa").equals(numaRes));
   }
 
   private StartContainerRequest storeMockContainer(ContainerId containerId)
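
Editor's illustration (not part of the commit): the TestGpuDiscoverer changes above switch the allowed-devices setting from a list of minor numbers ("0,1,2,4") to "index:minor_number" pairs ("0:0,1:1,2:2,3:4"), and expect an entry without a colon to be rejected. The tests also use GpuDevice instances as map keys and list elements, which only works if equality is value-based. A minimal sketch of both ideas follows. The class names AllowedGpuDevicesSketch and DevicePair and the parse method are hypothetical and are not the Hadoop GpuDiscoverer or GpuDevice implementations; the real code reports the error as a YarnException, whereas this sketch uses IllegalArgumentException to stay self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class AllowedGpuDevicesSketch {

  /** Hypothetical value type pairing a GPU index with its minor number. */
  public static final class DevicePair {
    public final int index;
    public final int minorNumber;

    public DevicePair(int index, int minorNumber) {
      this.index = index;
      this.minorNumber = minorNumber;
    }

    // Value-based equality so instances can be compared in assertions
    // and used as map keys.
    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof DevicePair)) {
        return false;
      }
      DevicePair other = (DevicePair) o;
      return index == other.index && minorNumber == other.minorNumber;
    }

    @Override
    public int hashCode() {
      return Objects.hash(index, minorNumber);
    }

    @Override
    public String toString() {
      return "(index=" + index + ", minor=" + minorNumber + ")";
    }
  }

  /** Parses "index:minor,index:minor,..." and rejects malformed entries. */
  public static List<DevicePair> parse(String allowed) {
    List<DevicePair> devices = new ArrayList<>();
    for (String entry : allowed.split(",")) {
      String[] parts = entry.trim().split(":");
      if (parts.length != 2) {
        throw new IllegalArgumentException("Illegal format: " + entry);
      }
      devices.add(new DevicePair(
          Integer.parseInt(parts[0].trim()),
          Integer.parseInt(parts[1].trim())));
    }
    return devices;
  }

  public static void main(String[] args) {
    // Valid format: four devices, the last one has index 3 and minor number 4.
    System.out.println(parse("0:0,1:1,2:2,3:4"));
    try {
      parse("0:0,1:1,2:2,3"); // last entry has no minor number
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Keeping the index and the minor number together is what lets the tests distinguish GpuDevice(2, 3) (third allowed device, minor number 3) from GpuDevice(3, 4).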


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org


[hadoop] 02/20: YARN-7270 addendum: Reapplied changes after YARN-3926 backports

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 93fe781e8c7afd8b6b2db957cb2789fdde5a1fb8
Author: Daniel Templeton <te...@apache.org>
AuthorDate: Mon Oct 16 11:43:54 2017 -0700

    YARN-7270 addendum: Reapplied changes after YARN-3926 backports
---
 .../src/main/java/org/apache/hadoop/yarn/api/records/Resource.java    | 4 ++--
 .../org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java  | 4 ++--
 .../org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java    | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
index 37b50f2..be0ab58 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
@@ -285,7 +285,7 @@ public abstract class Resource implements Comparable<Resource> {
       return;
     }
     if (resource.equals(ResourceInformation.VCORES_URI)) {
-      this.setVirtualCores((int) resourceInformation.getValue());
+      this.setVirtualCores(castToIntSafely(resourceInformation.getValue()));
       return;
     }
     ResourceInformation storedResourceInfo = getResourceInformation(resource);
@@ -331,7 +331,7 @@ public abstract class Resource implements Comparable<Resource> {
       return;
     }
     if (resource.equals(ResourceInformation.VCORES_URI)) {
-      this.setVirtualCores((int)value);
+      this.setVirtualCores(castToIntSafely(value));
       return;
     }
 
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
index b80e133..a64d242 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
@@ -92,7 +92,7 @@ public class LightWeightResource extends Resource {
   @Override
   @SuppressWarnings("deprecation")
   public int getMemory() {
-    return (int) memoryResInfo.getValue();
+    return castToIntSafely(memoryResInfo.getValue());
   }
 
   @Override
@@ -113,7 +113,7 @@ public class LightWeightResource extends Resource {
 
   @Override
   public int getVirtualCores() {
-    return (int) vcoresResInfo.getValue();
+    return castToIntSafely(vcoresResInfo.getValue());
   }
 
   @Override
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java
index 06c30ff..4ae64c2 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourcePBImpl.java
@@ -117,7 +117,7 @@ public class ResourcePBImpl extends Resource {
   @Override
   public int getVirtualCores() {
     // vcores should always be present
-    return (int) resources[VCORES_INDEX].getValue();
+    return castToIntSafely(resources[VCORES_INDEX].getValue());
   }
 
   @Override
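
Editor's illustration (not part of the commit): the hunks above replace plain (int) casts of long resource values with castToIntSafely, but the helper itself is not shown in this diff. The sketch below assumes it clamps values larger than Integer.MAX_VALUE rather than letting them wrap around; that behavior and the class name SafeIntCastSketch are assumptions made for illustration.

```java
public class SafeIntCastSketch {

  /**
   * Assumed behavior: values above Integer.MAX_VALUE are clamped
   * instead of wrapping to a negative number.
   */
  public static int castToIntSafely(long value) {
    if (value > Integer.MAX_VALUE) {
      return Integer.MAX_VALUE;
    }
    return (int) value;
  }

  public static void main(String[] args) {
    long big = Integer.MAX_VALUE + 1L;
    // A plain narrowing cast keeps only the low 32 bits and wraps.
    System.out.println((int) big);            // -2147483648
    // The clamping helper stays at the largest representable int.
    System.out.println(castToIntSafely(big)); // 2147483647
  }
}
```

The motivation for such a helper is that a wrapped negative vcore or memory value is harder to diagnose than a saturated one.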




[hadoop] 16/20: YARN-7345. GPU Isolation: Incorrect minor device numbers written to devices.deny file. (Jonathan Hung via wangda)

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 05292fef5ef3be612c184472ec578b31af907e56
Author: Wangda Tan <wa...@apache.org>
AuthorDate: Thu Oct 19 14:45:44 2017 -0700

    YARN-7345. GPU Isolation: Incorrect minor device numbers written to devices.deny file. (Jonathan Hung via wangda)
---
 .../native/container-executor/impl/modules/gpu/gpu-module.c |  2 +-
 .../container-executor/test/modules/gpu/test-gpu-module.cc  | 13 +++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.c b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.c
index f96645d..1a1b164 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.c
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.c
@@ -108,7 +108,7 @@ static int internal_handle_gpu_request(
     char param_value[128];
     memset(param_value, 0, sizeof(param_value));
     snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
-             major_device_number, i);
+             major_device_number, minor_devices[i]);
 
     int rc = update_cgroups_parameters_func_p("devices", "deny",
       container_id, param_value);
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/gpu/test-gpu-module.cc b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/gpu/test-gpu-module.cc
index 7e41fb4..b3d93dc 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/gpu/test-gpu-module.cc
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/gpu/test-gpu-module.cc
@@ -165,6 +165,19 @@ TEST_F(TestGpuModule, test_verify_gpu_module_calls_cgroup_parameter) {
 
   // Verify cgroups parameters
   verify_param_updated_to_cgroups(0, NULL);
+
+  /* Test case 3: block 2 non-sequential devices */
+  cgroups_parameters_invoked.clear();
+  char* argv_2[] = { (char*) "--module-gpu", (char*) "--excluded_gpus", (char*) "1,3",
+                   (char*) "--container_id", container_id };
+  rc = handle_gpu_request(&mock_update_cgroups_parameters,
+     "gpu", 5, argv_2);
+  ASSERT_EQ(0, rc) << "Should success.\n";
+
+  // Verify cgroups parameters
+  const char* expected_cgroups_argv_2[] = { "devices", "deny", container_id, "c 195:1 rwm",
+    "devices", "deny", container_id, "c 195:3 rwm"};
+  verify_param_updated_to_cgroups(8, expected_cgroups_argv_2);
 }
 
 TEST_F(TestGpuModule, test_illegal_cli_parameters) {
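
Editor's illustration (not part of the commit): the one-line fix in gpu-module.c writes minor_devices[i] into the devices.deny entry instead of the loop counter i. The difference only shows up when the excluded minor numbers are not 0, 1, 2, ..., which is what the new "1,3" test case exercises. The Java sketch below mirrors that loop; the class and method names are hypothetical, and the major number 195 is taken from the expected strings in the test above.

```java
import java.util.ArrayList;
import java.util.List;

public class DenyEntrySketch {

  /** Mirrors the pre-fix behavior: formats the loop index, not the device. */
  public static List<String> denyEntriesByLoopIndex(int major, int[] minors) {
    List<String> entries = new ArrayList<>();
    for (int i = 0; i < minors.length; i++) {
      entries.add(String.format("c %d:%d rwm", major, i));
    }
    return entries;
  }

  /** Mirrors the fixed behavior: formats the minor number stored at index i. */
  public static List<String> denyEntries(int major, int[] minors) {
    List<String> entries = new ArrayList<>();
    for (int i = 0; i < minors.length; i++) {
      entries.add(String.format("c %d:%d rwm", major, minors[i]));
    }
    return entries;
  }

  public static void main(String[] args) {
    int[] excluded = {1, 3};
    // Pre-fix: denies devices 0 and 1, so device 3 stays accessible.
    System.out.println(denyEntriesByLoopIndex(195, excluded));
    // Fixed: denies devices 1 and 3 as requested.
    System.out.println(denyEntries(195, excluded));
  }
}
```

With a sequential list such as "0,1" both variants produce the same entries, which is presumably why the earlier test cases did not catch the problem.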




[hadoop] 11/20: YARN-7573. Gpu Information page could be empty for nodes without GPU. (Sunil G via wangda)

Posted by jh...@apache.org.

jhung pushed a commit to branch YARN-8200
in repository https://gitbox.apache.org/repos/asf/hadoop.git

commit 2116edd368ab2ae19b54ea5ce674785344d42759
Author: Wangda Tan <wa...@apache.org>
AuthorDate: Wed Nov 29 17:43:37 2017 -0800

    YARN-7573. Gpu Information page could be empty for nodes without GPU. (Sunil G via wangda)
    
    Change-Id: I7f614e5a589a09ce4e4286c84b706e05c29abd14
---
 .../apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java | 4 +---
 .../hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js       | 6 ++++--
 .../src/main/webapp/app/templates/components/node-menu-panel.hbs    | 2 +-
 .../hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-apps.hbs | 2 +-
 .../src/main/webapp/app/templates/yarn-node-containers.hbs          | 2 +-
 .../src/main/webapp/app/templates/yarn-node/yarn-nm-gpu.hbs         | 4 ++++
 6 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java
index 7476d75..7702004 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java
@@ -510,9 +510,7 @@ public class NMWebServices {
       }
     }
 
-    throw new YarnException(
-        "Could not get detailed resource information for given resource-name="
-            + resourceName);
+    return new NMResourceInfo();
   }
 
   private long parseLongParam(String bytes) {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js
index b1b1518..aa5efbe 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-rm-node.js
@@ -97,7 +97,9 @@ export default DS.Model.extend({
     var used = 0;
     var ri;
 
-    var resourceInformations = this.get("usedResource").resourcesInformations;
+    const usedResource = this.get("usedResource");
+    const availableResource = this.get("availableResource");
+    var resourceInformations = usedResource ? usedResource.resourcesInformations : [];
     for (var i = 0; i < resourceInformations.length; i++) {
       ri = resourceInformations[i];
       if (ri.name === "yarn.io/gpu") {
@@ -106,7 +108,7 @@ export default DS.Model.extend({
     }
 
     var available = 0;
-    resourceInformations = this.get("availableResource").resourcesInformations;
+    resourceInformations = availableResource ? availableResource.resourcesInformations : [];
     for (i = 0; i < resourceInformations.length; i++) {
       ri = resourceInformations[i];
       if (ri.name === "yarn.io/gpu") {
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs
index fffae30..966e408 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs
@@ -36,7 +36,7 @@
               {{#link-to 'yarn-node-containers' nodeId nodeAddr}}List of Containers
               {{/link-to}}
             {{/link-to}}
-            {{#if nmGpuInfo}}
+            {{#if (and nmGpuInfo nmGpuInfo.info.totalGpuDevices)}}
               {{#link-to 'yarn-node.yarn-nm-gpu' tagName="li"}}
                 {{#link-to 'yarn-node.yarn-nm-gpu' nodeId nodeAddr }}GPU Information
                 {{/link-to}}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-apps.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-apps.hbs
index 52f0c86..919e54d 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-apps.hbs
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-apps.hbs
@@ -20,7 +20,7 @@
 
 <div class="col-md-12 container-fluid">
   <div class="row">
-    {{node-menu-panel path="yarn-node-apps" nodeAddr=model.nodeInfo.addr nodeId=model.nodeInfo.id}}
+    {{node-menu-panel path="yarn-node-apps" nodeAddr=model.nodeInfo.addr nodeId=model.nodeInfo.id nmGpuInfo=model.nmGpuInfo}}
     {{#if model.apps}}
     <div class="col-md-10 container-fluid">
       <table id="node-apps-table" class="display table table-striped table-bordered" cellspacing="0" width="100%">
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-containers.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-containers.hbs
index f520c46..1f31272 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-containers.hbs
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node-containers.hbs
@@ -20,7 +20,7 @@
 
 <div class="col-md-12 container-fluid">
   <div class="row">
-    {{node-menu-panel path="yarn-node-containers" nodeAddr=model.nodeInfo.addr nodeId=model.nodeInfo.id}}
+    {{node-menu-panel path="yarn-node-containers" nodeAddr=model.nodeInfo.addr nodeId=model.nodeInfo.id nmGpuInfo=model.nmGpuInfo}}
     {{#if model.containers}}
     <div class="col-md-10 container-fluid">
       <table id="node-containers-table" class="display table table-striped table-bordered" cellspacing="0" width="100%">
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/yarn-nm-gpu.hbs b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/yarn-nm-gpu.hbs
index 55840ad..0464cc8 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/yarn-nm-gpu.hbs
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-node/yarn-nm-gpu.hbs
@@ -23,6 +23,7 @@
 
     {{node-menu-panel path="yarn-node" nodeId=model.rmNode.id
                       nodeAddr=model.node.id nmGpuInfo=model.nmGpuInfo}}
+    {{#if model.nmGpuInfo.info.totalGpuDevices}}
 
     <div class="col-md-10 container-fluid">
       <div class="panel panel-default">
@@ -49,5 +50,8 @@
         {{yarn-nm-gpu-info gpu=gpu}}
       {{/each}}
     </div>
+    {{else}}
+      <h4 align = "center">No GPUs are found on this node.</h4>
+    {{/if}}
   </div>
 </div>
\ No newline at end of file
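
Editor's illustration (not part of the commit): the yarn-rm-node.js change above guards against usedResource or availableResource being absent by falling back to an empty list before looping over resourcesInformations. The same pattern is sketched below in Java. The types Resource and ResourceInfo and the method gpuValue are hypothetical stand-ins for the UI model, and reading a numeric value field from the matching entry is an assumption, since the body of the if block is not visible in the hunk.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class GpuUsageSketch {

  /** Hypothetical entry of a resourcesInformations list. */
  public static final class ResourceInfo {
    public final String name;
    public final long value;

    public ResourceInfo(String name, long value) {
      this.name = name;
      this.value = value;
    }
  }

  /** Hypothetical stand-in for usedResource / availableResource. */
  public static final class Resource {
    public final List<ResourceInfo> resourcesInformations;

    public Resource(List<ResourceInfo> resourcesInformations) {
      this.resourcesInformations = resourcesInformations;
    }
  }

  /** Returns the yarn.io/gpu value, or 0 when the resource is missing. */
  public static long gpuValue(Resource resource) {
    // Fall back to an empty list so the loop below is simply skipped.
    List<ResourceInfo> infos = resource != null
        ? resource.resourcesInformations
        : Collections.<ResourceInfo>emptyList();
    long gpus = 0;
    for (ResourceInfo ri : infos) {
      if ("yarn.io/gpu".equals(ri.name)) {
        gpus = ri.value;
      }
    }
    return gpus;
  }

  public static void main(String[] args) {
    Resource withGpu = new Resource(Arrays.asList(
        new ResourceInfo("memory-mb", 8192),
        new ResourceInfo("yarn.io/gpu", 2)));
    System.out.println(gpuValue(withGpu)); // 2
    System.out.println(gpuValue(null));    // 0, no exception
  }
}
```

The server-side change in NMWebServices follows the same idea: returning an empty NMResourceInfo lets the UI render a "no GPUs" message instead of handling an error.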

