Posted to issues@ambari.apache.org by "Sandor Molnar (JIRA)" <ji...@apache.org> on 2018/05/14 09:02:00 UTC

[jira] [Updated] (AMBARI-23831) Ambari YARN Changes needed to enable CGroups + CPU Scheduling + LinuxContainerExecutor in both secure & Unsecure clusters

     [ https://issues.apache.org/jira/browse/AMBARI-23831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandor Molnar updated AMBARI-23831:
-----------------------------------
    Description: 
The following changes should be implemented:

1) For both secure and non-secure clusters:
 - Use LinuxContainerExecutor:
{code:java}
"yarn.nodemanager.container-executor.class" => "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor"

"yarn.nodemanager.linux-container-executor.resources-handler.class" => "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler"

"yarn.nodemanager.linux-container-executor.cgroups.mount" => true (assume admin won't mount cgroup ahead)
{code}
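
For reference, a sketch of how these settings would be expressed in yarn-site.xml (standard Hadoop property format; the values are exactly the ones listed above):
{code:xml}
<!-- yarn-site.xml (sketch): enable LinuxContainerExecutor with cgroups -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
{code}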

 - Properly set up the permissions of container-executor / container-executor.cfg (use today's permissions from secure mode); see the sketch below.
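
A sketch of the conventional secure-mode layout (the paths below are hypothetical and depend on the install layout; the binary is setuid/setgid root and group-owned by the NodeManager group, the cfg file is root-owned and readable only by root):
{code:bash}
# Hypothetical paths; adjust to the actual install layout.
# container-executor binary: root-owned, group = NM group, mode 6050 (---Sr-s---)
chown root:hadoop /usr/lib/hadoop-yarn/bin/container-executor
chmod 6050 /usr/lib/hadoop-yarn/bin/container-executor
# container-executor.cfg: root-owned, readable only by root
chown root:hadoop /etc/hadoop/conf/container-executor.cfg
chmod 0400 /etc/hadoop/conf/container-executor.cfg
{code}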
 - Further changes:
{code:java}
"yarn.nodemanager.resource.memory.enabled"
// the default value is false, we need to set to true here to enable the cgroups based memory monitoring.


"yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage"
// the default value is 90.0f, which means in memory congestion case, the container can still keep/reserve 90% resource for its claimed value. It cannot be set to above 100 or set as negative value.

"yarn.nodemanager.resource.memory.cgroups.swappiness"
// The percentage that memory can be swapped or not. default value is 0, which means container memory cannot be swapped out. If not set, linux cgroup setting by default set to 60 which means 60% of memory can potentially be swapped out when system memory is not enough.

"yarn.nodemanager.linux-container-executor.group" set to Unix group of the NodeManager which should match the setting in “container-executor.cfg” (hadoop for ambari?).
{code}
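
To illustrate the matching requirement, a container-executor.cfg sketch; the group value hadoop is the assumption flagged above:
{code:java}
# container-executor.cfg (sketch): this must match
# yarn.nodemanager.linux-container-executor.group in yarn-site.xml.
yarn.nodemanager.linux-container-executor.group=hadoop
{code}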

 - For cgroups limits:
{code:java}
"yarn.nodemanager.resource.percentage-physical-cpu-limit" - 
this setting lets you limit the cpu usage of all YARN containers. It sets a hard upper limit on the cumulative CPU usage of the containers. For example, if set to 60, the combined CPU usage of all YARN containers will not exceed 60%. The yarn by default value is 100.

"yarn.nodemanager.resource.cpu-vcores" - number of vcores can be assign to yarn containers, default value is 8 for yarn, but ambari should set a proper value in considering of NM size, etc.

"yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage" - 
CGroups allows cpu usage limits to be hard or soft. When this setting is true, containers cannot use more CPU usage than allocated even if spare CPU is available. This ensures that containers can only use CPU that they were allocated. When set to false, containers can use spare CPU if available. It should be noted that irrespective of whether set to true or false, at no time can the combined CPU usage of all containers exceed the value specified in 
“yarn.nodemanager.resource.percentage-physical-cpu-limit”.
Talked with peers, we run into kernel panic when set hard limit before, so we should know there is risk to set this to true. May need a documentation? 
{code}
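
A yarn-site.xml sketch of these three settings; the numeric values are illustrative only, not recommendations:
{code:xml}
<!-- yarn-site.xml (sketch): illustrative values only -->
<property>
  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
  <value>80</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>false</value>
</property>
{code}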

2) For non-secure clusters (this needs to be done when we move from secure to non-secure):
 - In container-executor.cfg: remove "yarn" from the banned users ({{banned.users}}) and set {{min.user.id}} to 50; see the cfg sketch after this list.
 - In yarn-site.xml, change:
{code:java}
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=true
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=yarn
{code}
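
A container-executor.cfg sketch for the non-secure case; only the removal of "yarn" and min.user.id=50 come from the list above, the rest of the banned-user list is assumed:
{code:java}
# container-executor.cfg (sketch): non-secure mode
# "yarn" has been removed from banned.users; the remaining entries are assumed.
banned.users=hdfs,mapred,bin
min.user.id=50
{code}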

3) When moving from non-secure to secure:
 - In container-executor.cfg:
 Add the "yarn" user to the banned users ({{banned.users}}).
 Set {{min.user.id}} back to the existing Ambari default (IIRC it's 1000).
 See the cfg sketch after this list.
 - Revert the following yarn-site.xml configs to:
{code:java}
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=false
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=nobody
{code}
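
The corresponding container-executor.cfg sketch for secure mode (again, the exact banned-user list is assumed; min.user.id reverts to the Ambari default):
{code:java}
# container-executor.cfg (sketch): secure mode
# "yarn" is added back to banned.users; the other entries are assumed.
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
{code}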
 

> Ambari YARN Changes needed to enable CGroups + CPU Scheduling + LinuxContainerExecutor in both secure & Unsecure clusters
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-23831
>                 URL: https://issues.apache.org/jira/browse/AMBARI-23831
>             Project: Ambari
>          Issue Type: Task
>          Components: ambari-server
>            Reporter: Sandor Molnar
>            Assignee: Sandor Molnar
>            Priority: Blocker
>             Fix For: 2.7.0
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)