Posted to yarn-issues@hadoop.apache.org by "meng.ye (Jira)" <ji...@apache.org> on 2019/09/24 14:45:00 UTC

[jira] [Comment Edited] (YARN-8645) Yarn NM fail to start when remount cpu control group

    [ https://issues.apache.org/jira/browse/YARN-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936875#comment-16936875 ] 

meng.ye edited comment on YARN-8645 at 9/24/19 2:44 PM:
--------------------------------------------------------

I hit the same issue with YARN 3.1.1 on HDP 3.1 after enabling GPU support through Ambari.
{code:java}
yarn version
Hadoop 3.1.1.3.1.0.0-78
{code}

OS version:
{code:java}
cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
{code}

NodeManager log:
{code:java}
2019-09-24 14:21:30,159 INFO  resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:mountCGroupController(317)) - Mounting controller cpu at /sys/fs/cgroup/cpu
2019-09-24 14:21:30,161 WARN  privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 32. Privileged Execution Operation Stderr:
Feature disabled: mount cgroup

Stdout:
Full command array for failed execution:
[/usr/hdp/3.1.0.0-78/hadoop-yarn/bin/container-executor, --mount-cgroups, yarn, cpu,cpuacct=/sys/fs/cgroup/cpu]
2019-09-24 14:21:30,161 ERROR resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:mountCGroupController(324)) - Failed to mount controller: cpu
2019-09-24 14:21:30,161 ERROR nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:init(323)) - Failed to bootstrap configured resource subsystems!
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Failed to mount controller: cpu
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.mountCGroupController(CGroupsHandlerImpl.java:326)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:372)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:98)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:87)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
2019-09-24 14:21:30,163 INFO  service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
{code}
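
To pull the relevant facts out of a longer NodeManager log, something like the following sketch may help. It assumes the layout shown in the excerpt above (an exit-code line, stderr up to a "Stdout:" marker, then the command array on the line after "Full command array for failed execution:"). The function name find_failed_privileged_ops and the SAMPLE_LOG constant are hypothetical names used only for illustration, not part of Hadoop.

```python
import re

# Excerpt copied from the NodeManager log above.
SAMPLE_LOG = """\
2019-09-24 14:21:30,161 WARN  privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 32. Privileged Execution Operation Stderr:
Feature disabled: mount cgroup

Stdout:
Full command array for failed execution:
[/usr/hdp/3.1.0.0-78/hadoop-yarn/bin/container-executor, --mount-cgroups, yarn, cpu,cpuacct=/sys/fs/cgroup/cpu]
"""

EXIT_RE = re.compile(r"Shell execution returned exit code: (\d+)\.")
CMD_MARKER = "Full command array for failed execution:"


def find_failed_privileged_ops(log_text):
    """Return one dict (exit_code, stderr, command) per failed privileged operation.

    Hypothetical helper: it assumes the log layout seen in this issue and
    does no validation beyond that.
    """
    lines = log_text.splitlines()
    failures = []
    for i, line in enumerate(lines):
        match = EXIT_RE.search(line)
        if not match:
            continue
        stderr_lines = []
        command = None
        j = i + 1
        # Stderr is everything up to the "Stdout:" marker.
        while j < len(lines) and not lines[j].startswith("Stdout:"):
            if lines[j].strip():
                stderr_lines.append(lines[j].strip())
            j += 1
        # The command array sits on the line after CMD_MARKER.
        while j < len(lines):
            if lines[j].startswith(CMD_MARKER) and j + 1 < len(lines):
                command = lines[j + 1].strip()
                break
            j += 1
        failures.append({
            "exit_code": int(match.group(1)),
            "stderr": "\n".join(stderr_lines),
            "command": command,
        })
    return failures


if __name__ == "__main__":
    for failure in find_failed_privileged_ops(SAMPLE_LOG):
        print(failure)
```

On the excerpt above this reports exit code 32 together with the stderr text "Feature disabled: mount cgroup", which is the line that points at container-executor refusing the mount rather than at a cgroup filesystem problem.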



was (Author: ym8468):
I hit the same issue with YARN 3.1.1 on HDP 3.1 after enabling GPU support through Ambari.
{code:java}
yarn version
Hadoop 3.1.1.3.1.0.0-78
{code}

> Yarn NM fail to start when remount cpu control group
> ----------------------------------------------------
>
>                 Key: YARN-8645
>                 URL: https://issues.apache.org/jira/browse/YARN-8645
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>
> NM failed to start after we updated YARN to the latest version. The NM logs are as follows:
> {code:java}
> 2018-08-08 16:07:01,244 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Mounting controller cpu at /sys/fs/cgroup/cpu
> 2018-08-08 16:07:01,246 WARN [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 32. Privileged Execution Operation Stderr:
> Feature disabled: mount cgroup
> Stdout:
> Full command array for failed execution:
> [/home/hadoop/hadoop_hbase/hadoop-current/bin/container-executor, --mount-cgroups, hadoop-yarn, cpu,cpuset,cpuacct=/sys/fs/cgroup/cpu]
> 2018-08-08 16:07:01,247 ERROR [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Failed to mount controller: cpu
> 2018-08-08 16:07:01,247 ERROR [main] org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Failed to mount controller: cpu
>  {code}
> The cause of the error is that commit 351cf87c92872d90f62c476f85ae4d02e485769c disabled mounting cgroups by default in container-executor, which makes container-executor return a non-zero exit code when executing --mount-cgroups.
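
As a rough illustration of the cause described above, the sketch below checks whether a container-executor.cfg text opts back in to cgroup mounting. It rests on assumptions that should be verified against the Hadoop version in use: the key name feature.mount-cgroup.enabled and the 0/1 value convention are recalled from the container-executor source, not stated in this issue, and the helper mount_cgroup_enabled is a hypothetical name. The parsing is deliberately simplified and ignores section headers.

```python
# Assumed key name; verify against the container-executor source of your Hadoop version.
FEATURE_KEY = "feature.mount-cgroup.enabled"


def mount_cgroup_enabled(cfg_text, default=False):
    """Return True if cfg_text sets FEATURE_KEY to 1, otherwise the default.

    Hypothetical helper: a simplified key=value scan that skips blank lines,
    '#' comments and lines without '=', and ignores values other than 0 or 1.
    """
    enabled = default
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == FEATURE_KEY:
            value = value.strip()
            if value in ("0", "1"):
                enabled = value == "1"
    return enabled


if __name__ == "__main__":
    # Example cfg contents; the values here are illustrative only.
    print(mount_cgroup_enabled("feature.mount-cgroup.enabled=1\n"))
    print(mount_cgroup_enabled("# no mount key set\nfeature.tc.enabled=1\n"))
```

If the check returns False for the cfg on an affected node, that would be consistent with the "Feature disabled: mount cgroup" message in the logs. Another route, if it fits the deployment, may be to pre-mount the cgroup controllers and set yarn.nodemanager.linux-container-executor.cgroups.mount to false so that the NodeManager never asks container-executor to mount them.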


