Posted to yarn-issues@hadoop.apache.org by "meng.ye (Jira)" <ji...@apache.org> on 2019/09/24 14:45:00 UTC
[jira] [Comment Edited] (YARN-8645) Yarn NM fail to start when
remount cpu control group
[ https://issues.apache.org/jira/browse/YARN-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936875#comment-16936875 ]
meng.ye edited comment on YARN-8645 at 9/24/19 2:44 PM:
--------------------------------------------------------
I hit the same issue with YARN 3.1.1 on HDP 3.1 after enabling GPU via Ambari.
{code:java}
yarn version
Hadoop 3.1.1.3.1.0.0-78
{code}
OS version:
{code:java}
cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
{code}
NodeManager log:
{code:java}
2019-09-24 14:21:30,159 INFO resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:mountCGroupController(317)) - Mounting controller cpu at /sys/fs/cgroup/cpu
2019-09-24 14:21:30,161 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 32. Privileged Execution Operation Stderr:
Feature disabled: mount cgroup
Stdout:
Full command array for failed execution:
[/usr/hdp/3.1.0.0-78/hadoop-yarn/bin/container-executor, --mount-cgroups, yarn, cpu,cpuacct=/sys/fs/cgroup/cpu]
2019-09-24 14:21:30,161 ERROR resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:mountCGroupController(324)) - Failed to mount controller: cpu
2019-09-24 14:21:30,161 ERROR nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:init(323)) - Failed to bootstrap configured resource subsystems!
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Failed to mount controller: cpu
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.mountCGroupController(CGroupsHandlerImpl.java:326)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:372)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:98)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:87)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
2019-09-24 14:21:30,163 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
{code}
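The failure signature in the log above is the "Feature disabled: mount cgroup" line on stderr together with "Failed to mount controller: cpu". As a rough sketch only, a NodeManager log could be scanned for that signature as follows. The function name diagnose_mount_failure and the returned dict layout are hypothetical, chosen for illustration; only the matched strings come from the log itself.

```python
import re

# Marker strings copied from the NodeManager log above.
EXIT_CODE_RE = re.compile(r"Shell execution returned exit code: (\d+)")
FEATURE_DISABLED = "Feature disabled: mount cgroup"
MOUNT_FAILED_RE = re.compile(r"Failed to mount controller: (\w+)")


def diagnose_mount_failure(log_text):
    """Return a small summary dict if the log shows the disabled-mount
    failure, otherwise None.

    Hypothetical helper: a plain text heuristic, not part of YARN.
    """
    controller_match = MOUNT_FAILED_RE.search(log_text)
    if FEATURE_DISABLED not in log_text or controller_match is None:
        return None
    # Takes the first reported exit code in the text; in a long log this
    # may belong to an unrelated privileged operation.
    exit_match = EXIT_CODE_RE.search(log_text)
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "controller": controller_match.group(1),
    }
```

Passing the contents of the NodeManager log file to diagnose_mount_failure would return the exit code and controller name when both markers are present, which makes it easy to tell this failure apart from other bootstrap errors.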
was (Author: ym8468):
I met the same issue with YARN 3.1.1 of HDP3.1 after enabling GPU by Ambari
{code:java}
yarn version Hadoop 3.1.1.3.1.0.0-78
{code}
> Yarn NM fail to start when remount cpu control group
> ----------------------------------------------------
>
> Key: YARN-8645
> URL: https://issues.apache.org/jira/browse/YARN-8645
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Jiandan Yang
> Priority: Major
>
> NM failed to start after we updated YARN to the latest version. The NM logs are as follows:
> {code:java}
> 2018-08-08 16:07:01,244 INFO [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Mounting controller cpu at /sys/fs/cgroup/cpu
> 2018-08-08 16:07:01,246 WARN [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 32. Privileged Execution Operation Stderr:
> Feature disabled: mount cgroup
> Stdout:
> Full command array for failed execution:
> [/home/hadoop/hadoop_hbase/hadoop-current/bin/container-executor, --mount-cgroups, hadoop-yarn, cpu,cpuset,cpuacct=/sys/fs/cgroup/cpu]
> 2018-08-08 16:07:01,247 ERROR [main] org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Failed to mount controller: cpu
> 2018-08-08 16:07:01,247 ERROR [main] org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Failed to mount controller: cpu
> {code}
> The cause of the error is that commit 351cf87c92872d90f62c476f85ae4d02e485769c disables mounting cgroups by default in container-executor, which makes container-executor return a non-zero exit code when executing --mount-cgroups.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org