Posted to yarn-issues@hadoop.apache.org by "Sidharta Seethana (JIRA)" <ji...@apache.org> on 2016/03/04 03:50:40 UTC

[jira] [Commented] (YARN-4762) NMs failing on DelegatingLinuxContainerRuntime init with LCE on

    [ https://issues.apache.org/jira/browse/YARN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179194#comment-15179194 ] 

Sidharta Seethana commented on YARN-4762:
-----------------------------------------

/cc [~vvasudev]

When the new resource handler mechanism was introduced, a CGroupsHandlerImpl instance was created and initialized only if at least one of the resource handlers was enabled. Initialization does one of the following:

#  If mounting of cgroups is enabled, nothing is mounted during initialization, because mounting is done on demand for individual resource handlers.
#  If mounting of cgroups is disabled, ‘initializeControllerPathsFromMtab’ gets called, which checks that each of the cgroup mounts is writable.

Behavior (2) was correct because the cgroups handler wasn’t created unless at least one of the cgroups-based resource handlers was in use. However, with YARN-4553, a CGroupsHandler is always created, even if no cgroups-based handlers are in use. This incorrectly leads to an attempt to check whether the cgroups mount paths are writable, and NM initialization fails when they are not (as in the "Controller mount point not writable for: cpu" error in the log quoted below).
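
To make the intended behavior concrete, here is a rough sketch of the guard I have in mind. It is not the actual CGroupsHandlerImpl code and not necessarily the shape the fix will take: the class name, the method name, the WritableCheck interface and the 'anyCgroupsHandlerInUse' flag are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names exist in the YARN code base.
final class CgroupsInitSketch {

  /** Stand-in for a real writability probe (for example, a file system check). */
  interface WritableCheck {
    boolean isWritable(String mountPath);
  }

  /**
   * Returns the controllers whose mount paths failed the writability check.
   * The check is skipped entirely when mounting is enabled (mounting is done
   * on demand) or when no cgroups-based resource handler is in use.
   */
  static List<String> findUnwritableControllers(
      boolean mountEnabled,
      boolean anyCgroupsHandlerInUse,
      Map<String, String> controllerMounts,
      WritableCheck check) {
    List<String> unwritable = new ArrayList<>();
    if (mountEnabled || !anyCgroupsHandlerInUse) {
      // Nothing depends on the mounts yet, so there is nothing to validate.
      return unwritable;
    }
    for (Map.Entry<String, String> mount : controllerMounts.entrySet()) {
      if (!check.isWritable(mount.getValue())) {
        unwritable.add(mount.getKey());
      }
    }
    return unwritable;
  }
}
```

With a guard along these lines, an NM that has no cgroups-based handlers configured would never look at the mount paths, so a read-only cpu mount would not prevent it from starting. An NM that does use a cgroups-based handler would still get the controller reported, as before.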

I'll take a look at fixing this.

> NMs failing on DelegatingLinuxContainerRuntime init with LCE on
> ---------------------------------------------------------------
>
>                 Key: YARN-4762
>                 URL: https://issues.apache.org/jira/browse/YARN-4762
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>
> Seeing this exception and the NMs crash.
> {code}
> 2016-03-03 16:47:57,807 DEBUG org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService is started
> 2016-03-03 16:47:58,027 DEBUG org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: checkLinuxExecutorSetup: [/hadoop/hadoop-yarn-nodemanager/bin/container-executor, --checksetup]
> 2016-03-03 16:47:58,043 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Mount point Based on mtab file: /proc/mounts. Controller mount point not writable for: cpu
> 2016-03-03 16:47:58,043 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime: Unable to get cgroups handle.
> 2016-03-03 16:47:58,044 DEBUG org.apache.hadoop.service.AbstractService: noteFailure org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
> 2016-03-03 16:47:58,044 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container runtime(s)!
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
>         ... 3 more
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: Service: NodeManager entered state STOPPED
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.CompositeService: NodeManager: stopping services, size=0
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: Service: org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService entered state STOPPED
> 2016-03-03 16:47:58,047 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container runtime(s)!
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
>         ... 3 more
> {code}


