Posted to dev@submarine.apache.org by GitBox <gi...@apache.org> on 2020/03/01 10:14:14 UTC

[GitHub] [submarine] ChanaLii opened a new issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM and container-executor.cfg

ChanaLii opened a new issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM  and container-executor.cfg
URL: https://github.com/apache/submarine/issues/198
 
 
   I followed the documentation to configure GPU support for the ResourceManager, the NodeManager, and container-executor.cfg in my environment.
   Then I restarted Hadoop with the following commands:
   ```
   YARN_LOGFILE=resourcemanager.log ./sbin/yarn-daemon.sh start resourcemanager
   YARN_LOGFILE=nodemanager.log ./sbin/yarn-daemon.sh start nodemanager
   YARN_LOGFILE=timeline.log ./sbin/yarn-daemon.sh start timelineserver
   YARN_LOGFILE=mr-historyserver.log ./sbin/mr-jobhistory-daemon.sh start historyserver
   ```
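The four commands above should leave four daemon JVMs running. As a hedged sketch (not part of Hadoop), the shell function below reads `jps` output and lists which expected daemons are missing. `missing_daemons` is a hypothetical helper name, and the main-class names it looks for (`ApplicationHistoryServer` for `timelineserver`, `JobHistoryServer` for `historyserver`) are assumptions based on the classes those scripts normally launch.

```shell
# Assumed JVM main-class names for the daemons started above (hypothetical list).
EXPECTED_DAEMONS="ResourceManager NodeManager ApplicationHistoryServer JobHistoryServer"

# missing_daemons: reads `jps` output ("<pid> <MainClass>" per line) on stdin
# and prints every expected daemon that does not appear, one per line.
missing_daemons() {
  local running d
  # Keep only the class-name column of the jps output.
  running=$(awk '{print $2}')
  for d in $EXPECTED_DAEMONS; do
    # -x: whole-line match, so "NodeManager" does not match "ResourceManager".
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      printf '%s\n' "$d"
    fi
  done
}

# Usage on a live node (needs a JDK's jps on PATH):
#   jps | missing_daemons
```

An empty result means every expected daemon is up; in the situation described here, `NodeManager` would be the one reported.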
   
   I used the `jps` command to check whether the services were running. Unfortunately, the NodeManager had not started, and I found the following errors in hadoop-root-nodemanager-71192c388b55.log:
   
   ```
   2020-03-01 09:52:38,744 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Failed to bootstrap configured resource subsystems!
   org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Controller devices not mounted. You either need to mount it with yarn.nodemanager.linux-container-executor.cgroups.mount or mount cgroups before launching Yarn
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:392)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:370)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.bootstrap(GpuResourceHandlerImpl.java:93)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler.serviceInit(ContainerScheduler.java:146)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:323)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:516)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
   2020-03-01 09:52:38,744 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler failed in state INITED
   java.io.IOException: Failed to bootstrap configured resource subsystems!
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler.serviceInit(ContainerScheduler.java:150)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:323)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:516)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
   2020-03-01 09:52:38,745 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED
   org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to bootstrap configured resource subsystems!
   	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
   	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:323)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:516)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
   	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
   Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler.serviceInit(ContainerScheduler.java:150)
   	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
   	... 8 more
   ```
   
   It seems the environment did not mount `/sys/fs/cgroup`. Here is the command I used to start the Docker container:
   ```
   docker run -it -v /data/docker-images/:/sys/fs/cgroup -m 10G 968d612886ee bash
   ```
   Can somebody help me?
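The exception says the `devices` controller is not mounted. A minimal sketch for checking this from inside the container, assuming the cgroup v1 layout in which each controller appears in `/proc/mounts` as a filesystem of type `cgroup` with the controller name among its mount options. `cgroup_controller_mount` is a hypothetical helper, not a Hadoop or Docker tool.

```shell
# cgroup_controller_mount CONTROLLER [MOUNTS_FILE]
# Prints the mount point of the cgroup v1 hierarchy that has CONTROLLER
# attached and returns 0; returns 1 (printing nothing) if none is found.
# MOUNTS_FILE defaults to /proc/mounts.
cgroup_controller_mount() {
  local controller mounts_file
  controller=$1
  mounts_file=${2:-/proc/mounts}
  # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
  awk -v c="$controller" '
    $3 == "cgroup" {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) {
        if (opts[i] == c) { print $2; found = 1; exit }
      }
    }
    END { exit (found ? 0 : 1) }
  ' "$mounts_file"
}

# Usage inside the container:
#   cgroup_controller_mount devices || echo "devices controller not mounted"
```

A non-zero result would be consistent with the "Controller devices not mounted" error above.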

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@submarine.apache.org
For additional commands, e-mail: dev-help@submarine.apache.org


[GitHub] [submarine] ChanaLii commented on issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM and container-executor.cfg

Posted by GitBox <gi...@apache.org>.
ChanaLii commented on issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM  and container-executor.cfg
URL: https://github.com/apache/submarine/issues/198#issuecomment-596042040
 
 
   I didn't configure cgroups inside the container; instead, I mounted `/sys/fs/cgroup` from the host machine when I started the Docker container:
   ```
   docker run -it -v /sys/fs/cgroup:/sys/fs/cgroup -m 10G 968d612886ee bash
   ```
   
   My Docker image does not have `/sys/fs/cgroup` configured, but my host machine does, so I expected that mounting `/sys/fs/cgroup` from the host would solve the problem, and it did. The best solution, however, is to configure cgroups inside the container rather than mount `/sys/fs/cgroup` from the host machine.
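One hedged way to configure cgroups inside the container is to mount the cgroup v1 controllers yourself, as in the sketch below. It assumes the container was started with enough privileges for `mount` to succeed (for example `docker run --privileged`) and that the host uses cgroup v1. `mount_cgroup_v1_controllers`, `run`, and `DRY_RUN` are hypothetical names; with `DRY_RUN=1` the script only prints the commands it would run.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# mount_cgroup_v1_controllers ROOT CONTROLLER...
# Sketch: mounts one cgroup v1 hierarchy per CONTROLLER under ROOT.
mount_cgroup_v1_controllers() {
  local root controller
  root=$1
  shift
  # A fresh tmpfs at ROOT provides a writable place for the per-controller
  # mount points; note that it hides whatever was mounted at ROOT before.
  run mount -t tmpfs -o mode=0755 cgroup "$root"
  for controller in "$@"; do
    run mkdir -p "$root/$controller"
    # Attach a cgroup v1 hierarchy with just this controller.
    run mount -t cgroup -o "$controller" cgroup "$root/$controller"
  done
}

# Example (as root inside a privileged container):
#   mount_cgroup_v1_controllers /sys/fs/cgroup devices
```

Each controller is mounted on its own here. On cgroup v1, a controller that the host already attaches together with others (such as `cpu,cpuacct`) generally has to be mounted with the same combined option string, so passing that string as a single argument is the safer choice.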



[GitHub] [submarine] ChanaLii closed issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM and container-executor.cfg

Posted by GitBox <gi...@apache.org>.
ChanaLii closed issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM  and container-executor.cfg
URL: https://github.com/apache/submarine/issues/198
 
 
   


[GitHub] [submarine] tangzhankun commented on issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM and container-executor.cfg

Posted by GitBox <gi...@apache.org>.
tangzhankun commented on issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM  and container-executor.cfg
URL: https://github.com/apache/submarine/issues/198#issuecomment-602022000
 
 
   @ChanaLii Thanks for the update. Glad that you fixed it.
   Can we resolve this issue?


[GitHub] [submarine] yuanzac commented on issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM and container-executor.cfg

Posted by GitBox <gi...@apache.org>.
yuanzac commented on issue #198: restart hadoop services occurred an error when I finished the GPU setting for RM、NM  and container-executor.cfg
URL: https://github.com/apache/submarine/issues/198#issuecomment-594298559
 
 
   It looks like there is something wrong with the cgroup configuration.
   Please refer to https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html
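The exception text names the other option: letting the NodeManager mount cgroups itself through `yarn.nodemanager.linux-container-executor.cgroups.mount`. The NodeManager cgroups page linked above documents this property together with `yarn.nodemanager.linux-container-executor.cgroups.mount-path`. The sketch below only prints the corresponding `<property>` entries for pasting into `yarn-site.xml`. The helper name is hypothetical, the default path of `/sys/fs/cgroup` is an assumption, and the container executor would still need sufficient privileges inside the container to perform the mount.

```shell
# print_cgroups_mount_properties [MOUNT_PATH]
# Prints yarn-site.xml <property> entries that ask the LinuxContainerExecutor
# to mount cgroups itself under MOUNT_PATH (assumed default: /sys/fs/cgroup).
print_cgroups_mount_properties() {
  local mount_path
  mount_path=${1:-/sys/fs/cgroup}
  printf '%s\n' \
    '<property>' \
    '  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>' \
    '  <value>true</value>' \
    '</property>' \
    '<property>' \
    '  <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>' \
    "  <value>${mount_path}</value>" \
    '</property>'
}

# Usage: print_cgroups_mount_properties /sys/fs/cgroup
```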
