Posted to common-user@hadoop.apache.org by sunww <sp...@outlook.com> on 2015/12/16 04:07:33 UTC

Containers fail after NodeManager restart

Hi,
    I'm using Hadoop 2.7.1 with Kerberos enabled. After I restart a NodeManager, some of that NodeManager's containers sometimes fail.
    I found the errors below in the NodeManager log. Any suggestions would be appreciated. Thanks.
    The container-executor binary looks like this:
    ---Sr-s--- 1 root hadoop 114398 Oct  1 02:31 container-executor
    
2015-12-16 09:06:55,478 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(286)) - The configured nodemanager group 1001 is different from the group of the executable 0
2015-12-16 09:06:55,479 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e14_1449136946007_0006_01_000010
java.io.IOException: Problem signalling container 967 with NULL; output: The configured nodemanager group 1001 is different from the group of the executable 0
 and exitCode: 22
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:483)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerProcessAlive(LinuxContainerExecutor.java:538)
        at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:182)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:441)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=22: Invalid permissions on container-executor binary.

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
        at org.apache.hadoop.util.Shell.run(Shell.java:487)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:474)
        ... 9 more
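
The log line above compares the configured NodeManager group (1001) with the group of the executable (0). Below is a minimal Python sketch for checking a container-executor binary against that expectation. It assumes the usual requirements for this binary (owned by root, group-owned by the configured NodeManager group, mode 6050, which `ls` shows as ---Sr-s---). The function names and the example path are hypothetical, not part of Hadoop.

```python
import os
import stat

# Assumption: mode 6050 (setuid + setgid + group r-x) is the expected
# permission set for container-executor; `ls -l` renders it as ---Sr-s---.
EXPECTED_MODE = 0o6050


def problems_for(uid, gid, mode, expected_gid):
    """Return a list of problems for the given owner, group and mode bits."""
    problems = []
    if uid != 0:
        problems.append(f"owner uid is {uid}, expected 0 (root)")
    if gid != expected_gid:
        problems.append(f"group gid is {gid}, expected {expected_gid}")
    if mode != EXPECTED_MODE:
        problems.append(f"mode is {oct(mode)}, expected {oct(EXPECTED_MODE)}")
    return problems


def check_binary(path, expected_gid):
    """Stat the file at `path` and report ownership or mode problems."""
    st = os.stat(path)
    # S_IMODE keeps only the permission bits, including setuid/setgid.
    return problems_for(st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode), expected_gid)


if __name__ == "__main__":
    # Hypothetical path; substitute the binary the NodeManager actually runs.
    for problem in check_binary("/path/to/container-executor", 1001):
        print(problem)
```

Running this against the binary that the NodeManager actually invokes may help show whether it is the same file as the one listed above, since the listing shows group hadoop while the log reports group 0.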