Posted to yarn-dev@hadoop.apache.org by "Prabhu Joseph (JIRA)" <ji...@apache.org> on 2018/09/13 22:33:00 UTC

[jira] [Created] (YARN-8773) Blacklisting support for scheduling AMs for Apple HDP-2.2.9

Prabhu Joseph created YARN-8773:
-----------------------------------

             Summary: Blacklisting support for scheduling AMs for Apple HDP-2.2.9
                 Key: YARN-8773
                 URL: https://issues.apache.org/jira/browse/YARN-8773
             Project: Hadoop YARN
          Issue Type: Bug
          Components: scheduler
    Affects Versions: 2.2.0
            Reporter: Prabhu Joseph
            Assignee: Wangda Tan


MapReduce jobs failed because both AM attempts ran on the same node, and that node had a problem. Both attempts were placed on the same node because this version has no AM blacklisting feature. The customer is expecting the fixes from YARN-2005 + YARN-4389. Is it possible to backport them to HDP-2.2.9, and is there a better workaround to avoid this issue in the meantime?

{code}
"2018-08-18 11:32:57,855 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=edwaefrp        OPERATION=Application Finished - Failed TARGET=RMAppManager     RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED       PERMISSIONS=Application application_1529242338015_465184 failed 2 times due to AM Container for appattempt_1529242338015_465184_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:https://ma4-gbihrcp-lnn14.corp.apple.com:8078/proxy/application_1529242338015_465184/Then, click on links to logs of each attempt.
Diagnostics: Error while running command to get file permissions : ExitCodeException exitCode=139:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1103)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1411)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1375)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$1000(ResourceLocalizationService.java:139)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1102)
Failing this attempt. Failing the application.  APPID=application_1529242338015_465184","2018-08-18T11:32:57.855+0000","ma4-gbihrcp-lnn14.corp.apple.com","gbi_hadoop_prod_hrc_core",16,"/ngs/app/yarn/hadoop/logs/yarn/yarn-yarn-resourcemanager-ma4-gbihrcp-lnn14.corp.apple.com.log","yarn_resourcemanager_log","in-gncs-159.corp.apple.com",
"2018-08-18 11:32:57,855 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1529242338015_465184 failed 2 times due to AM Container for appattempt_1529242338015_465184_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:https://ma4-gbihrcp-lnn14.corp.apple.com:8078/proxy/application_1529242338015_465184/Then, click on links to logs of each attempt.
Diagnostics: Error while running command to get file permissions : ExitCodeException exitCode=139:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1103)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1411)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1375)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$1000(ResourceLocalizationService.java:139)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1102)
Failing this attempt. Failing the application.","2018-08-18T11:32:57.855+0000","ma4-gbihrcp-lnn14.corp.apple.com","gbi_hadoop_prod_hrc_core",16,"/ngs/app/yarn/hadoop/logs/yarn/yarn-yarn-resourcemanager-ma4-gbihrcp-lnn14.corp.apple.com.log","yarn_resourcemanager_log","in-gncs-159.corp.apple.com",

{code}
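
To make the requested behaviour concrete, here is a minimal sketch of per-application AM blacklisting: remember the nodes where an AM attempt failed and exclude them when placing the next attempt. It is an illustration only, not the YARN-2005 implementation. The class name, method names, node names, and the 0.8 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AMBlacklist:
    """Hypothetical per-application record of nodes where AM attempts failed."""

    cluster_nodes: Set[str]
    # Assumed safety valve: if this fraction of the cluster is blacklisted,
    # ignore the blacklist so the application can still be scheduled.
    disable_threshold: float = 0.8
    failed_nodes: Set[str] = field(default_factory=set)

    def record_am_failure(self, node: str) -> None:
        """Remember that an AM attempt failed on the given node."""
        self.failed_nodes.add(node)

    def candidate_nodes(self) -> Set[str]:
        """Return the nodes on which the next AM attempt may be placed."""
        if not self.cluster_nodes:
            return set()
        blacklisted = self.failed_nodes & self.cluster_nodes
        if len(blacklisted) / len(self.cluster_nodes) >= self.disable_threshold:
            # Too much of the cluster is blacklisted; fall back to all nodes.
            return set(self.cluster_nodes)
        return self.cluster_nodes - blacklisted


# Example with made-up node names: the first attempt fails on node1,
# so the second attempt is only offered node2 and node3.
blacklist = AMBlacklist(cluster_nodes={"node1", "node2", "node3"})
blacklist.record_am_failure("node1")
print(sorted(blacklist.candidate_nodes()))
```

With a rule like this, the second AM attempt in the log above would not have been placed on the node that had already failed the first attempt.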



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
