Posted to yarn-dev@hadoop.apache.org by "Prabhu Joseph (JIRA)" <ji...@apache.org> on 2018/09/13 22:33:00 UTC
[jira] [Created] (YARN-8773) Blacklisting support for scheduling AMs for Apple HDP-2.2.9
Prabhu Joseph created YARN-8773:
-----------------------------------
Summary: Blacklisting support for scheduling AMs for Apple HDP-2.2.9
Key: YARN-8773
URL: https://issues.apache.org/jira/browse/YARN-8773
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.2.0
Reporter: Prabhu Joseph
Assignee: Wangda Tan
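MapReduce jobs failed because both AM attempts failed on the same node, which had a problem: on that node, the NodeManager's shell command for reading local directory permissions exited with code 139 (conventionally 128 + 11, meaning the process was killed by SIGSEGV), as shown in the log below. Both AM attempts were placed on the same node because this version has no AM blacklisting feature. The customer is asking for the fixes from YARN-2005 and YARN-4389. Is it possible to backport them to HDP-2.2.9, and is there a better workaround to avoid this issue?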
{code}
"2018-08-18 11:32:57,855 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=edwaefrp OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1529242338015_465184 failed 2 times due to AM Container for appattempt_1529242338015_465184_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:https://ma4-gbihrcp-lnn14.corp.apple.com:8078/proxy/application_1529242338015_465184/Then, click on links to logs of each attempt.
Diagnostics: Error while running command to get file permissions : ExitCodeException exitCode=139:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1103)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1411)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1375)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$1000(ResourceLocalizationService.java:139)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1102)
Failing this attempt. Failing the application. APPID=application_1529242338015_465184","2018-08-18T11:32:57.855+0000","ma4-gbihrcp-lnn14.corp.apple.com","gbi_hadoop_prod_hrc_core",16,"/ngs/app/yarn/hadoop/logs/yarn/yarn-yarn-resourcemanager-ma4-gbihrcp-lnn14.corp.apple.com.log","yarn_resourcemanager_log","in-gncs-159.corp.apple.com",
"2018-08-18 11:32:57,855 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1529242338015_465184 failed 2 times due to AM Container for appattempt_1529242338015_465184_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:https://ma4-gbihrcp-lnn14.corp.apple.com:8078/proxy/application_1529242338015_465184/Then, click on links to logs of each attempt.
Diagnostics: Error while running command to get file permissions : ExitCodeException exitCode=139:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1103)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1411)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1375)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$1000(ResourceLocalizationService.java:139)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1102)
Failing this attempt. Failing the application.","2018-08-18T11:32:57.855+0000","ma4-gbihrcp-lnn14.corp.apple.com","gbi_hadoop_prod_hrc_core",16,"/ngs/app/yarn/hadoop/logs/yarn/yarn-yarn-resourcemanager-ma4-gbihrcp-lnn14.corp.apple.com.log","yarn_resourcemanager_log","in-gncs-159.corp.apple.com",
{code}
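For reference, here is a minimal sketch of the per-application AM blacklisting behavior being requested. It assumes (from my reading of YARN-2005, not verified against the patch) that a node where an AM attempt failed is skipped for later attempts of the same application, and that blacklisting is switched off once too large a fraction of the cluster is blacklisted, so the AM can still be placed somewhere. The class name, method names, and the 0.8 threshold used in main are hypothetical and do not correspond to the actual ResourceManager code.

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of per-application AM node blacklisting;
// not the actual ResourceManager classes or API.
class AmBlacklistSketch {
    // Nodes on which an earlier AM attempt of this application failed.
    private final Set<String> failedAmNodes = new HashSet<>();
    // Assumed safety valve: if at least this fraction of the cluster is
    // blacklisted, ignore the blacklist so the AM can still be placed.
    private final double disableThreshold;

    AmBlacklistSketch(double disableThreshold) {
        this.disableThreshold = disableThreshold;
    }

    // Called when an AM attempt fails on a node (e.g. the exitCode -1000
    // localization failure in the log above).
    void recordAmFailure(String node) {
        failedAmNodes.add(node);
    }

    // Nodes the next AM attempt should avoid; empty when blacklisting is disabled.
    Set<String> effectiveBlacklist(int clusterSize) {
        if (clusterSize <= 0
                || (double) failedAmNodes.size() / clusterSize >= disableThreshold) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(failedAmNodes);
    }

    // Returns the first candidate not on the effective blacklist, or null if none fits.
    String pickAmNode(List<String> candidates, int clusterSize) {
        Set<String> blacklist = effectiveBlacklist(clusterSize);
        for (String node : candidates) {
            if (!blacklist.contains(node)) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        AmBlacklistSketch am = new AmBlacklistSketch(0.8);
        List<String> candidates = Arrays.asList("badNode", "goodNode");
        System.out.println(am.pickAmNode(candidates, 10)); // first attempt: badNode
        am.recordAmFailure("badNode");
        System.out.println(am.pickAmNode(candidates, 10)); // second attempt: goodNode
    }
}
{code}

In this sketch the second attempt lands on goodNode instead of repeating on badNode, which is the behavior missing in the failure above. The threshold check keeps a small or mostly unhealthy cluster from blacklisting every node and leaving the AM unschedulable.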
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)