Posted to yarn-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2014/11/10 16:37:34 UTC

[jira] [Commented] (YARN-2839) YARN minicluster doesn't bail out if all the NM disks are dead

    [ https://issues.apache.org/jira/browse/YARN-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204901#comment-14204901 ] 

Steve Loughran commented on YARN-2839:
--------------------------------------

We don't see a stack trace; we see ERROR entries in the logs:

{code}
2014-11-10 03:02:18,431 [Thread-2] INFO  nodemanager.LocalDirsHandlerService (LocalDirsHandlerService.java:logDiskStatus(339)) - Disk(s) failed: 1/1 local-dirs are bad: /tmp/jenkins/workspace/slider-core/target/testexistsfailswithunknowncluster/testexistsfailswithunknowncluster-localDir-nm-0_0; 1/1 log-dirs are bad: /tmp/jenkins/workspace/slider-core/target/testexistsfailswithunknowncluster/testexistsfailswithunknowncluster-logDir-nm-0_0
2014-11-10 03:02:18,432 [Thread-2] ERROR nodemanager.LocalDirsHandlerService (LocalDirsHandlerService.java:updateDirsAfterTest(332)) - Most of the disks failed. 1/1 local-dirs are bad: /tmp/jenkins/workspace/slider-core/target/testexistsfailswithunknowncluster/testexistsfailswithunknowncluster-localDir-nm-0_0; 1/1 log-dirs are bad: /tmp/jenkins/workspace/slider-core/target/testexistsfailswithunknowncluster/testexistsfailswithunknowncluster-logDir-nm-0_0
2014-11-10 03:02:18,433 [Thread-2] INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:validateConf(216)) - per directory file limit = 8192
{code}
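
A probe of the kind the issue description asks for could look roughly like the sketch below. It is only an illustration: NodeProbe and awaitHealthyNodes are hypothetical names, not YARN APIs, and how a real test would obtain the health status of each NM in the minicluster is left as an assumption.

```java
import java.util.List;

/** Sketch only: nothing here is a real YARN class or method. */
public class MiniClusterHealthCheck {

    /** Hypothetical view of one NodeManager's health (e.g. "are its disks usable?"). */
    interface NodeProbe {
        boolean isHealthy();
    }

    /**
     * Polls every node until all report healthy, or fails once the timeout expires.
     * A short timeout suits a short-lived mini cluster, where bad disks are
     * unlikely to come back.
     */
    static void awaitHealthyNodes(List<? extends NodeProbe> nodes,
                                  long timeoutMillis,
                                  long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            int unhealthy = 0;
            for (NodeProbe node : nodes) {
                if (!node.isHealthy()) {
                    unhealthy++;
                }
            }
            if (unhealthy == 0) {
                return; // all nodes are healthy
            }
            if (System.currentTimeMillis() >= deadline) {
                // Bail out instead of letting tests fail later in localization.
                throw new IllegalStateException(
                    unhealthy + "/" + nodes.size() + " NodeManagers are unhealthy");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while probing NodeManagers", e);
            }
        }
    }
}
```

A test harness would call awaitHealthyNodes right after starting the cluster, so that a run with all local-dirs and log-dirs marked bad fails at startup with a clear message rather than deep in resource localization.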

> YARN minicluster doesn't bail out if all the NM disks are dead
> --------------------------------------------------------------
>
>                 Key: YARN-2839
>                 URL: https://issues.apache.org/jira/browse/YARN-2839
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>
> Some Jenkins tests of mine have been failing deep in the resource localization process. If all the disks of the NMs are considered bad, the NMs refuse to work, but the YARN minicluster doesn't fail itself.
> YARN-90 assumes that the NM disks will come back. This isn't likely to hold in a short-lived mini cluster; it would be better to have it probe the NMs and fail if they aren't healthy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)