Posted to yarn-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2013/08/16 22:27:48 UTC
[jira] [Updated] (YARN-1073) NM to recognise when it can't spawn
process and stop accepting containers
[ https://issues.apache.org/jira/browse/YARN-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated YARN-1073:
---------------------------------
Summary: NM to recognise when it can't spawn process and stop accepting containers (was: NM to recognise when it can't span process and stop accepting containers)
> NM to recognise when it can't spawn process and stop accepting containers
> -------------------------------------------------------------------------
>
> Key: YARN-1073
> URL: https://issues.apache.org/jira/browse/YARN-1073
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.1.0-beta
> Environment: OS/X with not enough file handles
> Reporter: Steve Loughran
> Priority: Minor
>
> When too many containers were created with a claimed resource use of 0 RAM or vCores, the NM reached a state where exec() was continually failing, but nothing seemed to recognise this and blacklist the node.
> Something should notice that all container launches for an app/container are failing and act on it. While AMs can/should code for this, NM failure is something to handle at the YARN level.
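
One way to picture the behaviour requested above is a counter of consecutive launch failures that flips the node to "not accepting" once a threshold is crossed. The sketch below is only an illustration under stated assumptions: LaunchFailureTracker, its methods, and the threshold value are hypothetical names invented here, not part of the YARN NodeManager API, and the issue does not say how the NM should actually detect or report the condition.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not a YARN class): counts consecutive container
 * launch failures and reports when the node should stop accepting work.
 */
final class LaunchFailureTracker {
    private final int threshold;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    /** @param threshold number of back-to-back failures that marks the node as unable to spawn */
    LaunchFailureTracker(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /** Call when a container process was spawned successfully; clears the streak. */
    void recordSuccess() {
        consecutiveFailures.set(0);
    }

    /** Call when spawning a container process failed (for example, exec() failed). */
    void recordFailure() {
        consecutiveFailures.incrementAndGet();
    }

    /** True while the failure streak is below the threshold. */
    boolean isAcceptingContainers() {
        return consecutiveFailures.get() < threshold;
    }

    public static void main(String[] args) {
        // Assumed threshold of 3; a real value would need tuning.
        LaunchFailureTracker tracker = new LaunchFailureTracker(3);
        for (int i = 0; i < 3; i++) {
            tracker.recordFailure();
        }
        System.out.println("accepting containers: " + tracker.isAcceptingContainers());
    }
}
```

Resetting the streak on any success is a deliberate simplification: it keeps a single unlucky launch from taking the node out of service, but it would also hide a node that fails most, though not all, of its launches.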