Posted to yarn-dev@hadoop.apache.org by "Peter Simon (JIRA)" <ji...@apache.org> on 2017/12/28 13:38:00 UTC
[jira] [Created] (YARN-7686) Yarn containers failover if datanode/nodemanager fails
Peter Simon created YARN-7686:
---------------------------------
Summary: Yarn containers failover if datanode/nodemanager fails
Key: YARN-7686
URL: https://issues.apache.org/jira/browse/YARN-7686
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Peter Simon
While an application was running on YARN, one of the datanodes/nodemanagers went offline due to power issues. The first application attempt failed because of lost containers. When the second attempt started, no heartbeat interval had yet elapsed at the Namenode, so the second attempt was still offered the same datanode/nodemanager as a possible worker node for its containers. Because the host was unreachable, the container attempts failed, which caused the second application attempt to fail as well and, in turn, the whole application to fail.
There could be a failover process for container attempts: if a new container cannot be brought up on one node, the ResourceManager should try to allocate it on a different node.
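
To make the requested behaviour concrete, here is a minimal sketch of the idea in plain Java. It is not YARN code and does not use any Hadoop API: the class ContainerFailoverSketch, the method allocateWithFailover, the launcher callback, and the node names are all hypothetical, chosen only to illustrate "skip a node where the launch failed and try the next one".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical illustration only; not part of the YARN code base. */
public class ContainerFailoverSketch {

    /**
     * Walks the candidate nodes in order, skips nodes on which a launch has
     * already failed, and returns the first node where the launch succeeds.
     *
     * @param candidates  nodes considered eligible (assumed input)
     * @param launcher    returns true if the container started on the given node
     * @param failedNodes nodes that already failed; updated on each new failure
     */
    public static Optional<String> allocateWithFailover(
            List<String> candidates,
            Predicate<String> launcher,
            Set<String> failedNodes) {
        for (String node : candidates) {
            if (failedNodes.contains(node)) {
                continue; // do not retry a node that already failed
            }
            if (launcher.test(node)) {
                return Optional.of(node);
            }
            failedNodes.add(node); // remember the failure for later attempts
        }
        return Optional.empty(); // no reachable node was found
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3");
        Set<String> failed = new HashSet<>();
        // Assumption for the demo: node1 is the host that lost power.
        Predicate<String> launcher = node -> !node.equals("node1");

        Optional<String> placed = allocateWithFailover(nodes, launcher, failed);
        System.out.println("placed on: " + placed.orElse("none"));
        System.out.println("excluded: " + failed);
    }
}
```

In this sketch the set of failed nodes is passed in by the caller, so it could be shared between the first and second application attempt; that way the second attempt would not be sent back to the node that already failed.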
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)