Posted to yarn-issues@hadoop.apache.org by "Daniel Templeton (JIRA)" <ji...@apache.org> on 2015/11/30 22:59:11 UTC
[jira] [Created] (YARN-4401) A failed app recovery should not prevent the RM from starting
Daniel Templeton created YARN-4401:
--------------------------------------
Summary: A failed app recovery should not prevent the RM from starting
Key: YARN-4401
URL: https://issues.apache.org/jira/browse/YARN-4401
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical
An app recovery can fail with an exception for many different reasons, and any such failure currently aborts RM startup. Presumably, the RM is attempting a recovery because it is the standby trying to take over from the active, so failing to come up defeats the purpose of the HA configuration. Instead of preventing the RM from starting, a failed app recovery should log an error and skip the application.
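
A minimal sketch of the proposed behavior, to make the "log an error and skip" idea concrete. This is not the RM's actual recovery code: RecoverySketch, AppRecoverer, and recoverAll are hypothetical names invented for illustration, and application IDs are plain strings here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; none of these names come from YARN.
final class RecoverySketch {
    private static final Logger LOG =
        Logger.getLogger(RecoverySketch.class.getName());

    /** Hypothetical stand-in for whatever recovers a single application. */
    interface AppRecoverer {
        void recover(String appId) throws Exception;
    }

    /**
     * Attempts to recover every application. A failure is logged and the
     * application is skipped, so one bad app cannot abort the whole loop.
     * Returns the IDs of the applications that were skipped.
     */
    static List<String> recoverAll(List<String> appIds, AppRecoverer recoverer) {
        List<String> skipped = new ArrayList<>();
        for (String appId : appIds) {
            try {
                recoverer.recover(appId);
            } catch (Exception e) {
                // Log and move on instead of propagating the exception.
                LOG.log(Level.SEVERE,
                    "Failed to recover application " + appId + "; skipping it", e);
                skipped.add(appId);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<String> apps = Arrays.asList("app_1", "app_2", "app_3");
        // Simulate one application whose saved state cannot be recovered.
        List<String> skipped = recoverAll(apps, appId -> {
            if (appId.equals("app_2")) {
                throw new IllegalStateException("corrupt saved state");
            }
        });
        System.out.println("Startup continued; skipped apps: " + skipped);
    }
}
```

Returning the list of skipped IDs is one possible design choice: it would let the caller report which applications were dropped once startup completes.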
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)