Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2014/06/19 20:50:25 UTC

[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

    [ https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037664#comment-14037664 ] 

Junping Du commented on YARN-2019:
----------------------------------

[~kasha], sorry that I overlooked your comments; my email and company changed during that time. My thoughts on the right behavior:
If anything goes wrong on the ZK cluster side (it is distributed and should be more robust, but it can still go down because of a bug or a bad configuration), we can let the active RM continue to run as in the non-HA case. In addition, we should report to the admin that HA is not working properly, and let the admin decide when is the proper time to bring down the RM and reconfigure HA. Make sense?
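
To make the proposed behavior concrete, here is a rough sketch of the idea, not a patch. Every name in it (DegradingStoreGuard, StoreOp, tryStore, the alert list) is hypothetical and made up for illustration; none of them are existing YARN or ZKRMStateStore APIs. It only shows the shape: a failed store operation marks the RM as degraded and records a message for the admin instead of throwing a fatal exception.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; these are not real YARN classes. */
class DegradingStoreGuard {

    /** Hypothetical stand-in for a state-store write that may fail. */
    interface StoreOp {
        void run() throws Exception;
    }

    // True once any store operation has failed; the RM keeps running as non-HA.
    private boolean degraded = false;

    // Messages that would be surfaced to the admin (log, metrics, UI, ...).
    private final List<String> adminAlerts = new ArrayList<>();

    /**
     * Runs the store operation. On failure, marks the guard as degraded and
     * records an alert instead of propagating the exception to crash the caller.
     */
    boolean tryStore(String description, StoreOp op) {
        try {
            op.run();
            return true;
        } catch (Exception e) {
            degraded = true;
            adminAlerts.add("State store failure during " + description
                    + ": " + e.getMessage());
            return false;
        }
    }

    boolean isDegraded() {
        return degraded;
    }

    List<String> getAdminAlerts() {
        return adminAlerts;
    }
}
```

In this sketch the caller decides what to do with a false return value; when to actually bring the RM down stays with the admin, as described above.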

> Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2019
>                 URL: https://issues.apache.org/jira/browse/YARN-2019
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Priority: Critical
>              Labels: ha
>         Attachments: YARN-2019.1-wip.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal exception that crashes the RM. As shown in YARN-1924, the cause could be an internal bug in RM HA itself rather than a truly fatal condition. We should revisit this decision, as the HA feature is designed to protect a key component, not to disrupt it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)