You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Daniel Templeton (JIRA)" <ji...@apache.org> on 2017/02/17 16:54:41 UTC

[jira] [Commented] (HADOOP-10584) ActiveStandbyElector goes down if ZK quorum become unavailable

    [ https://issues.apache.org/jira/browse/HADOOP-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872106#comment-15872106 ] 

Daniel Templeton commented on HADOOP-10584:
-------------------------------------------

I'm looking at this issue now, and it seems to me that the issue could be resolved by reseting the retry counts when the session is reconnected.  If we've lost the session, then whatever retry counts we had previously don't really apply anymore, so we should reset them on reconnect.  It looks like this issue is happening only in the case that the ZK connection is flaky.

> ActiveStandbyElector goes down if ZK quorum become unavailable
> --------------------------------------------------------------
>
>                 Key: HADOOP-10584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10584
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Karthik Kambatla
>            Priority: Critical
>         Attachments: hadoop-10584-prelim.patch, rm.log
>
>
> ActiveStandbyElector retries operations for a few times. If the ZK quorum itself is down, it goes down and the daemons will have to be brought up again. 
> Instead, it should log the fact that it is unable to talk to ZK, call becomeStandby on its client, and continue to attempt connecting to ZK.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org