
Posted to common-issues@hadoop.apache.org by "Todd Lipcon (Updated) (JIRA)" <ji...@apache.org> on 2012/03/26 06:58:31 UTC

[jira] [Updated] (HADOOP-8212) Improve ActiveStandbyElector's behavior when session expires

     [ https://issues.apache.org/jira/browse/HADOOP-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-8212:
--------------------------------

    Attachment: hadoop-8212.txt

The attached patch changes the behavior so that notifyFatalError is no longer called when the session expires. The existing code already handled rejoining.
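
As a rough illustration of the intended behavior (not the code in the patch), the sketch below treats session expiration as a reason to rejoin the election instead of reporting a fatal error. The class, enum, and method names are hypothetical stand-ins; in real ZooKeeper the expiration is reported as Watcher.Event.KeeperState.Expired.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical session states for this sketch; not the ZooKeeper enum. */
enum SessionState { CONNECTED, EXPIRED }

/** Hypothetical elector: an expired session leads to a rejoin, not a fatal error. */
class ExpiryTolerantElector {
    final List<String> actions = new ArrayList<>();
    int sessionsOpened = 0;
    int fatalErrors = 0;

    /** Opens a (simulated) new session and joins the election. */
    void joinElection() {
        sessionsOpened++;
        actions.add("joinElection");
    }

    /** Handles a session-level event delivered by the (simulated) client. */
    void onSessionEvent(SessionState state) {
        switch (state) {
            case EXPIRED:
                // Behavior described in the issue: fatalErrors++ would happen here.
                // Sketch of the improved behavior: open a new session and rejoin.
                actions.add("sessionExpired");
                joinElection();
                break;
            default:
                // Other states are out of scope for this sketch.
                break;
        }
    }
}
```

With this shape, an expiration leaves the application callback untouched (fatalErrors stays at 0) and only adds a second joinElection entry to the action log.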

I also fixed a race I turned up: after rejoining with a new zkClient, stale notifications from the previous zkClient could still get through. The watchers and callbacks now carry the zkClient that was used to set them, and each callback checks that this client is still the current one before acting.
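
A minimal sketch of that stale-client check, assuming a simplified elector with no real ZooKeeper dependency. FakeZkClient, StaleEventFilteringElector, and their methods are hypothetical names for illustration and do not reflect the signatures in the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a ZooKeeper client handle; not the real ZooKeeper class. */
final class FakeZkClient {
    final String sessionName;
    FakeZkClient(String sessionName) { this.sessionName = sessionName; }
}

/** Hypothetical elector that drops events originating from a superseded client. */
class StaleEventFilteringElector {
    private FakeZkClient zkClient;
    final List<String> handled = new ArrayList<>();

    /** Replaces the current client, as happens when rejoining after expiration. */
    synchronized FakeZkClient connect(String sessionName) {
        zkClient = new FakeZkClient(sessionName);
        return zkClient;
    }

    /**
     * Called by a watcher or callback, which passes along the client it was
     * registered with. Returns true if the event was handled, false if dropped.
     */
    synchronized boolean processEvent(FakeZkClient source, String event) {
        if (source != zkClient) {
            // The event belongs to an old session; ignore it.
            return false;
        }
        handled.add(source.sessionName + ":" + event);
        return true;
    }
}
```

The identity comparison (source != zkClient) is deliberate in this sketch: two client objects may point at the same ensemble, but only the exact instance currently held by the elector counts as current.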

I also simplified the test case to no longer be multi-threaded, since it's much easier to follow as a linear progression, and the threads didn't buy us anything. I added test coverage around session expiration to cover the new code.
                
> Improve ActiveStandbyElector's behavior when session expires
> ------------------------------------------------------------
>
>                 Key: HADOOP-8212
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8212
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.23.3, 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8212.txt
>
>
> Currently when the ZK session expires, it results in a fatal error being sent to the application callback. This is not the best behavior -- for example, in the case of HA, if ZK goes down, we would like the current state to be maintained, rather than causing either NN to abort. When the ZK clients are able to reconnect, they should sort out the correct leader based on the normal locking schemes.
