
Posted to dev@curator.apache.org by "Jordan Zimmerman (JIRA)" <ji...@apache.org> on 2015/09/22 21:15:05 UTC

[jira] [Comment Edited] (CURATOR-264) Leader election: Duplicate ephemeral nodes with same owner id

    [ https://issues.apache.org/jira/browse/CURATOR-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903255#comment-14903255 ] 

Jordan Zimmerman edited comment on CURATOR-264 at 9/22/15 7:14 PM:
-------------------------------------------------------------------

I see the problem. It's visible in the log you posted:

{noformat}
2015-09-21 11:16:09,564 DEBUG [tor-LeaderSelector-0] o.a.c.framework.imps.FailedDeleteManager T: S: U: A: D: - Path being added to guaranteed delete set: /test/leader/_c_6a48bcc8-593c-48d6-8f78-ee8ed6416d5a-lock-0000000775
{noformat}

The path being added to the FailedDeleteManager does not contain the namespace! Doh!
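
To illustrate why a missing namespace matters, here is a minimal, self-contained sketch. It is not Curator's code: the ToyStore class, the withNamespace helper, and the "myapp" namespace are all hypothetical stand-ins, and the real namespace is not shown in the log. It assumes the server stores nodes under the namespaced path, so a delete retried with the bare path finds nothing.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Curator.
class NamespacePathSketch {
    // Assumed namespace for the example; the real one is not in the log.
    static final String NAMESPACE = "myapp";

    // Prefix a client-relative path with the namespace, giving the
    // absolute path under which the node would actually be stored.
    static String withNamespace(String namespace, String path) {
        if (namespace == null || namespace.isEmpty()) {
            return path;
        }
        return "/" + namespace + path;
    }

    // Toy in-memory stand-in for the server: it only knows absolute paths.
    static final class ToyStore {
        private final Set<String> nodes = new HashSet<>();

        void create(String absolutePath) {
            nodes.add(absolutePath);
        }

        boolean exists(String absolutePath) {
            return nodes.contains(absolutePath);
        }

        // Returns true if a node was removed, false for the "no node" case.
        boolean delete(String absolutePath) {
            return nodes.remove(absolutePath);
        }
    }

    public static void main(String[] args) {
        // Path exactly as it appears in the log line above.
        String relative = "/test/leader/_c_6a48bcc8-593c-48d6-8f78-ee8ed6416d5a-lock-0000000775";

        ToyStore store = new ToyStore();
        store.create(withNamespace(NAMESPACE, relative));

        // Retrying the delete with the bare path misses the node.
        System.out.println("delete without namespace: " + store.delete(relative));
        // Re-applying the namespace finds and removes it.
        System.out.println("delete with namespace:    " + store.delete(withNamespace(NAMESPACE, relative)));
    }
}
```

Under that assumption, the node stays behind after the failed delete, which would be consistent with the leftover ephemeral node in the report.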



> Leader election: Duplicate ephemeral nodes with same owner id
> -------------------------------------------------------------
>
>                 Key: CURATOR-264
>                 URL: https://issues.apache.org/jira/browse/CURATOR-264
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework, Recipes
>    Affects Versions: 2.8.0
>            Reporter: Ole Hjalmar Herje
>            Assignee: Jordan Zimmerman
>            Priority: Blocker
>             Fix For: 2.9.1
>
>         Attachments: testLog.txt, zkNodes.txt, zkTransactionLog.txt
>
>
> We sometimes experience failures in our leader-election functionality when we have network issues. When this occurs, we see two ephemeral nodes in the ZooKeeper cluster for the same session, but there is no active leader.
> I have managed to recreate the same scenario by running a test locally and using iptables to simulate network issues. The debug log (see attachment) shows that findAndDeleteProtectedNodeInBackground does not delete the node because processResult in FindProtectedNodeCB receives a -101 (NoNode) result code. I suspect this can happen if the read is not synced? (http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees)
> This also seems to be related to: 
> https://issues.apache.org/jira/browse/CURATOR-45 and
> https://issues.apache.org/jira/browse/CURATOR-79 


