Posted to dev@curator.apache.org by "Henrik Nordvik (JIRA)" <ji...@apache.org> on 2014/02/12 16:56:19 UTC

[jira] [Updated] (CURATOR-73) No reliable way to restart leadership in LeaderSelector when connection fails due to edge cases

     [ https://issues.apache.org/jira/browse/CURATOR-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henrik Nordvik updated CURATOR-73:
----------------------------------

    Attachment: CURATOR-73.patch

I believe I have a possible fix for this issue. A patch is attached.

The patch moves the clearing of the isQueued flag out, so that it happens
whenever the callable that executes takeLeadership() returns.

Previously, the isQueued flag was not cleared when mutex.acquire() failed.
mutex.acquire() can fail when both of the following hold:
1. you are not the current leader, but you are waiting for leadership, and
2. the connection is lost or suspended.
In that case the leader selector cannot be requeued when the connection comes back.
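
To illustrate the idea, here is a minimal sketch in plain Java. It is not the attached patch and does not use Curator's classes: RequeueSketch, its work callable (standing in for "acquire the mutex, then run takeLeadership()"), and the injected Executor are all hypothetical names chosen for this example. It only shows the general pattern of clearing the flag in a finally block so that every exit path, including a failed acquire, leaves the selector requeueable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

/** Simplified stand-in for a leader selector; not Curator's actual code. */
class RequeueSketch {
    private final AtomicBoolean isQueued = new AtomicBoolean(false);
    private final Executor executor;
    // Hypothetical: represents "acquire the mutex, then take leadership".
    private final Callable<Void> work;

    RequeueSketch(Executor executor, Callable<Void> work) {
        this.executor = executor;
        this.work = work;
    }

    /** Queues one attempt; returns false if an attempt is already queued. */
    boolean requeue() {
        if (!isQueued.compareAndSet(false, true)) {
            return false;
        }
        executor.execute(() -> {
            try {
                work.call();
            } catch (InterruptedException e) {
                // Preserve the interrupt for whoever owns this thread.
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                // For example, acquiring the mutex failed on a lost connection.
            } finally {
                // Cleared on every exit path, so a later requeue() can succeed.
                isQueued.set(false);
            }
        });
        return true;
    }

    boolean isQueued() {
        return isQueued.get();
    }
}
```

If the flag were instead cleared only after a successful acquire, a failure inside work.call() would leave isQueued set, and every later requeue() would return false.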

A test case is also attached.

> No reliable way to restart leadership in LeaderSelector when connection fails due to edge cases
> -----------------------------------------------------------------------------------------------
>
>                 Key: CURATOR-73
>                 URL: https://issues.apache.org/jira/browse/CURATOR-73
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.3.0
>            Reporter: Henrik Nordvik
>         Attachments: CURATOR-73.patch
>
>
> This is related to CURATOR-54, and possibly also CURATOR-62.
> If a LeaderSelector thread is cancelled (e.g. because of a lost connection to ZooKeeper), there is no way of restarting it.
> First it jumps out of the doWork loop, because the interrupt flag is set.
> The isQueued flag is not reset when this happens, so requeue() does nothing, even though the thread has been parked.
> I'm using Curator 2.3.0 with the new ListenerAdapter way of handling stateChanged().



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)