Posted to dev@curator.apache.org by "Antal Sasvári (JIRA)" <ji...@apache.org> on 2013/10/08 12:52:41 UTC

[jira] [Commented] (CURATOR-45) LeaderSelector threw exception, but still created ephemeral node, breaking everything

    [ https://issues.apache.org/jira/browse/CURATOR-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789093#comment-13789093 ] 

Antal Sasvári commented on CURATOR-45:
--------------------------------------

Was this patch also tested with autoRequeue enabled?

I changed TestLeaderSelectorEdges.flappingTest() to enable autoRequeue() for leaderSelector1, and it seems that more and more ephemeral nodes keep getting created and then deleted (with increasing sequence numbers), while leaderSelector1 keeps gaining and losing leadership.

It looks like the new leader election (LE) ephemeral node is constantly being deleted in the background and then recreated because of autoRequeue.
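
The suspected cycle can be sketched as a toy model in plain Java. This is only an illustration of the symptom described above, not Curator code and not the patch: FlappingModel, createSequential(), deleteInBackground() and simulate() are hypothetical names, and the "lock-" prefix plus the ten-digit counter only mimic how ZooKeeper names sequential nodes. Each round creates a node, deletes it again, and lets the requeue step create the next one, so the sequence number keeps climbing while no node survives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: no Curator or ZooKeeper classes are used here.
class FlappingModel {
    // Live children of the (hypothetical) election path.
    private final TreeSet<String> children = new TreeSet<>();
    // Mimics ZooKeeper's per-parent counter for sequential nodes.
    private int nextSequence = 0;

    // Creates an ephemeral-sequential style node and returns its name.
    String createSequential() {
        String name = String.format("lock-%010d", nextSequence++);
        children.add(name);
        return name;
    }

    // Stands in for the background delete suspected in the comment.
    void deleteInBackground(String name) {
        children.remove(name);
    }

    // Number of nodes still present under the election path.
    int liveNodes() {
        return children.size();
    }

    // Runs `rounds` requeue cycles and returns the node created in each one.
    static List<String> simulate(int rounds) {
        FlappingModel model = new FlappingModel();
        List<String> created = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            String node = model.createSequential(); // selector gains leadership
            created.add(node);
            model.deleteInBackground(node);         // node vanishes, leadership is lost
            // autoRequeue: the loop comes around and creates the next node
        }
        if (model.liveNodes() != 0) {
            throw new IllegalStateException("a node survived the cycle");
        }
        return created;
    }

    public static void main(String[] args) {
        for (String node : simulate(5)) {
            System.out.println("created then deleted: " + node);
        }
    }
}
```

In this model a selector without requeueing would stop after the first round; with requeueing enabled, the loop has no natural end, which matches the ever-increasing sequence numbers seen in the modified test.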


> LeaderSelector threw exception, but still created ephemeral node, breaking everything
> -------------------------------------------------------------------------------------
>
>                 Key: CURATOR-45
>                 URL: https://issues.apache.org/jira/browse/CURATOR-45
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework, Recipes
>    Affects Versions: 2.2.0-incubating
>            Reporter: Shevek
>            Assignee: Jordan Zimmerman
>             Fix For: 2.3.0
>
>         Attachments: CURATOR-45.patch
>
>
> ZooKeeper hiccupped, and then this happened:
>     2013-06-19 02:23:35,561 DEBUG [LeaderSelector-1] com.netflix.curator.RetryLoop.takeException (RetryLoop.java:184) - Retry-able exception received
>     org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /[REMOVED]/election/_c_1ccdb2b9-7f9a-4570-9555-201c91ec2dcb-lock-
>             at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.5.0.jar:3.5.0--1]
>             at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.5.0.jar:3.5.0--1]
>             at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:876) ~[zookeeper-3.5.0.jar:3.5.0--1]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625) ~[curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609) ~[curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106) [curator-client-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:218) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:218) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:74) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:314) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:373) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:46) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at com.netflix.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:195) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.6.0_27]
>             at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.6.0_27]
>             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) [?:1.6.0_27]
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.6.0_27]
>             at java.lang.Thread.run(Thread.java:679) [?:1.6.0_27]
> However, the ephemeral node got created, and this hung leader election for this path.
> I'm investigating to work out where to put an extra guaranteed-delete. I see the case in LockInternals, which sometimes triggers to do this cleanup, but it didn't trigger in this case.
> You must really love our bugs by now.


