Posted to dev@curator.apache.org by "Julio Lopez (JIRA)" <ji...@apache.org> on 2013/06/24 23:32:21 UTC

[jira] [Commented] (CURATOR-15) LeaderSelector may (undetectably) fail to elect

    [ https://issues.apache.org/jira/browse/CURATOR-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692407#comment-13692407 ] 

Julio Lopez commented on CURATOR-15:
------------------------------------

Here is an occurrence, caused by an UnknownHostException thrown out of mutex.acquire(). Perhaps either LeaderSelector or InterProcessMutex should handle such cases and retry.

{{
E 06-23 03:30:07.106 LeaderSelector-0 c.n.c.f.r.l.LeaderSelector:349 |::] mutex.acquire() threw an exception
java.net.UnknownHostException: xyz.example.com
        at java.net.InetAddress.getAllByName0(Unknown Source) ~[...]
        at java.net.InetAddress.getAllByName(Unknown Source) ~[...]
        at java.net.InetAddress.getAllByName(Unknown Source) ~[...]
        at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) ~[...]
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) ~[...]
        at com.netflix.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:27) ~[...]
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:166) ~[...]
        at com.netflix.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94) ~[...]
        at com.netflix.curator.HandleHolder.getZooKeeper(HandleHolder.java:55) ~[...]
        at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:112) ~[...]
        at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:107) ~[...]
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:448) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609) ~[...]
        at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428) ~[...]
        at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41) ~[...]
        at com.netflix.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:218) ~[...]
        at com.netflix.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:218) ~[...]
        at com.netflix.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:74) ~[...]
        at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:313) [...]
        at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:374) [...]
        at com.netflix.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:45) [...]
        at com.netflix.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:194) [...]
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) [na:1.6.0_32]
        at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.6.0_32]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) [na:1.6.0_32]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.6.0_32]
}}
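
The retry suggested above could look roughly like the following. This is only a sketch in plain Java: AcquireRetry, callWithRetry, maxAttempts and sleepMs are hypothetical names for illustration, not Curator API, and it is not how LeaderSelector or InterProcessMutex are implemented.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Curator): retries an action that may fail transiently,
// for example with an UnknownHostException while the ZooKeeper handle is being created.
final class AcquireRetry {
    private AcquireRetry() {}

    /**
     * Calls the action until it succeeds or maxAttempts is reached, sleeping sleepMs
     * between attempts. The last exception is rethrown once attempts are exhausted,
     * so the failure stays visible to the caller.
     */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long sleepMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Never retry an interrupt: restore the flag and give up immediately.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last;
    }
}
```

Under these assumptions, the call site in doWork would wrap the acquire, e.g. callWithRetry(() -> { mutex.acquire(); return null; }, 3, 1000L). Whether every exception type should be retried is a separate decision; the sketch retries all of them except InterruptedException.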
                
> LeaderSelector may (undetectably) fail to elect
> -----------------------------------------------
>
>                 Key: CURATOR-15
>                 URL: https://issues.apache.org/jira/browse/CURATOR-15
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.0.0-incubating
>            Reporter: Shevek
>             Fix For: TBD
>
>
> In LeaderSelector, if mutex.acquire() throws an Exception, for example because CuratorFramework.getZooKeeper() threw a previously-enqueued background exception, then that failure will propagate out of doWork and doWorkLoop, and kill the background task submitted to the executor service.
> This means that a LeaderSelector which was start()ed will NEVER elect, and this situation is NOT DETECTABLE externally, since that exception happens on a private ExecutorService thread and is not visible to the client. It's impossible to look at a LeaderSelector and decide whether it is still "viable".
> This can leave a machine/process "hung" and not automatically recoverable within Curator.
> Either isQueued() needs to be exposed, which means that a leader is either elected or queued; or the finally{} block which calls clearIsQueued() also needs to set the state to CLOSED or FAILED, so that we can query this failure externally.
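
The second option in the description (recording a FAILED state that callers can query) can be illustrated with a small stand-alone sketch. It uses only java.util.concurrent; ObservableWorker, its State enum and awaitIdle are hypothetical names and do not correspond to Curator's LeaderSelector internals.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a selector-like component; none of these names are Curator's.
final class ObservableWorker {
    enum State { LATENT, STARTED, FAILED }

    private final AtomicReference<State> state = new AtomicReference<>(State.LATENT);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Runnable work;

    ObservableWorker(Runnable work) {
        this.work = work;
    }

    void start() {
        state.set(State.STARTED);
        // The Future returned by submit() is discarded here, so an exception thrown by
        // the task would otherwise be captured in that Future and never seen by anyone.
        executor.submit(() -> {
            try {
                work.run();
            } catch (RuntimeException e) {
                // Record the failure so that callers can detect it from outside.
                state.set(State.FAILED);
                throw e;
            }
        });
    }

    State getState() {
        return state.get();
    }

    // Shuts the executor down and waits for the submitted task to finish.
    boolean awaitIdle(long timeoutMs) throws InterruptedException {
        executor.shutdown();
        return executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

With this shape, a caller can poll getState() and restart or alert when it reports FAILED, instead of waiting indefinitely for an election that will never happen.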
