Posted to dev@curator.apache.org by "Julio Lopez (JIRA)" <ji...@apache.org> on 2013/06/24 23:32:21 UTC
[jira] [Commented] (CURATOR-15) LeaderSelector may (undetectably) fail to elect
[ https://issues.apache.org/jira/browse/CURATOR-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692407#comment-13692407 ]
Julio Lopez commented on CURATOR-15:
------------------------------------
Here is an occurrence, caused by an UnknownHostException. Perhaps either LeaderSelector or InterProcessMutex should handle such cases and retry.
{{
E 06-23 03:30:07.106 LeaderSelector-0 c.n.c.f.r.l.LeaderSelector:349 |::] mutex.acquire() threw an exception
java.net.UnknownHostException: xyz.example.com
at java.net.InetAddress.getAllByName0(Unknown Source) ~[...]
at java.net.InetAddress.getAllByName(Unknown Source) ~[...]
at java.net.InetAddress.getAllByName(Unknown Source) ~[...]
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) ~[...]
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) ~[...]
at com.netflix.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:27) ~[...]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:166) ~[...]
at com.netflix.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94) ~[...]
at com.netflix.curator.HandleHolder.getZooKeeper(HandleHolder.java:55) ~[...]
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:112) ~[...]
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:107) ~[...]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:448) ~[...]
at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625) ~[...]
at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609) ~[...]
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106) ~[...]
at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605) ~[...]
at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428) ~[...]
at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41) ~[...]
at com.netflix.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:218) ~[...]
at com.netflix.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:218) ~[...]
at com.netflix.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:74) ~[...]
at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:313) [...]
at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:374) [...]
at com.netflix.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:45) [...]
at com.netflix.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:194) [...]
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) [na:1.6.0_32]
at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.6.0_32]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) [na:1.6.0_32]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.6.0_32]
}}
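As a rough illustration of the kind of retry suggested above, here is a minimal plain-JDK sketch. The class AcquireRetry, the method callWithRetries, and its parameters are hypothetical names made up for this example; they are not part of Curator, and this is not how LeaderSelector or InterProcessMutex behave today. The Callable stands in for a call such as mutex.acquire().

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not part of Curator.
public final class AcquireRetry {
    private AcquireRetry() {}

    // Calls op up to maxAttempts times, sleeping sleepMs between failed attempts.
    // Rethrows the last exception if every attempt fails.
    public static <T> T callWithRetries(Callable<T> op, int maxAttempts, long sleepMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                // e.g. java.net.UnknownHostException from a transient DNS failure
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last;
    }
}
```

A caller could wrap the acquire step along the lines of AcquireRetry.callWithRetries(op, 5, 1000L). Whether every exception type is worth retrying is a separate question; the sketch retries everything except interruption.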
> LeaderSelector may (undetectably) fail to elect
> -----------------------------------------------
>
> Key: CURATOR-15
> URL: https://issues.apache.org/jira/browse/CURATOR-15
> Project: Apache Curator
> Issue Type: Bug
> Components: Recipes
> Affects Versions: 2.0.0-incubating
> Reporter: Shevek
> Fix For: TBD
>
>
> In LeaderSelector, if mutex.acquire() throws an Exception, for example because CuratorFramework.getZooKeeper() threw a previously-enqueued background exception, then that failure will propagate out of doWork and doWorkLoop, and kill the background task submitted to the executor service.
> This means that a LeaderSelector which was start()ed will NEVER elect, and this situation is NOT DETECTABLE externally, since the exception happens on a private ExecutorService thread and is not visible to the client. It's impossible to look at a LeaderSelector and decide whether it is still "viable".
> This can leave a machine/process "hung" and not automatically recoverable within Curator.
> Either isQueued() needs to be exposed, so that a client can verify that a selector is either elected or queued; or the finally{} block which calls clearIsQueued() also needs to set the state to CLOSED or FAILED, so that we can query this failure externally.
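The second remedy in the description above (recording a queryable failure state when the background work dies) can be sketched with plain JDK classes. ObservableSelector, its State enum, getState(), getFailure() and awaitTermination() are all hypothetical names for this illustration; this is not Curator's LeaderSelector and not a proposed patch. The Callable passed to the constructor stands in for mutex.acquire().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a selector; NOT Curator's LeaderSelector.
public class ObservableSelector {
    public enum State { LATENT, QUEUED, LEADER, FAILED }

    private final AtomicReference<State> state = new AtomicReference<>(State.LATENT);
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Callable<Void> acquire; // stands in for mutex.acquire()

    public ObservableSelector(Callable<Void> acquire) {
        this.acquire = acquire;
    }

    public void start() {
        if (!state.compareAndSet(State.LATENT, State.QUEUED)) {
            throw new IllegalStateException("already started");
        }
        executor.execute(() -> {
            try {
                acquire.call();
                state.set(State.LEADER);
            } catch (Exception e) {
                // Record the failure instead of letting it vanish on a private thread.
                failure.set(e);
                state.set(State.FAILED);
            }
        });
        // Only one task is ever submitted in this sketch, so stop accepting more.
        executor.shutdown();
    }

    public State getState() {
        return state.get();
    }

    public Throwable getFailure() {
        return failure.get();
    }

    // Waits for the background task to finish; returns false on timeout.
    public boolean awaitTermination(long timeoutMs) throws InterruptedException {
        return executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

With something like this, a client that sees getState() return FAILED can inspect getFailure() and decide whether to build a new selector, instead of waiting forever on one that will never elect.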