Posted to dev@curator.apache.org by "Su Robi (JIRA)" <ji...@apache.org> on 2017/09/24 15:36:00 UTC

[jira] [Commented] (CURATOR-325) Background retry falls into infinite loop of SessionExpiredException

    [ https://issues.apache.org/jira/browse/CURATOR-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178253#comment-16178253 ] 

Su Robi commented on CURATOR-325:
---------------------------------

[~randgalt] [~wlongdu]

Hi, I seem to have met a similar problem.

After diving into the code, I found that this problem is caused by reading data (getData or getChildren) with a `RetryForever`-like retry policy inside our custom watcher implementation.

As a result, when the session is closed, the EventThread may fall into an infinite retry loop inside the custom watcher, and Curator's own watcher, `ConnectionState#process`, never gets a chance to handle the expired session and make `ClientCnxn#state` alive again (which is needed to break the infinite loop).
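To make the blocking concrete, here is a minimal sketch using only the JDK. It is a simulation, not ZooKeeper or Curator code: the single-thread executor stands in for the EventThread, the `sessionExpired` flag stands in for the client state, and the class and method names are made up for illustration. The retry loop is bounded only so that the demo terminates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; none of these names exist in ZooKeeper or Curator.
final class BlockedEventThreadDemo {

    /** Runs the simulation and returns how many of maxAttempts reads failed. */
    static int failedAttempts(int maxAttempts) {
        // Stand-in for the single EventThread: events are handled one at a time, in order.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        AtomicBoolean sessionExpired = new AtomicBoolean(true);
        AtomicInteger failures = new AtomicInteger();

        // Event 1: the user's watcher retries a read directly on the event thread.
        eventThread.execute(() -> {
            for (int i = 0; i < maxAttempts; i++) {
                if (!sessionExpired.get()) {
                    return; // the read would succeed here
                }
                failures.incrementAndGet(); // the read "fails" with session expired
            }
        });

        // Event 2: the framework's watcher, which would repair the session.
        // It is queued behind event 1 and cannot run until event 1 returns.
        eventThread.execute(() -> sessionExpired.set(false));

        eventThread.shutdown(); // already queued events still run
        try {
            if (!eventThread.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("event thread did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // Every attempt fails, because the repair event never gets to run in between.
        System.out.println(failedAttempts(1000) + " of 1000 attempts failed");
    }
}
```

With an unbounded loop in event 1, event 2 would never run at all, which matches the behaviour described above.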

This problem can be solved without modifying ZooKeeper/Curator:

- don't use a retry-forever policy, so the loop only lasts for "a while" - -
- or, like `PathCache` does, send the task to another thread after receiving the WatchedEvent
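The two workarounds above could look roughly like the following sketch. Again this is a JDK-only simulation under assumptions, not Curator's implementation: `eventThread`, `worker`, and `sessionExpired` are hypothetical stand-ins, and the 5 ms sleep is an arbitrary pause between attempts. The watcher only hands the read to a worker thread and returns, and the retry loop on the worker is bounded.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; none of these names exist in ZooKeeper or Curator.
final class WatcherHandOffDemo {

    /** Returns true if the simulated read succeeds within maxAttempts tries. */
    static boolean readSucceedsAfterHandOff(int maxAttempts) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor(); // stand-in for the EventThread
        ExecutorService worker = Executors.newSingleThreadExecutor();      // user-owned thread
        AtomicBoolean sessionExpired = new AtomicBoolean(true);
        CompletableFuture<Boolean> result = new CompletableFuture<>();

        try {
            // Event 1: the user's watcher only schedules the work and returns at once.
            eventThread.execute(() -> worker.execute(() -> {
                for (int i = 0; i < maxAttempts; i++) { // bounded, not forever
                    if (!sessionExpired.get()) {
                        result.complete(true); // the read would succeed now
                        return;
                    }
                    try {
                        TimeUnit.MILLISECONDS.sleep(5); // pause between attempts
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
                result.complete(false); // gave up after maxAttempts
            }));

            // Event 2: the framework's watcher can now run, since event 1 did not block.
            eventThread.execute(() -> sessionExpired.set(false));

            return result.get(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdownNow();
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("read succeeded: " + readSucceedsAfterHandOff(1000));
    }
}
```

The trade-off is that the user's code now runs outside the event thread, so any ordering guarantees between events have to be handled by the user.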

But I think it is a hole that a user-defined watcher can block the framework's watcher, while the framework's watcher is vital to the user's watcher.

Is there anything Curator can do to improve this ^ ^?


> Background retry falls into infinite loop of SessionExpiredException
> --------------------------------------------------------------------
>
>                 Key: CURATOR-325
>                 URL: https://issues.apache.org/jira/browse/CURATOR-325
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.9.1, 2.10.0
>         Environment: sun java jdk 1.7.0_55, curator 2.9.1, zookeeper 3.4.6
>            Reporter: clive du
>              Labels: SessionExpiredException, loop
>
> After a long GC pause, longer than the ZooKeeper session timeout, the ZooKeeper cluster invalidates the session id held by the client and waits for the client to reconnect, but the client treats the SessionExpiredException as a retryable exception and puts the operation back on the background queue, so we get the following stack trace infinitely.
> 12:50:54.337 [configuration-0-EventThread] DEBUG org.apache.curator.RetryLoop - Retrying operation
> 12:50:54.337 [configuration-0-EventThread] DEBUG org.apache.curator.RetryLoop - Retry-able exception received
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /dynamic/apps/258741001/DEV
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:304) ~[curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:293) ~[curator-framework-2.10.0.jar:na]
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) ~[curator-client-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:290) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:281) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$1.forPath(GetDataBuilderImpl.java:105) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$1.forPath(GetDataBuilderImpl.java:65) [curator-framework-2.10.0.jar:na]
>     at com.ctrip.flight.configuration.client.AbstractZookeeperClient.getData(AbstractZookeeperClient.java:68) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource.getPublishNodeValue(ZooKeeperConfigurationSource.java:258) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource.access$100(ZooKeeperConfigurationSource.java:45) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource$1.nodeChanged(ZooKeeperConfigurationSource.java:105) [classes/:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.10.0.jar:na]
>     at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:310) [guava-19.0.jar:na]
>     at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:256) [curator-framework-2.10.0.jar:na]
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561) [zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) [zookeeper-3.4.6.jar:3.4.6-1569965]


