Posted to dev@curator.apache.org by "Wang XiaoTian (JIRA)" <ji...@apache.org> on 2016/02/01 08:15:39 UTC

[jira] [Commented] (CURATOR-293) Curator can NOT reconnect after connection lost and session expired when the connection come up while the DNS server is not ready yet.(zookeeper connection string using domain names)

    [ https://issues.apache.org/jira/browse/CURATOR-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125861#comment-15125861 ] 

Wang XiaoTian commented on CURATOR-293:
---------------------------------------

We can work around the issue by calling the API "client.getZookeeperClient().getZooKeeper()" periodically after receiving the "ConnectionState.LOST" event, and by using a handler thread pool to process incoming state events concurrently so that event delivery is not blocked. As far as I can tell, client.getZookeeperClient().getZooKeeper() is a thread-safe API.
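
A minimal sketch of that workaround follows. It is illustrative only: the class LostConnectionProbe and its method names are hypothetical and not part of Curator, and the probe is passed in as a plain Callable so that the example compiles without Curator on the classpath. In a real client the probe would be the call discussed above, and the wiring shown in the class comment assumes the listener is registered with its own executor.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not a Curator class): while the connection is LOST,
 * it runs a probe at a fixed delay until told the connection is back.
 *
 * Assumed wiring with Curator, shown as a comment only:
 *   LostConnectionProbe probe = new LostConnectionProbe(
 *       () -> client.getZookeeperClient().getZooKeeper(), 5000);
 *   client.getConnectionStateListenable().addListener((c, state) -> {
 *       if (state == ConnectionState.LOST) probe.onLost();
 *       else if (state == ConnectionState.RECONNECTED) probe.onReconnected();
 *   }, handlerPool); // handlerPool: the user's own executor for state events
 */
final class LostConnectionProbe implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lost-connection-probe");
                t.setDaemon(true); // do not keep the JVM alive just for probing
                return t;
            });
    private final Callable<?> probe;
    private final long periodMillis;
    private ScheduledFuture<?> task; // guarded by "this"

    LostConnectionProbe(Callable<?> probe, long periodMillis) {
        this.probe = probe;
        this.periodMillis = periodMillis;
    }

    /** Call when the LOST state event arrives; starts probing if not already running. */
    synchronized void onLost() {
        if (task != null) {
            return; // already probing
        }
        task = scheduler.scheduleWithFixedDelay(() -> {
            try {
                probe.call();
            } catch (Exception e) {
                // For example an UnknownHostException while DNS is still down.
                // Swallow it here so the scheduler keeps retrying on the next tick.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Call when the RECONNECTED state event arrives; stops probing. */
    synchronized void onReconnected() {
        if (task != null) {
            task.cancel(false);
            task = null;
        }
    }

    synchronized boolean isProbing() {
        return task != null;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The probe runs on its own scheduler thread rather than on the thread that delivers state events, which is the point of the handler thread pool mentioned above: a slow or failing probe should not delay later events.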

Actually, the framework could do the same thing itself for the sake of fault tolerance, rather than forcing the user to handle it: catch the exception and handle it appropriately instead of putting it in a background exception queue and ignoring it. By the way, I don't think "client.getZookeeperClient().getZooKeeper()" is a user-friendly public API.

Another issue concerns StaticHostProvider.java. It relies on InetAddress.java for hostname resolution, and InetAddress.java keeps an addressCache; for the cache policy, see "https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/sun/net/InetAddressCachePolicy.java". The addressCache caches resolved hostnames, and when an unresolved hostname is passed in, InetAddress first tries to resolve it by querying the address cache. I don't know why the previously resolved hostname is lost from the cache (perhaps because of the cache policy).
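
To check which cache policy is in effect, the two standard security properties that control InetAddress caching can be inspected. The sketch below is only a diagnostic aid under stated assumptions: the class DnsCachePolicyCheck and its helper methods are hypothetical names, zk1.test.com is the placeholder host from the issue description, and nothing here establishes that the cache policy is the actual cause of the lost entry.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

/** Hypothetical diagnostic helper; not part of Curator or ZooKeeper. */
final class DnsCachePolicyCheck {

    /** Returns the configured value of a security property, or a note that it is unset. */
    static String ttl(String property) {
        String value = Security.getProperty(property);
        return value == null ? "(unset, JDK default applies)" : value;
    }

    /** Resolves a host, returning null instead of throwing when the lookup fails. */
    static InetAddress tryResolve(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // TTL in seconds for successful lookups.
        System.out.println("networkaddress.cache.ttl = "
                + ttl("networkaddress.cache.ttl"));
        // TTL in seconds for failed lookups, which are cached as well.
        System.out.println("networkaddress.cache.negative.ttl = "
                + ttl("networkaddress.cache.negative.ttl"));
        // Placeholder host name taken from the reproduction steps.
        System.out.println("zk1.test.com -> " + tryResolve("zk1.test.com"));
    }
}
```

Running this in the same JVM configuration as the client shows whether successful or failed lookups are being cached, and for how long, which may help narrow down why a previously resolved name is no longer returned.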


> Curator can NOT reconnect after connection lost and session expired when the connection come up while the DNS server is not ready yet.(zookeeper connection string using domain names)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CURATOR-293
>                 URL: https://issues.apache.org/jira/browse/CURATOR-293
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.9.1
>            Reporter: huanhuan li
>            Priority: Critical
>         Attachments: CuratorConnectionLostEventTest.java
>
>
> 1. Add following lines to the /etc/hosts:
> x.x.x.x zk1.test.com
> x.x.x.x  zk2.test.com
> x.x.x.x  zk3.test.com
> 2. RUN the test programme
> 3. shutdown the network connection to x.x.x.x
> 4. wait until the session expires (for example 10 min)
> 5. remove the added 3 lines in /etc/hosts
> 6. open the network connection to x.x.x.x
> 7. watch that curator cannot reconnect
> 8. add the 3 lines to /etc/hosts
> 9. watch that curator cannot reconnect either
> The log may look like the following:
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.005 [ClientCnxn.logStartConnect] - Opening socket connection to server 172.24.2.35/172.24.2.35:2181. Will not attempt to authenticate using SASL (unknown error)
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.050 [ClientCnxn.primeConnection] - Socket connection established to 172.24.2.35/172.24.2.35:2181, initiating session
> [main-EventThread][WARN ]2016-01-26 11:07:45.093 [ConnectionState.handleExpiredSession] - Session expired event received
> [main-EventThread][DEBUG]2016-01-26 11:07:45.093 [ConnectionState.reset] - reset
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.093 [ClientCnxn.run] - Unable to reconnect to ZooKeeper service, session 0x1525d9593a537af has expired, closing socket connection
> [main-EventThread][INFO ]2016-01-26 11:07:45.095 [ZooKeeper.<init>] - Initiating client connection, connectString=zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@7e7d611f
> [main-EventThread][INFO ]2016-01-26 11:07:45.488 [ClientCnxn.run] - EventThread shut down
> [main-SendThread(111.206.227.147:2181)][INFO ]2016-01-26 11:07:45.615 [ClientCnxn.logStartConnect] - Opening socket connection to server 111.206.227.147/111.206.227.147:2181. Will not attempt to authenticate using SASL (unknown error)
> [Curator-ConnectionStateManager-0][DEBUG]2016-01-26 11:07:58.523 [CuratorZookeeperClient.blockUntilConnectedOrTimedOut] - blockUntilConnectedOrTimedOut() end. isConnected: false


