Posted to dev@curator.apache.org by "Wang XiaoTian (JIRA)" <ji...@apache.org> on 2016/02/01 08:17:39 UTC
[jira] [Issue Comment Deleted] (CURATOR-293) Curator can NOT
reconnect after connection lost and session expired when the connection
come up while the DNS server is not ready yet.(zookeeper connection string
using domain names)
[ https://issues.apache.org/jira/browse/CURATOR-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang XiaoTian updated CURATOR-293:
----------------------------------
Comment: was deleted
(was: We can work around the issue by calling "client.getZookeeperClient().getZooKeeper()" periodically after receiving the "ConnectionState.LOST" event, and by using a handler thread pool to process arriving state events concurrently so that no event is blocked. Obviously, client.getZookeeperClient().getZooKeeper() is a thread-safe API.
Actually, the framework could do the same thing itself for the sake of fault tolerance, rather than forcing the user to handle it: catch the exception and handle it appropriately instead of putting it in a background exception queue and ignoring it. By the way, I don't think "client.getZookeeperClient().getZooKeeper()" is a user-friendly public API.
Another issue concerns StaticHostProvider.java. It is implemented on top of InetAddress.java, which has an addressCache (see "https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/sun/net/InetAddressCachePolicy.java"). The addressCache caches resolved hostnames, and when an unresolved hostname is passed in, InetAddress first tries to resolve it by querying the address cache. I don't know why the last resolved hostname is lost from the cache (perhaps because of the cache policy).
)
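The workaround described in the deleted comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Curator's own behaviour: ReconnectProbe and probeUntilSuccess are hypothetical names, and the probe is a plain Callable so that the sketch runs without a ZooKeeper cluster. In a real application the probe would presumably wrap client.getZookeeperClient().getZooKeeper() and be started from a ConnectionStateListener when ConnectionState.LOST arrives, on a separate executor so that the event thread is not blocked.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: periodically retries a probe until it stops throwing.
public class ReconnectProbe {

    /**
     * Calls the probe every periodMillis (must be > 0) on a scheduler thread
     * until it returns without throwing, or until maxAttempts calls have failed.
     * Returns true if the probe eventually succeeded.
     */
    public static boolean probeUntilSuccess(Callable<?> probe, int maxAttempts, long periodMillis)
            throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean succeeded = new AtomicBoolean(false);
        AtomicInteger failures = new AtomicInteger();

        scheduler.scheduleWithFixedDelay(() -> {
            if (finished.getCount() == 0) {
                return; // already done; ignore any late run
            }
            try {
                // Assumption: in a Curator application this would be
                // client.getZookeeperClient().getZooKeeper()
                probe.call();
                succeeded.set(true);
                finished.countDown();
            } catch (Exception e) {
                // e.g. an UnknownHostException while DNS is still unavailable
                if (failures.incrementAndGet() >= maxAttempts) {
                    finished.countDown();
                }
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            finished.await();
            return succeeded.get();
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated probe: fails twice as if the hostname could not be resolved,
        // then succeeds.
        AtomicInteger calls = new AtomicInteger();
        boolean ok = probeUntilSuccess(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new UnknownHostException("zk1.test.com");
            }
            return "connected";
        }, 10, 50);
        System.out.println("recovered=" + ok + " after " + calls.get() + " calls");
    }
}
```

Running the probe on its own scheduler keeps the retries off the thread that delivers connection state events, which is the point of the handler thread pool mentioned in the comment. Whether repeated getZooKeeper() calls actually recover the session in the scenario of this issue is the comment author's claim and is not verified here.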
> Curator can NOT reconnect after connection lost and session expired when the connection come up while the DNS server is not ready yet.(zookeeper connection string using domain names)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CURATOR-293
> URL: https://issues.apache.org/jira/browse/CURATOR-293
> Project: Apache Curator
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.9.1
> Reporter: huanhuan li
> Priority: Critical
> Attachments: CuratorConnectionLostEventTest.java
>
>
> 1. Add the following lines to /etc/hosts:
> x.x.x.x zk1.test.com
> x.x.x.x zk2.test.com
> x.x.x.x zk3.test.com
> 2. Run the test program
> 3. Shut down the network connection to x.x.x.x
> 4. Wait until the session expires (for example, 10 min)
> 5. Remove the 3 added lines from /etc/hosts
> 6. Restore the network connection to x.x.x.x
> 7. Observe that Curator cannot reconnect
> 8. Add the 3 lines back to /etc/hosts
> 9. Observe that Curator still cannot reconnect
> The log may look like the following:
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.005 [ClientCnxn.logStartConnect] - Opening socket connection to server 172.24.2.35/172.24.2.35:2181. Will not attempt to authenticate using SASL (unknown error)
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.050 [ClientCnxn.primeConnection] - Socket connection established to 172.24.2.35/172.24.2.35:2181, initiating session
> [main-EventThread][WARN ]2016-01-26 11:07:45.093 [ConnectionState.handleExpiredSession] - Session expired event received
> [main-EventThread][DEBUG]2016-01-26 11:07:45.093 [ConnectionState.reset] - reset
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.093 [ClientCnxn.run] - Unable to reconnect to ZooKeeper service, session 0x1525d9593a537af has expired, closing socket connection
> [main-EventThread][INFO ]2016-01-26 11:07:45.095 [ZooKeeper.<init>] - Initiating client connection, connectString=zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@7e7d611f
> [main-EventThread][INFO ]2016-01-26 11:07:45.488 [ClientCnxn.run] - EventThread shut down
> [main-SendThread(111.206.227.147:2181)][INFO ]2016-01-26 11:07:45.615 [ClientCnxn.logStartConnect] - Opening socket connection to server 111.206.227.147/111.206.227.147:2181. Will not attempt to authenticate using SASL (unknown error)
> [Curator-ConnectionStateManager-0][DEBUG]2016-01-26 11:07:58.523 [CuratorZookeeperClient.blockUntilConnectedOrTimedOut] - blockUntilConnectedOrTimedOut() end. isConnected: false
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)