Posted to issues@zookeeper.apache.org by "Kai Sun (Jira)" <ji...@apache.org> on 2020/12/05 00:15:00 UTC

[jira] [Created] (ZOOKEEPER-4022) ZooKeeper client session establishment deficiency

Kai Sun created ZOOKEEPER-4022:
----------------------------------

             Summary: ZooKeeper client session establishment deficiency
                 Key: ZOOKEEPER-4022
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4022
             Project: ZooKeeper
          Issue Type: Bug
          Components: java client
    Affects Versions: 3.4.14, 3.4.13, 3.4.11
            Reporter: Kai Sun


Here I want to share some deficiencies in ZooKeeper client connection establishment that we encountered and debugged in large-scale operation.
 * Dead IP. Suppose one ZooKeeper server is dead and the connection string contains a single DNS name that resolves to N IPs. For ZooKeeper clients >= 3.4.13, HostProvider.size() would be 1, and next() resolves the single DNS name, whose N IPs include the one dead IP. There is a 1/N chance of picking the dead host and failing to establish a TCP connection. On the next try, there is still a 1/N chance of hitting the same IP, and so on until the application-level timeout. With a large number of clients, some application-level session establishment failures are bound to occur. We probably need to make sure that the second round of tries excludes the previously tried IP address.
 * TCP connection timeout. Suppose the number of observers, M, is very large. The TCP connection timeout is set to the initial session timeout divided by HostProvider.size(). With a hundred observers, the resulting timeout can be too short for a cross-data-center TCP connection to be established. This is especially a problem for ZooKeeper versions <= 3.4.11, where the ZooKeeper (client) object resolves DNS first and one connection string (DNS name) can map to 100 IP addresses.
 * Changes to the IP addresses of ZooKeeper servers (observers) are not picked up by the client in a timely manner. This issue mostly affects older versions of ZooKeeper, where the ZooKeeper (client) object resolves the DNS name only once, upon construction. Say that after the client has run for a month, IT gradually adds more servers to meet traffic growth. The IPs newly added to the DNS name won't be seen. If IT retires some servers, the client will still try to connect to them, which may cause session timeouts, etc.
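
To make the points above concrete, here is a minimal sketch of how a client could exclude already-tried IPs within a round, re-resolve the DNS name on every attempt, and compute the per-host connect timeout as described. RetryAwareResolver and its methods are hypothetical names used only for illustration; they are not part of the ZooKeeper client API, and this is not how the existing HostProvider is implemented.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the ZooKeeper client. */
final class RetryAwareResolver {
    // Addresses already handed out in the current round of tries.
    private final Set<InetAddress> tried = new HashSet<>();

    /**
     * Picks a random address that has not been tried in the current round.
     * Once every resolved address has been tried, a new round starts.
     */
    InetAddress pick(List<InetAddress> resolved) {
        if (resolved.isEmpty()) {
            throw new IllegalArgumentException("no addresses resolved");
        }
        List<InetAddress> candidates = new ArrayList<>(resolved);
        candidates.removeAll(tried);
        if (candidates.isEmpty()) {
            // Everything was tried once: forget the history and start over.
            tried.clear();
            candidates = new ArrayList<>(resolved);
        }
        Collections.shuffle(candidates);
        InetAddress choice = candidates.get(0);
        tried.add(choice);
        return choice;
    }

    /**
     * Resolves the DNS name again on every call, so IPs added to or removed
     * from the name are seen, then picks an address not yet tried.
     * Note: the JVM may still cache DNS lookups for some time.
     */
    InetAddress next(String hostName) throws UnknownHostException {
        return pick(Arrays.asList(InetAddress.getAllByName(hostName)));
    }

    /** Connect timeout as described above: session timeout / host count. */
    static int perHostConnectTimeoutMs(int sessionTimeoutMs, int hostCount) {
        return sessionTimeoutMs / hostCount;
    }
}
```

With this approach a dead IP is hit at most once per round of N tries, instead of with probability 1/N on every try. As for the timeout, assuming a session timeout of 30000 ms (an example value, not from our setup) and 100 hosts, perHostConnectTimeoutMs(30000, 100) gives only 300 ms per TCP connection attempt, which illustrates why cross-data-center connects can fail.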



--
This message was sent by Atlassian Jira
(v8.3.4#803005)