Posted to issues@zookeeper.apache.org by "Mate Szalay-Beko (Jira)" <ji...@apache.org> on 2020/01/28 11:13:00 UTC

[jira] [Updated] (ZOOKEEPER-3705) Filtering unreachable hosts without using ICMP

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mate Szalay-Beko updated ZOOKEEPER-3705:
----------------------------------------
    Fix Version/s: 3.6.1
                   3.7.0

> Filtering unreachable hosts without using ICMP
> ----------------------------------------------
>
>                 Key: ZOOKEEPER-3705
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3705
>             Project: ZooKeeper
>          Issue Type: Improvement
>    Affects Versions: 3.6.0
>            Reporter: Mate Szalay-Beko
>            Assignee: Mate Szalay-Beko
>            Priority: Major
>             Fix For: 3.7.0, 3.6.1
>
>
> This is a follow-up ticket for ZOOKEEPER-3698, which was a quick fix to make the multi-address feature (introduced in ZOOKEEPER-3188) work on macOS when ICMP throttling is enabled.
> The whole purpose of the multi-address feature is to always try to use an address that works. In the leader election, the current implementation always filters the address list using {{InetAddress.isReachable()}} calls to find out which server address is working. This results in ICMP echo requests (or TCP connection attempts to port 7 (Echo) of the destination host), depending on the native implementation (see the [Oracle docs|https://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#isReachable(int)]).
> So if {{InetAddress.isReachable()}} cannot reach the host, the current multi-address feature will not be able to treat the given address as a working one. Right now it cannot distinguish between a broken network link (when the whole node is unreachable) and disabled ICMP (when only ICMP and TCP port 7 are blocked by the firewall of the destination host).
> A few ideas on how to handle this better:
>  * One way to improve this could be to implement something like the {{ruok}} 4LW command for the server ports: a simple request-response message that only shows that the server is alive and listening on the given election / quorum port. Then we could use that instead of the ICMP calls.
>  * Another way could be to do something like what the Learner does right now (if I remember correctly, it basically starts connecting to all known quorum ports in parallel, then keeps the connection that is established first). However, this might be trickier in the case of the leader election protocol...
>  * Another way would be to simply try to establish a connection to the election addresses one by one, moving on to the next one if the attempt fails. It would be slower, but we wouldn't rely on {{InetAddress.isReachable()}}.
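
The last two ideas in the list above could look roughly like the sketch below. It is only an illustration, not the ZooKeeper implementation: the class {{ElectionAddressProbe}}, both method names and the timeout parameter are hypothetical, and the code only checks that a plain TCP connection can be opened to a candidate address. It does not speak the election protocol, and, unlike what the description says about the Learner, it closes each probe connection instead of keeping the winning one.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper for illustration only; not part of ZooKeeper. */
final class ElectionAddressProbe {

    /**
     * Sequential variant: tries the candidates one by one and returns the
     * first one that accepts a TCP connection within timeoutMs.
     * Returns Optional.empty() if none of them does.
     */
    static Optional<InetSocketAddress> firstConnectable(
            List<InetSocketAddress> candidates, int timeoutMs) {
        for (InetSocketAddress candidate : candidates) {
            try (Socket socket = new Socket()) {
                socket.connect(candidate, timeoutMs);
                return Optional.of(candidate);
            } catch (IOException e) {
                // Refused, timed out or unresolved: move on to the next address.
            }
        }
        return Optional.empty();
    }

    /**
     * Parallel variant: starts one connection attempt per candidate and
     * returns the address of an attempt that succeeded.
     * Throws IOException if no candidate accepts a connection.
     */
    static InetSocketAddress firstToConnect(
            List<InetSocketAddress> candidates, int timeoutMs)
            throws IOException, InterruptedException {
        if (candidates.isEmpty()) {
            throw new IOException("no candidate addresses");
        }
        List<Callable<InetSocketAddress>> attempts = new ArrayList<>();
        for (InetSocketAddress candidate : candidates) {
            attempts.add(() -> {
                // A failed connect throws, so invokeAny ignores this attempt.
                try (Socket socket = new Socket()) {
                    socket.connect(candidate, timeoutMs);
                    return candidate;
                }
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(candidates.size());
        try {
            // invokeAny returns the result of a task that completed without
            // throwing, and fails only if every task failed.
            return pool.invokeAny(attempts);
        } catch (ExecutionException e) {
            throw new IOException("no candidate accepted a connection", e);
        } finally {
            // Attempts still blocked in connect() end after at most timeoutMs.
            pool.shutdownNow();
        }
    }
}
```

With N candidate addresses, the sequential variant can take up to N times the timeout in the worst case, while the parallel variant is bounded by roughly one timeout at the cost of one thread per address. Neither variant depends on ICMP or on port 7 being open on the destination host.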
> A few challenges we also need to consider:
>  * It can be tricky to detect when the current election address becomes unavailable. This is another edge case where we currently use {{InetAddress.isReachable()}} (this is why we call {{SendWorker.asyncValidateIfSocketIsStillReachable()}}).
>  * We also need to take backward compatibility into consideration for the leader election protocol during rolling upgrades.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)