Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2020/01/20 12:58:06 UTC

[GitHub] [zookeeper] symat opened a new pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

symat opened a new pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228
 
 
   While testing the 3.6.0 release candidate, we had trouble starting a ZooKeeper cluster
   with a large number (11+) of ensemble members locally on macOS. We found exceptions in
   the logs when the new MultiAddress feature tried to filter the unreachable hosts out of
   the address list. This filtering calls the `InetAddress.isReachable` method with a
   default timeout of 500ms, which ends up in a native call that basically tries to ping
   the host (send an ICMP echo request). Naturally, localhost should always be reachable.
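   A minimal Java sketch of the kind of probe described above, using the standard
   `InetAddress.isReachable(int timeout)` API with the 500ms timeout mentioned here. The
   `ReachabilityProbe` class is hypothetical and not ZooKeeper code; whether the JVM sends
   an ICMP echo request or falls back to a TCP connection on port 7 depends on the platform
   and on the privileges of the process.

```java
import java.io.IOException;
import java.net.InetAddress;

// Hypothetical helper illustrating the reachability probe discussed in this PR.
class ReachabilityProbe {

    // Returns true if the host answered within timeoutMs. Depending on the
    // platform and privileges, the JVM sends an ICMP echo request or tries a
    // TCP connection to port 7 (Echo). A negative timeout is rejected with
    // IllegalArgumentException before any packet is sent.
    static boolean probe(InetAddress address, int timeoutMs) throws IOException {
        return address.isReachable(timeoutMs);
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        long start = System.nanoTime();
        boolean reachable = probe(loopback, 500);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(loopback + " reachable=" + reachable + " (" + elapsedMs + " ms)");
    }
}
```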
   
   The problem was that macOS applies an ICMP rate limit, set to 250 by default, and the
   reachability checks of a large local ensemble hit it.
   
   In this patch we:
   - disabled the reachability check when only a single address is provided (in that
   case there is nothing to filter out anyway).
   - added and documented a configuration parameter to disable the reachability check
   for testing (default: enabled).
   - added and documented a configuration parameter to set the timeout of the
   reachability checks (default: 1000ms).
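
   As an illustration of how the two new settings could be read as Java system properties,
   here is a hypothetical sketch. The property `zookeeper.multiAddress.reachabilityCheckEnabled`
   comes from the documentation change in this PR; the timeout property name
   `zookeeper.multiAddress.reachabilityCheckTimeoutMs` is an assumption, and
   `ReachabilitySettings` is not a real ZooKeeper class.

```java
// Hypothetical holder for the two settings added by this PR. The first property
// name comes from the documentation diff in this thread; the timeout property
// name is an assumption.
class ReachabilitySettings {
    static final String CHECK_ENABLED_PROPERTY =
            "zookeeper.multiAddress.reachabilityCheckEnabled";
    // Assumed name; this thread does not spell it out.
    static final String TIMEOUT_MS_PROPERTY =
            "zookeeper.multiAddress.reachabilityCheckTimeoutMs";

    // Default: enabled. Any value other than "true" (case-insensitive) disables it.
    static boolean reachabilityCheckEnabled() {
        return Boolean.parseBoolean(System.getProperty(CHECK_ENABLED_PROPERTY, "true"));
    }

    // Default: 1000 ms. Integer.getInteger falls back to the default when the
    // property is missing or cannot be parsed as an integer.
    static int reachabilityCheckTimeoutMs() {
        return Integer.getInteger(TIMEOUT_MS_PROPERTY, 1000);
    }

    public static void main(String[] args) {
        System.out.println("enabled=" + reachabilityCheckEnabled()
                + ", timeoutMs=" + reachabilityCheckTimeoutMs());
    }
}
```

   For example, a local test ensemble could be started with
   `-Dzookeeper.multiAddress.reachabilityCheckEnabled=false` to skip the checks.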

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] symat commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
symat commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#discussion_r368575308
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1542,6 +1555,22 @@ the variable does.
     ZAB protocol and the Fast Leader Election protocol. Default
     value is **false**.
 
+* *multiAddress.reachabilityCheckEnabled* :
+    (Java system property: **zookeeper.multiAddress.reachabilityCheckEnabled**)
+    **New in 3.6.0:**
+    Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address) 
+    for each ZooKeeper server instance (this can increase availability when multiple physical 
+    network interfaces can be used parallel in the cluster). ZooKeeper will perform ICMP ECHO requests
+    or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find 
+    the reachable addresses. This happens only if you provide multiple addresses in the configuration.
+    The reachable check can fail if you hit some ICMP rate-limitation, (e.g. on MacOS) when you try to 
+    start a large (e.g. 11+) ensemble members cluster on a single machine for testing. 
+    
+    Default value is **true**. By setting this parameter to 'false' you can disable the reachability checks. 
+    Please note, disabling the reachability check will cause the cluster not to be able to reconfigure 
+    itself properly during network problems, so the disabling is advised only during testing. 
 
 Review comment:
   Thanks for checking! :)
   
   The whole purpose of the multi-address feature is to always try to use an address that works. In the case of leader election, the current implementation always filters the address list using `InetAddress.isReachable()` calls to find out which server address works. Depending on the native implementation, this results in ICMP echo requests or TCP connection attempts to port 7 (Echo) of the destination host (see: https://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#isReachable(int) )
   
   So if `InetAddress.isReachable()` cannot reach the host, the current multi-address feature will not treat the given address as a working one. Right now it cannot distinguish between a broken network link (when the whole node is unreachable) and disabled ICMP (when only ICMP and port 7 are blocked by the firewall of the destination host). I am not an expert in cluster / firewall operation, so I cannot tell how serious this limitation is.
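
   To make the limitation concrete, here is a hypothetical sketch (not the actual `MultipleAddresses` code) of filtering an address list with `InetAddress.isReachable()`. The check is passed in as a `Predicate`, a design choice made for this sketch so that the filtering can be exercised without the network. A host whose firewall drops ICMP and port 7 is filtered out even if its election port works, which is exactly the ambiguity described above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch, not the actual MultipleAddresses code: keep only the
// addresses whose host answers InetAddress.isReachable() within the timeout.
class AddressFilter {

    // Default check: ICMP echo or TCP port 7, depending on the platform.
    // Unresolved addresses and I/O errors are treated as unreachable.
    static Predicate<InetSocketAddress> isReachableCheck(int timeoutMs) {
        return address -> {
            if (address.isUnresolved()) {
                return false;
            }
            try {
                return address.getAddress().isReachable(timeoutMs);
            } catch (IOException e) {
                return false;
            }
        };
    }

    // The check is injectable, so the filtering logic can be tested without
    // sending any packets.
    static List<InetSocketAddress> reachableAddresses(
            List<InetSocketAddress> addresses, Predicate<InetSocketAddress> check) {
        return addresses.stream().filter(check).collect(Collectors.toList());
    }
}
```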
   
   One way to improve this could be to implement something like the `ruok` 4LW command for the server ports: a simple request-response message that only shows that the server is alive and listening on the given election / quorum port. We could then use that instead of the ICMP calls. I think this would be a reasonable improvement, but it is probably a separate task, out of the scope of 3.6.0.
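
   A rough sketch of the `ruok`-style idea, assuming a hypothetical protocol in which the peer answers `ruok` with `imok` (as the client-port four-letter word does). ZooKeeper's election and quorum ports do not implement anything like this; `RuokProbe` and its behavior are purely illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical liveness probe modelled on the "ruok" four-letter word: send
// "ruok" and expect "imok". ZooKeeper's election / quorum ports do NOT speak
// this protocol; this only illustrates the idea proposed above.
class RuokProbe {

    static boolean isAlive(InetSocketAddress address, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMs);
            socket.setSoTimeout(timeoutMs); // bound the wait for the reply
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Read exactly four reply bytes, or fewer if the peer closes early.
            InputStream in = socket.getInputStream();
            byte[] reply = new byte[4];
            int read = 0;
            while (read < reply.length) {
                int n = in.read(reply, read, reply.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == reply.length
                    && "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Connection refused, timeout, reset, ...: treat as not alive.
            return false;
        }
    }
}
```

   Unlike `isReachable()`, a successful reply here would show that the server process itself is listening on that port, not just that the host answers pings.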
   
   What do you think?
   
   (Also: do you think I should extend the documentation, or did you just want me to elaborate here in the PR?)


[GitHub] [zookeeper] anmolnar commented on issue #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
anmolnar commented on issue #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#issuecomment-577666069
 
 
   Merged to master and branch-3.6.
   Thanks @symat!


[GitHub] [zookeeper] anmolnar commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#discussion_r370094227
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1542,6 +1555,22 @@ the variable does.
     ZAB protocol and the Fast Leader Election protocol. Default
     value is **false**.
 
+* *multiAddress.reachabilityCheckEnabled* :
+    (Java system property: **zookeeper.multiAddress.reachabilityCheckEnabled**)
+    **New in 3.6.0:**
+    Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address) 
+    for each ZooKeeper server instance (this can increase availability when multiple physical 
+    network interfaces can be used parallel in the cluster). ZooKeeper will perform ICMP ECHO requests
+    or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find 
+    the reachable addresses. This happens only if you provide multiple addresses in the configuration.
+    The reachable check can fail if you hit some ICMP rate-limitation, (e.g. on MacOS) when you try to 
+    start a large (e.g. 11+) ensemble members cluster on a single machine for testing. 
+    
+    Default value is **true**. By setting this parameter to 'false' you can disable the reachability checks. 
+    Please note, disabling the reachability check will cause the cluster not to be able to reconfigure 
+    itself properly during network problems, so the disabling is advised only during testing. 
 
 Review comment:
   @symat Agreed. All of these suggestions would be nice improvements, but we definitely need a separate Jira / patch for them. `isReachable()` is suitable for now.


[GitHub] [zookeeper] symat commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
symat commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#discussion_r368626386
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1542,6 +1555,22 @@ the variable does.
     ZAB protocol and the Fast Leader Election protocol. Default
     value is **false**.
 
+* *multiAddress.reachabilityCheckEnabled* :
+    (Java system property: **zookeeper.multiAddress.reachabilityCheckEnabled**)
+    **New in 3.6.0:**
+    Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address) 
+    for each ZooKeeper server instance (this can increase availability when multiple physical 
+    network interfaces can be used parallel in the cluster). ZooKeeper will perform ICMP ECHO requests
+    or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find 
+    the reachable addresses. This happens only if you provide multiple addresses in the configuration.
+    The reachable check can fail if you hit some ICMP rate-limitation, (e.g. on MacOS) when you try to 
+    start a large (e.g. 11+) ensemble members cluster on a single machine for testing. 
+    
+    Default value is **true**. By setting this parameter to 'false' you can disable the reachability checks. 
+    Please note, disabling the reachability check will cause the cluster not to be able to reconfigure 
+    itself properly during network problems, so the disabling is advised only during testing. 
 
 Review comment:
   After thinking a bit more:
   
   Another improvement could be to do something similar to what the Learner does right now (if I remember correctly, it starts connecting to all known quorum ports in parallel, then keeps the connection that is established first). However, this might be trickier for the Leader Election protocol...
   
   Another way would be to simply try to connect to the election addresses one by one, moving on to the next one if the attempt fails. It would be slower, but we wouldn't rely on `InetAddress.isReachable()`.
   
   However, in both cases it can be tricky to detect when the current election address becomes unavailable. This is another edge case where we use `InetAddress.isReachable()` (this is why we call `SendWorker.asyncValidateIfSocketIsStillReachable()`).
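
   The one-by-one alternative could look roughly like the hypothetical sketch below: try each address in order and keep the first socket that connects. `SequentialConnector` is not ZooKeeper code, and the per-address timeout handling is an assumption.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

// Hypothetical sketch of the "one-by-one" alternative: try each election
// address in order and keep the first connection that succeeds, instead of
// pre-filtering the list with InetAddress.isReachable().
class SequentialConnector {

    static Socket connectToFirstAvailable(List<InetSocketAddress> addresses, int timeoutMs)
            throws IOException {
        IOException lastFailure = null;
        for (InetSocketAddress address : addresses) {
            Socket socket = new Socket();
            try {
                socket.connect(address, timeoutMs);
                return socket; // the caller owns the socket and must close it
            } catch (IOException e) {
                // Refused, timed out or unresolved: remember why and try the next one.
                lastFailure = e;
                try {
                    socket.close();
                } catch (IOException suppressed) {
                    e.addSuppressed(suppressed);
                }
            }
        }
        if (lastFailure == null) {
            throw new IOException("no addresses given");
        }
        throw lastFailure;
    }
}
```

   The trade-off shows up directly: every dead address can cost up to `timeoutMs` before the next one is tried, and once a socket is connected this code says nothing about whether that address later becomes unavailable.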
   


[GitHub] [zookeeper] anmolnar commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#discussion_r368557610
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1542,6 +1555,22 @@ the variable does.
     ZAB protocol and the Fast Leader Election protocol. Default
     value is **false**.
 
+* *multiAddress.reachabilityCheckEnabled* :
+    (Java system property: **zookeeper.multiAddress.reachabilityCheckEnabled**)
+    **New in 3.6.0:**
+    Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address) 
+    for each ZooKeeper server instance (this can increase availability when multiple physical 
+    network interfaces can be used parallel in the cluster). ZooKeeper will perform ICMP ECHO requests
+    or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find 
+    the reachable addresses. This happens only if you provide multiple addresses in the configuration.
+    The reachable check can fail if you hit some ICMP rate-limitation, (e.g. on MacOS) when you try to 
+    start a large (e.g. 11+) ensemble members cluster on a single machine for testing. 
+    
+    Default value is **true**. By setting this parameter to 'false' you can disable the reachability checks. 
+    Please note, disabling the reachability check will cause the cluster not to be able to reconfigure 
+    itself properly during network problems, so the disabling is advised only during testing. 
 
 Review comment:
   Could you please elaborate a little on this? Why will the cluster not be able to reconfigure itself if the ping check is disabled?
   Also, does that mean the multi-address feature only works properly if hosts respond to ICMP Echo requests?


[GitHub] [zookeeper] eolivelli commented on issue #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#issuecomment-576886852
 
 
   @anmolnar are you okay with this patch?
   


[GitHub] [zookeeper] symat commented on issue #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
symat commented on issue #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#issuecomment-577369701
 
 
   FYI: I just rebased onto the current branch-3.6, as there was a conflict in the `zookeeperAdmin.md` file (the recently pushed ZOOKEEPER-3482 changed it too).


[GitHub] [zookeeper] anmolnar commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#discussion_r368551678
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/MultipleAddresses.java
 ##########
 @@ -150,6 +150,12 @@ public InetSocketAddress getReachableAddress() throws NoRouteToHostException {
      */
     public InetSocketAddress getReachableOrOne() {
         InetSocketAddress address;
+
+        // if there is only a single address provided then we don't do any reachability check
+        if (addresses.size() == 1) {
 
 Review comment:
   This is a very nice improvement.
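
   A hypothetical sketch of the single-address short-circuit from this hunk, written as a standalone method rather than the real `MultipleAddresses.getReachableOrOne()`. Falling back to the first configured address when none is reachable is an assumption made for this sketch.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the short-circuit discussed in this hunk; it is not the
// actual MultipleAddresses.getReachableOrOne() implementation.
class ReachableOrOne {

    static InetSocketAddress getReachableOrOne(List<InetSocketAddress> addresses,
                                               Predicate<InetSocketAddress> isReachable) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        // With a single address there is nothing to choose from, so skip the
        // (possibly slow or rate-limited) reachability check entirely.
        if (addresses.size() == 1) {
            return addresses.get(0);
        }
        // Otherwise prefer the first reachable address and fall back to the
        // first configured one if none answers (assumed fallback).
        return addresses.stream()
                .filter(isReachable)
                .findFirst()
                .orElse(addresses.get(0));
    }
}
```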


[GitHub] [zookeeper] asfgit closed pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1228: ZOOKEEPER-3698: fixing NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228
 
 
   
