Posted to user@zookeeper.apache.org by Scott Gasch <sc...@gasch.org> on 2022/08/15 19:12:15 UTC

Question about multiple addresses and quorum

Hi,

I have a configuration where nodes talk to each other over a VPN link.  But
I do not want that VPN to be a single point of failure.  So I've configured
my ensemble with multiple addresses:

server.1=10.0.0.241:2888:3888|external-a.org:12888:13888
server.2=192.168.0.252:2888:3888|external-b.org:2888:3888
server.3=10.0.0.232:2888:3888|external-a.org:2888:3888

The two addresses per node are its internal IP address (viable when the VPN
is active) and its external IP address (viable anytime, in theory).  My
thought was that, if the VPN drops, the zookeeper ensemble would be able to
"fall back" and use the external addresses.  I've set up my SSL
certificates with alternate names and jailed the zookeeper servers before
opening holes in my firewall to accept traffic at ports 2888 and 3888.
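As a reference for the syntax above, here is a minimal sketch that splits one server.N line into its '|'-separated alternatives. ServerLineParser is a hypothetical helper, not ZooKeeper's own config parser. It assumes the simple host:quorumPort:electionPort form used here: no IPv6 literals and no optional role or client-port suffix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; this is NOT ZooKeeper's config parser.
// Handles only "server.N=host:quorumPort:electionPort|host:quorumPort:electionPort|..."
// (no IPv6 literals, no optional role or client-port suffix).
final class ServerLineParser {

    // One alternative address for a server: host, quorum port, election port.
    static final class Endpoint {
        final String host;
        final int quorumPort;
        final int electionPort;

        Endpoint(String host, int quorumPort, int electionPort) {
            this.host = host;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
        }

        @Override
        public String toString() {
            return host + ":" + quorumPort + ":" + electionPort;
        }
    }

    static List<Endpoint> parse(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("missing '=' in: " + line);
        }
        List<Endpoint> endpoints = new ArrayList<>();
        // Alternative addresses for the same server are separated by '|'.
        for (String part : line.substring(eq + 1).trim().split("\\|")) {
            String[] fields = part.trim().split(":");
            if (fields.length != 3) {
                throw new IllegalArgumentException("expected host:port:port, got: " + part);
            }
            endpoints.add(new Endpoint(fields[0],
                    Integer.parseInt(fields[1]),
                    Integer.parseInt(fields[2])));
        }
        return endpoints;
    }
}
```

Parsing the server.1 line yields two entries in order: 10.0.0.241 with ports 2888/3888, then external-a.org with ports 12888/13888.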

However, when I test this by dropping the VPN link, I run into trouble:
the two nodes on one side form a two-node quorum and continue to serve
requests, while the single node on the other side continually tries to
connect and fails.
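The two connected nodes keep serving because of ZooKeeper's majority rule. A partition can only elect a leader if it holds strictly more than half of the voting members, which with three servers means two. A minimal check (QuorumMath is a hypothetical name used only for illustration):

```java
// Majority rule for an ensemble of voting members: a partition can elect a
// leader and serve requests only if it contains strictly more than half of
// the configured voters. Integer division gives floor(ensembleSize / 2).
final class QuorumMath {
    static boolean hasQuorum(int reachableVoters, int ensembleSize) {
        return reachableVoters > ensembleSize / 2;
    }
}
```

hasQuorum(2, 3) is true for the two-node side and hasQuorum(1, 3) is false for the isolated node, so the isolated node cannot rejoin until it reaches the others over some address.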

Looking at the logs, it seems that the nodes *are* using the public IP
addresses to get back in contact, but that the code expects to be able to
open both the external and the internal address before accepting a new
participant.  Is this the case?
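For comparison, the fallback behaviour expected here (try each alternative address in turn and use the first one that connects, rather than requiring every address to work) can be sketched as below. This is a hypothetical illustration using plain java.net.Socket with a connect timeout. It is not the actual QuorumCnxManager logic, and FirstReachable is an invented name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of "first reachable address wins"; this is NOT
// ZooKeeper's QuorumCnxManager code. Addresses are tried in list order.
final class FirstReachable {

    static Optional<InetSocketAddress> firstReachable(List<InetSocketAddress> candidates,
                                                      int timeoutMillis) {
        for (InetSocketAddress addr : candidates) {
            // try-with-resources closes the probe socket whether or not connect succeeds.
            try (Socket probe = new Socket()) {
                probe.connect(addr, timeoutMillis);
                return Optional.of(addr);
            } catch (IOException e) {
                // Timed out, refused, or unresolved host: try the next address.
            }
        }
        return Optional.empty();
    }
}
```

With the internal address listed first, a dropped VPN costs one connect timeout per attempt before the external address is tried.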

2022-08-15 11:58:36,887 [myid:] - INFO  [ListenerHandler-/10.0.0.232:3888:o.a.z.s.q.UnifiedServerSocket$UnifiedSocket@266] - Accepted TLS connection from /<external-b.org>:17715 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
2022-08-15 11:58:43,951 [myid:] - WARN  [QuorumConnectionThread-[myid=3]-11:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 2 at election address /192.168.0.252:3888|/<external-b.org>:3888
java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:607)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:293)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Can someone please help me understand:

1. Is this a bad thing to do?  I thought multiple addresses were probably
added for this exact use case, but if the code expects to be able to open
both IP addresses, I may have misunderstood.

2. Would it be better to just let the ensemble degrade and, instead, have
clients try both the external and the internal addresses, so that clients
on the disconnected side can still reach the degraded ensemble on the other
side?  What's the best practice here?
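For the client-side alternative in question 2: ZooKeeper clients already accept a comma-separated list of host:port pairs as their connect string, so one option is to list both the internal and the external address of every server. Below is a minimal sketch of building such a string. ConnectStrings is a hypothetical helper, and the client ports are assumptions (2181 is only the common default), since client ports do not appear in the configuration above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: merges internal and external client endpoints
// ("host:port" strings) into one ZooKeeper connect string, dropping
// duplicates while keeping first-seen order.
final class ConnectStrings {
    static String build(List<String> internal, List<String> external) {
        Set<String> ordered = new LinkedHashSet<>(internal);
        ordered.addAll(external);
        return String.join(",", ordered);
    }
}
```

The result can be passed as the connectString argument of new ZooKeeper(connectString, sessionTimeout, watcher). The order is not a preference: the client chooses among the listed servers itself and moves on to another one when a connection attempt fails.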

Thanks!
Scott