Posted to issues@zookeeper.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/11/05 10:35:00 UTC
[jira] [Updated] (ZOOKEEPER-3991) QuorumCnxManager Listener port bind retry does not retry DNS lookup
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ZOOKEEPER-3991:
--------------------------------------
Labels: pull-request-available (was: )
> QuorumCnxManager Listener port bind retry does not retry DNS lookup
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-3991
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3991
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.6.2
> Reporter: Lander Visterin
> Priority: Minor
> Labels: pull-request-available
> Attachments: RecreateAddress.patch, repro.tar.gz
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We run ZooKeeper in a container environment where DNS is not stable. As recommended by the documentation, we set _electionPortBindRetry_ to 0 (retry forever).
> On some instances, the following exception repeats in an infinite loop, even after the address has become resolvable:
>
> {noformat}
> zk-2_1 | 2020-11-03 10:57:08,407 [myid:3] - ERROR [ListenerHandler-zk-2.test:3888:QuorumCnxManager$Listener$ListenerHandler@1093] - Exception while listening
> zk-2_1 | java.net.SocketException: Unresolved address
> zk-2_1 | at java.base/java.net.ServerSocket.bind(Unknown Source)
> zk-2_1 | at java.base/java.net.ServerSocket.bind(Unknown Source)
> zk-2_1 | at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.createNewServerSocket(QuorumCnxManager.java:1140)
> zk-2_1 | at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1064)
> zk-2_1 | at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1033)
> zk-2_1 | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> zk-2_1 | at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
> zk-2_1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> zk-2_1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> zk-2_1 | at java.base/java.lang.Thread.run(Unknown Source){noformat}
> ZooKeeper does not actually retry the DNS resolution; it keeps reusing the old, failed result.
>
> This happens because the InetSocketAddress is created only once, and the DNS lookup is performed when it is constructed.
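The following minimal Java sketch illustrates the behavior described above using only standard `java.net` classes. It assumes `InetSocketAddress.createUnresolved` as a stand-in for a lookup that failed at startup (that factory skips resolution entirely), and uses the IP literal `127.0.0.1` so no working DNS is needed. `reResolve` is a hypothetical helper name, not a ZooKeeper method.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.SocketException;

public class StaleAddressDemo {
    // Hypothetical helper: builds a new address from the same host string and port.
    // The InetSocketAddress(String, int) constructor performs a fresh resolution.
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) throws IOException {
        // Simulates an address whose lookup failed when it was first created.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 0);

        // Binding this same unresolved instance fails on every attempt,
        // because the object never re-resolves on its own.
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(stale);
        } catch (SocketException e) {
            System.out.println("bind failed: " + e.getMessage());
        }

        // Only a new instance triggers a new resolution (here, literal parsing).
        InetSocketAddress fresh = reResolve(stale);
        System.out.println("stale unresolved: " + stale.isUnresolved()); // true
        System.out.println("fresh unresolved: " + fresh.isUnresolved()); // false
    }
}
```

The point of the sketch is that retrying `bind` with the same object cannot succeed; any retry has to construct a new `InetSocketAddress` from the original host string.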
> This issue was reported previously in https://issues.apache.org/jira/browse/ZOOKEEPER-1506, but it still appears to occur here.
> I have attached a repro.tar.gz to help reproduce this issue. Steps:
> * Untar repro.tar.gz
> * docker-compose up
> * Observe that the exception keeps recurring for zk-2, but not for the others
> * Open db.test, uncomment the zk-2 line, increment the serial, and save
> * Wait a few seconds for DNS to refresh
> * Verify that zk-2.test now resolves (dig @172.16.60.2 zk-2.test), but the error keeps appearing
> I have also attached a patch that resolves this. Each time it tries to create the server socket, the patch retries DNS resolution if the address is still unresolved.
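A minimal Java sketch of the retry approach described above, assuming a standalone helper rather than the actual contents of RecreateAddress.patch or QuorumCnxManager. The names `RebindingListener`, `bindWithRetry`, `maxRetry`, and `backoffMs` are hypothetical; `maxRetry == 0` is assumed to mean "retry forever", matching the _electionPortBindRetry_ setting mentioned in the report.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class RebindingListener {
    /**
     * Hypothetical helper (not ZooKeeper's code): binds a server socket and,
     * before each attempt, re-resolves the address if it is still unresolved.
     * maxRetry == 0 retries forever.
     */
    static ServerSocket bindWithRetry(InetSocketAddress initial, int maxRetry, long backoffMs)
            throws IOException, InterruptedException {
        InetSocketAddress addr = initial;
        int attempt = 0;
        while (true) {
            attempt++;
            if (addr.isUnresolved()) {
                // Fresh lookup: a DNS record added since the last attempt is picked up here.
                addr = new InetSocketAddress(addr.getHostString(), addr.getPort());
            }
            ServerSocket socket = new ServerSocket();
            try {
                // Throws SocketException("Unresolved address") if the lookup failed again.
                socket.bind(addr);
                return socket;
            } catch (IOException e) {
                socket.close();
                if (maxRetry > 0 && attempt >= maxRetry) {
                    throw e; // retries exhausted
                }
                Thread.sleep(backoffMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Start from an unresolved address, as if the first lookup had failed.
        InetSocketAddress start = InetSocketAddress.createUnresolved("127.0.0.1", 0);
        try (ServerSocket s = bindWithRetry(start, 3, 100)) {
            System.out.println("bound to " + s.getLocalSocketAddress());
        }
    }
}
```

Re-resolving only while the address is unresolved keeps the common path unchanged: once a lookup succeeds, later bind failures (for example, the port being in use) are retried against the same resolved address.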
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)