Posted to issues@zookeeper.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/11/05 10:35:00 UTC

[jira] [Updated] (ZOOKEEPER-3991) QuorumCnxManager Listener port bind retry does not retry DNS lookup

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ZOOKEEPER-3991:
--------------------------------------
    Labels: pull-request-available  (was: )

> QuorumCnxManager Listener port bind retry does not retry DNS lookup
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3991
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3991
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.6.2
>            Reporter: Lander Visterin
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: RecreateAddress.patch, repro.tar.gz
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We run ZooKeeper in a container environment where DNS is not stable. As recommended by the documentation, we set _electionPortBindRetry_ to 0 (retry forever).
> On some instances, we get the following exception in an infinite loop, even though the address has since become resolvable:
>  
> {noformat}
> zk-2_1  | 2020-11-03 10:57:08,407 [myid:3] - ERROR [ListenerHandler-zk-2.test:3888:QuorumCnxManager$Listener$ListenerHandler@1093] - Exception while listening
> zk-2_1  | java.net.SocketException: Unresolved address
> zk-2_1  | 	at java.base/java.net.ServerSocket.bind(Unknown Source)
> zk-2_1  | 	at java.base/java.net.ServerSocket.bind(Unknown Source)
> zk-2_1  | 	at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.createNewServerSocket(QuorumCnxManager.java:1140)
> zk-2_1  | 	at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1064)
> zk-2_1  | 	at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1033)
> zk-2_1  | 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> zk-2_1  | 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
> zk-2_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> zk-2_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> zk-2_1  | 	at java.base/java.lang.Thread.run(Unknown Source){noformat}
> ZooKeeper does not actually retry the DNS resolution; it keeps reusing the old, failed result.
>  
> This happens because the InetSocketAddress is created only once, and the DNS lookup happens at creation time.
> This issue came up previously in https://issues.apache.org/jira/browse/ZOOKEEPER-1506, but it still appears to happen here.
> I have attached a repro.tar.gz to help reproduce this issue. Steps:
>  * Untar repro.tar.gz
>  * docker-compose up
>  * Observe that the exception keeps occurring for zk-2 but not for the others
>  * Open db.test, uncomment the zk-2 line, increment the serial, and save
>  * Wait a few seconds for the DNS to refresh
>  * Verify that you can resolve zk-2.test now (dig @172.16.60.2 zk-2.test) but the error keeps appearing
> I have also attached a patch that resolves this. Each time it tries to create the server socket, the patch retries DNS resolution if the address is still unresolved.
>  
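
The behaviour described above can be illustrated with a minimal Java sketch. It is not the attached RecreateAddress.patch and not ZooKeeper code; the class name RecreateAddressSketch and the helper recreateIfUnresolved are hypothetical names chosen for illustration. It assumes only the standard java.net.InetSocketAddress API: the constructor taking a hostname attempts resolution once, and an address that failed to resolve stays unresolved until a new object is created. The sketch uses an IP literal in place of a hostname so that it runs without a DNS server.

```java
import java.net.InetSocketAddress;

public class RecreateAddressSketch {

    // Hypothetical helper: returns the address unchanged if it is already
    // resolved; otherwise builds a new InetSocketAddress from the same host
    // string and port, which triggers a fresh resolution attempt.
    public static InetSocketAddress recreateIfUnresolved(InetSocketAddress address) {
        if (!address.isUnresolved()) {
            return address;
        }
        return new InetSocketAddress(address.getHostString(), address.getPort());
    }

    public static void main(String[] args) {
        // createUnresolved skips the lookup entirely; here it stands in for
        // a lookup that failed when the address was first created.
        // The IP literal is an assumption made so the sketch needs no DNS.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 3888);
        System.out.println("before: unresolved=" + stale.isUnresolved());

        // Re-creating the address performs the resolution again.
        InetSocketAddress fresh = recreateIfUnresolved(stale);
        System.out.println("after:  unresolved=" + fresh.isUnresolved());
    }
}
```

Calling such a helper immediately before each bind attempt would give a retry loop the chance to pick up a hostname that became resolvable after startup, instead of passing the same unresolved address to ServerSocket.bind on every iteration.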


