Posted to issues@zookeeper.apache.org by "Steve Jerman (Jira)" <ji...@apache.org> on 2020/02/20 16:37:00 UTC

[jira] [Commented] (ZOOKEEPER-3723) Zookeeper Client should not fail with ZSYSTEMERROR if DNS does not resolve one of the servers in the zk ensemble.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041126#comment-17041126 ] 

Steve Jerman commented on ZOOKEEPER-3723:
-----------------------------------------

This can be triggered by race conditions at startup.

> Zookeeper Client should not fail with ZSYSTEMERROR if DNS does not resolve one of the servers in the zk ensemble. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3723
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3723
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: c client, java client
>            Reporter: Suhas Dantkale
>            Priority: Minor
>
> This is a minor enhancement request to not fail the session initiation if the DNS is not able to resolve the hostname of one of the servers in the Zookeeper ensemble.
>  
> The Zookeeper client resolves all the hostnames in the ensemble while establishing the session.
> In a Kubernetes environment with CoreDNS, a pod's hostname entry is removed from DNS while the pod restarts. We could tune the CoreDNS settings to delay that removal, but we don't want to leave any race in which the ZooKeeper client tries to establish a session and fails because DNS temporarily cannot resolve one hostname. As long as at least one server in the ensemble is resolvable, shouldn't the client try to connect to one of the other servers instead of failing session establishment with a hard error?
>  
> See the snippet below, where resolve_hosts() fails with ZSYSTEMERROR as soon as getaddrinfo() fails for a host.
> {code:java}
> if ((rc = getaddrinfo(host, port_spec, &hints, &res0)) != 0) {
>             //bug in getaddrinfo implementation when it returns
>             //EAI_BADFLAGS or EAI_ADDRFAMILY with AF_UNSPEC and
>             // ai_flags as AI_ADDRCONFIG
> #ifdef AI_ADDRCONFIG
>             if ((hints.ai_flags == AI_ADDRCONFIG) &&
> // ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
> #ifdef EAI_ADDRFAMILY
>                 ((rc ==EAI_BADFLAGS) || (rc == EAI_ADDRFAMILY))) {
> #else
>                 (rc == EAI_BADFLAGS)) {
> #endif
>                 //reset ai_flags to null
>                 hints.ai_flags = 0;
>                 //retry getaddrinfo
>                 rc = getaddrinfo(host, port_spec, &hints, &res0);
>             }
> #endif
>             if (rc != 0) {
>                 errno = getaddrinfo_errno(rc);
> #ifdef _WIN32
>                 LOG_ERROR(LOGCALLBACK(zh), "Win32 message: %s\n", gai_strerror(rc));
> #elif __linux__ && __GNUC__
>                 LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", gai_strerror(rc));
> #else
>                 LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", strerror(errno));
> #endif
>                 rc=ZSYSTEMERROR;
>                 goto fail;
>             }
>         }
> {code}
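
As a rough illustration of the behaviour requested above, the sketch below resolves each host independently and skips any host that fails, rather than jumping to a common failure path on the first error. It is not a patch against the ZooKeeper C client: resolve_tolerant() and free_results() are hypothetical names, the code assumes a POSIX getaddrinfo(), and it omits the AI_ADDRCONFIG retry and the Win32 logging branch from the quoted snippet.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper, not part of the ZooKeeper C client.
 * Resolves every host on its own. A host that fails to resolve is logged
 * and skipped instead of aborting the whole operation.
 * On return, results[i] is the address list for hosts[i], or NULL if that
 * host did not resolve. Returns the number of hosts that resolved. */
static size_t resolve_tolerant(const char *const *hosts, size_t nhosts,
                               const char *port,
                               const struct addrinfo *hints,
                               struct addrinfo **results)
{
    size_t resolved = 0;

    for (size_t i = 0; i < nhosts; i++) {
        int rc;

        results[i] = NULL;
        rc = getaddrinfo(hosts[i], port, hints, &results[i]);
        if (rc != 0) {
            /* One unresolvable name is not fatal: log it and move on. */
            fprintf(stderr, "getaddrinfo(%s): %s (skipping)\n",
                    hosts[i], gai_strerror(rc));
            results[i] = NULL;
            continue;
        }
        resolved++;
    }
    return resolved;
}

/* Frees every non-NULL entry filled in by resolve_tolerant(). */
static void free_results(struct addrinfo **results, size_t nhosts)
{
    for (size_t i = 0; i < nhosts; i++) {
        if (results[i] != NULL) {
            freeaddrinfo(results[i]);
            results[i] = NULL;
        }
    }
}
```

With a helper like this, the caller would return a hard error (ZSYSTEMERROR in the snippet above) only when resolve_tolerant() returns 0, meaning no server in the ensemble could be resolved. Otherwise it would build the connection list from the non-NULL entries and could retry the skipped hosts on a later reconnect.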


