Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2015/10/22 06:34:27 UTC
[jira] [Created] (MESOS-3790) Zk connection should retry on EAI_NONAME
Neil Conway created MESOS-3790:
----------------------------------
Summary: Zk connection should retry on EAI_NONAME
Key: MESOS-3790
URL: https://issues.apache.org/jira/browse/MESOS-3790
Project: Mesos
Issue Type: Bug
Reporter: Neil Conway
Assignee: Neil Conway
Priority: Minor
The zookeeper interface is designed to retry (once per second for up to ten minutes) if one or more of the Zookeeper hostnames can't be resolved (see [MESOS-1326] and [MESOS-1523]).
However, the current implementation assumes that a DNS resolution failure is indicated by zookeeper_init() returning NULL with errno set to EINVAL (Zk translates getaddrinfo() failures into errno values). That assumption does not always hold, because the current Zk code maps EAI_NONAME to ENOENT rather than EINVAL:
{code}
static int getaddrinfo_errno(int rc) {
    switch(rc) {
    case EAI_NONAME:
    // ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
#if defined EAI_NODATA && EAI_NODATA != EAI_NONAME
    case EAI_NODATA:
#endif
        return ENOENT;
    case EAI_MEMORY:
        return ENOMEM;
    default:
        return EINVAL;
    }
}
{code}
getaddrinfo() returns EAI_NONAME when "the node or service is not known"; per discussion in [MESOS-2186], this seems to happen intermittently due to DNS failures.
Proposed fix: looking at errno is always going to be somewhat fragile, but if we're going to continue doing that, we should check for ENOENT as well as EINVAL.
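A minimal sketch of the proposed check, assuming the retry decision is made on errno after a failed connection attempt. The names is_retryable_init_error, init_with_retry, init_fn, and flaky_init are hypothetical and exist only for illustration; they are not part of Mesos or ZooKeeper. The real code would call zookeeper_init() and wait one second between attempts, whereas this sketch takes the attempt as a callback so it can run without a ZooKeeper client.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: treat both errno values that the Zk code quoted
 * above can produce for a name-resolution failure as retryable. */
static bool is_retryable_init_error(int err) {
    return err == EINVAL || err == ENOENT;
}

/* Hypothetical callback type standing in for one connection attempt.
 * It returns a handle on success, or NULL with errno set on failure. */
typedef void *(*init_fn)(void *ctx);

/* Retry the attempt up to max_attempts times, giving up early on any
 * error that does not look like a DNS resolution failure. */
static void *init_with_retry(init_fn try_init, void *ctx, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        errno = 0;
        void *handle = try_init(ctx);
        if (handle != NULL) {
            return handle;
        }
        if (!is_retryable_init_error(errno)) {
            return NULL;
        }
        /* A real caller would sleep here before the next attempt. */
    }
    return NULL;
}

/* Demo stand-in for a flaky resolver: fails with ENOENT until the
 * counter pointed to by ctx reaches zero, then succeeds. */
static void *flaky_init(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        --*remaining;
        errno = ENOENT;
        return NULL;
    }
    return ctx;
}
```

With only the EINVAL check, the flaky_init stand-in above would cause an immediate failure on the first ENOENT; accepting both values lets the loop keep retrying until the name resolves or the attempt budget runs out.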