Posted to dev@curator.apache.org by "jugosag (JIRA)" <ji...@apache.org> on 2017/10/25 07:16:00 UTC

[jira] [Commented] (CURATOR-229) No retry on DNS lookup failure

    [ https://issues.apache.org/jira/browse/CURATOR-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218200#comment-16218200 ] 

jugosag commented on CURATOR-229:
---------------------------------

I would like to second the request to fix this problem. While DNS outages may be rare in classical deployment scenarios, they are much more likely in Docker-based environments: Docker containers can be given DNS names, but such a name only becomes resolvable once the container has actually started. During startup of our stack, often not all ZooKeeper containers of our cluster are up yet (enforcing a particular startup order is hard, and an anti-pattern anyway), while some ZooKeeper clients that use Curator are already starting, trying to connect to the ensemble, and failing with UnknownHostException. As mentioned below, this exception is not even thrown to the caller but reported as a background exception, which makes writing one's own retry loop even more convoluted.
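One possible workaround for the startup race described above is to hold back client startup until the ZooKeeper host names resolve, before the Curator client is built at all. The following is a minimal stdlib-only sketch, not Curator API: `DnsWait`, `hostsOf` and `awaitResolvable` are hypothetical names, it assumes `host:port` entries with an optional chroot suffix, and it does nothing for lookup failures that happen later, during failover.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Curator): waits until the hosts in a
// ZooKeeper-style connect string can be resolved via DNS.
final class DnsWait {
    private DnsWait() {}

    // "zk1:2181,zk2:2181/chroot" -> [zk1, zk2]; assumes host:port entries.
    static List<String> hostsOf(String connectString) {
        String hosts = connectString;
        int slash = hosts.indexOf('/');
        if (slash >= 0) {
            hosts = hosts.substring(0, slash); // drop the optional chroot suffix
        }
        List<String> result = new ArrayList<>();
        for (String part : hosts.split(",")) {
            String entry = part.trim();
            if (entry.isEmpty()) {
                continue;
            }
            int colon = entry.lastIndexOf(':');
            result.add(colon >= 0 ? entry.substring(0, colon) : entry);
        }
        return result;
    }

    // Polls until every host resolves (true) or timeoutMs elapses (false).
    static boolean awaitResolvable(String connectString, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            boolean allResolved = true;
            for (String host : hostsOf(connectString)) {
                try {
                    InetAddress.getByName(host);
                } catch (UnknownHostException e) {
                    allResolved = false; // not resolvable yet, e.g. container not started
                    break;
                }
            }
            if (allResolved) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

Waiting for all hosts rather than just one is the conservative choice here. Note that the JVM caches failed lookups (the `networkaddress.cache.negative.ttl` security property, 10 seconds by default), so polling more often than that may keep returning the cached failure.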

So making a DNS lookup failure (UnknownHostException) a retryable error, perhaps optionally, would be really helpful here. This should apply not only during Curator startup, but also during failover, when Curator/ZookeeperClient switches from a failed ZooKeeper instance to another one from the ZooKeeper connect string.
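The requested behaviour boils down to classifying UnknownHostException as a transient failure. A minimal sketch of that idea, assuming a hypothetical stdlib-only wrapper with exponential backoff (`DnsAwareRetry`, `isRetryable` and `callWithRetry` are illustrative names; this is not Curator's own RetryPolicy/RetryLoop API):

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical sketch (not Curator's RetryPolicy/RetryLoop API): retries an
// operation with exponential backoff and treats DNS failures as transient.
final class DnsAwareRetry {
    private DnsAwareRetry() {}

    // True if an UnknownHostException appears anywhere in the cause chain.
    static boolean isRetryable(Throwable t) {
        Throwable c = t;
        for (int depth = 0; c != null && depth < 32; depth++) { // guard against cause cycles
            if (c instanceof UnknownHostException) {
                return true;
            }
            c = c.getCause();
        }
        return false;
    }

    // Runs op; on a retryable failure sleeps baseSleepMs * 2^attempt and tries
    // again, up to maxRetries retries. Other failures are rethrown immediately.
    static <T> T callWithRetry(Callable<T> op, int maxRetries, long baseSleepMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isRetryable(e) || attempt >= maxRetries) {
                    throw e;
                }
                Thread.sleep(baseSleepMs << Math.min(attempt, 20)); // cap the shift
            }
        }
    }
}
```

Walking the cause chain is deliberate, since the lookup failure may surface wrapped in another exception. Making this opt-in, as suggested above, is a reasonable design choice: treating DNS failures as transient helps in the container scenario, but a permanently misspelled host name would then be retried until the retry budget runs out instead of failing fast.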


> No retry on DNS lookup failure
> ------------------------------
>
>                 Key: CURATOR-229
>                 URL: https://issues.apache.org/jira/browse/CURATOR-229
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.7.0
>            Reporter: Michael Putters
>
> Our environment is set up so that host names (rather than IP addresses) are used when registering services.
> When a node is disconnected from the network, it attempts to reconnect and, in order to do this, tries to resolve a host name, which fails (since there is no network connectivity and a DNS server is used).
> It appears this type of exception is not retryable, and the node simply gives up and never reconnects, even when the network connectivity is back.
> Is this the expected behavior? Is there any way to configure Curator so that this type of exception is retryable? I had a look at {{CuratorFrameworkImpl.java}} around line 768 but there doesn't seem to be anything configurable.
> If this is not the expected behavior (or if it is but you don't mind making it configurable), I should be able to provide a patch via a pull request.


