Posted to dev@kafka.apache.org by "David Glasser (JIRA)" <ji...@apache.org> on 2018/05/01 20:54:00 UTC

[jira] [Created] (KAFKA-6843) Document issue with DNS TTL

David Glasser created KAFKA-6843:
------------------------------------

             Summary: Document issue with DNS TTL
                 Key: KAFKA-6843
                 URL: https://issues.apache.org/jira/browse/KAFKA-6843
             Project: Kafka
          Issue Type: Bug
            Reporter: David Glasser


We run Kafka and Zookeeper in Google Kubernetes Engine. Our brokers recently had serious problems when GKE replaced our cluster (cycling both Zookeeper and Kafka in parallel).  Kafka (1.0) brokers lost the ability to talk to Zookeeper, and eventually failed their controlled shutdown, leading to slow startup times for the new broker and outages for our system.

We eventually tracked this down to the fact that (at least in our environment) the default JVM DNS caching behavior is to cache results forever.  We rely on DNS to connect to Zookeeper, and the DNS resolution changes when the Zookeeper pods are replaced.

The fix is straightforward: set the property networkaddress.cache.ttl or sun.net.inetaddr.ttl to make the caching non-infinite (or use a "security manager"). See [https://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html] for details.
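
For illustration, here is a minimal sketch of setting the TTL programmatically. It assumes the code runs early in JVM startup, before the first hostname lookup, since the JVM reads the cache policy once. The class name DnsCacheTtl, the method applyTtl, and the 60-second value are hypothetical choices for this example, not anything Kafka ships.

```java
import java.security.Security;

public class DnsCacheTtl {
    // networkaddress.cache.ttl is a Java *security* property, so it is set
    // through java.security.Security (or the java.security file), not -D.
    // It should be set before the first InetAddress lookup in the JVM.
    static void applyTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // 60 seconds is an arbitrary example value; pick one that fits
        // how quickly DNS records change in your environment.
        applyTtl(60);
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

When changing code is not an option, sun.net.inetaddr.ttl is an ordinary system property, so it can be passed as a JVM flag instead, e.g. -Dsun.net.inetaddr.ttl=60 (again, 60 is only an example value).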

I think this gotcha should be documented, probably at [https://kafka.apache.org/11/documentation/#java]. I'm happy to submit a PR if people agree this is the right place.  (I suppose somehow fixing this in code would be nice too.)

By the way, if you search the Apache issue tracker for [networkaddress.cache.ttl|https://issues.apache.org/jira/browse/JAMES-774?jql=text%20~%20%22%5C%22networkaddress.cache.ttl%5C%22%22], you'll learn that this is a common issue faced by many Apache Java projects.


