Posted to dev@kafka.apache.org by "Parth Brahmbhatt (JIRA)" <ji...@apache.org> on 2015/07/11 01:47:06 UTC

[jira] [Commented] (KAFKA-2182) zkClient dies if there is any exception while reconnecting

    [ https://issues.apache.org/jira/browse/KAFKA-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623077#comment-14623077 ] 

Parth Brahmbhatt commented on KAFKA-2182:
-----------------------------------------

[~junrao] I think I took care of this as part of KAFKA-2169. We now just call System.exit when this exception is caught, at least on the broker side. Can we close this JIRA as Fixed? Or am I missing the intent of this JIRA?
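
For anyone following along, here is a rough sketch of the idea, not the actual KAFKA-2169 patch. It is self-contained and does not use the real ZkClient classes: SessionErrorListener, Connector, FatalListener and reconnect are hypothetical names made up for illustration, and only the callback name handleSessionEstablishmentError comes from the issue description. The exit call is passed in as a hook so the behaviour can be exercised without killing the JVM; a broker would presumably pass System::exit.

```java
import java.net.UnknownHostException;
import java.util.function.IntConsumer;

public class SessionErrorSketch {

    /** Hypothetical stand-in for a ZkClient state listener; only the callback discussed here. */
    public interface SessionErrorListener {
        void handleSessionEstablishmentError(Throwable error);
    }

    /** Hypothetical connect step that may fail, e.g. with UnknownHostException. */
    public interface Connector {
        void connect() throws Exception;
    }

    /** Treats a failed session re-establishment as fatal and requests a process exit. */
    public static final class FatalListener implements SessionErrorListener {
        private final IntConsumer exitHook;

        public FatalListener(IntConsumer exitHook) {
            this.exitHook = exitHook;
        }

        @Override
        public void handleSessionEstablishmentError(Throwable error) {
            System.err.println("Could not establish session with ZooKeeper: " + error);
            exitHook.accept(1);
        }
    }

    /** Reports a connect failure to the listener instead of swallowing it. Returns true on success. */
    public static boolean reconnect(Connector connector, SessionErrorListener listener) {
        try {
            connector.connect();
            return true;
        } catch (Exception e) {
            listener.handleSessionEstablishmentError(e);
            return false;
        }
    }

    public static void main(String[] args) {
        // A broker would pass System::exit; the demo records the code so it can print it.
        int[] requested = {0};
        boolean ok = reconnect(
            () -> { throw new UnknownHostException("zookeeper1.example.invalid"); },
            new FatalListener(code -> requested[0] = code));
        System.out.println("reconnected=" + ok + ", exit code requested=" + requested[0]);
    }
}
```

The point is only that the reconnect failure reaches a callback that can fail fast, rather than leaving the process running without a ZooKeeper session.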

> zkClient dies if there is any exception while reconnecting
> ----------------------------------------------------------
>
>                 Key: KAFKA-2182
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2182
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Igor Maravić
>            Assignee: Parth Brahmbhatt
>            Priority: Critical
>
> We, Spotify, have just been hit by a bug that's related to ZkClient. It made a whole Kafka cluster go down.
> Long story short, we restarted a TOR switch, and all of the brokers in the cluster lost connectivity to the ZooKeeper cluster, which was living in another rack. Because of that, all the connections to ZooKeeper expired.
> Everything would have been fine if we hadn't lost the hostname-to-IP-address mapping for some reason. Since we did lose that mapping, we got an UnknownHostException when the broker tried to reconnect. This exception got swallowed, and we never reconnected to ZooKeeper, which in turn made our cluster of brokers useless.
> If we had a "handleSessionEstablishmentError" function, the exception could be caught and we could simply kill the KafkaServer process and start it cleanly, since this kind of exception is fatal for the Kafka cluster.
> {code}
> 2015-05-05T12:49:01.709+00:00 127.0.0.1 apache-kafka[main-EventThread] INFO  zookeeper.ZooKeeper  - Initiating client connection, connectString=zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@7303d690
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 apache-kafka[main-EventThread] ERROR zookeeper.ClientCnxn  - Error while calling watcher
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 java.lang.RuntimeException: Exception while restarting zk client
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:462)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.process(ZkClient.java:368)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 Caused by: org.I0Itec.zkclient.exception.ZkException: Unable to connect to zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:66)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:939)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:458)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 ... 3 more
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 Caused by: java.net.UnknownHostException: zookeeper1.spotify.net: Name or service not known
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress.getAllByName(InetAddress.java:1162)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress.getAllByName(InetAddress.java:1098)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:64)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 ... 5 more
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 apache-kafka[ZkClient-EventThread-18-zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local] ERROR zkclient.ZkEventThread  - Error handling event ZkEvent[Children of /config/changes changed sent to kafka.server.TopicConfigManager$ConfigChangeListener$@17638f6]
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 java.lang.NullPointerException
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:439)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:436)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:436)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:445)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:566)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 apache-kafka[main-EventThread] INFO  zookeeper.ClientCnxn  - EventThread shut down
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 apache-kafka[ZkClient-EventThread-18-zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local] ERROR zkclient.ZkEventThread  - Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@18360394]
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 java.lang.NullPointerException
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:439)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:436)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:436)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:544)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {code}


