Posted to jira@kafka.apache.org by "Krzysztof Piecuch (Jira)" <ji...@apache.org> on 2021/03/25 11:05:00 UTC

[jira] [Resolved] (KAFKA-12513) Kafka zookeeper client can't connect when the first zookeeper server is offline

     [ https://issues.apache.org/jira/browse/KAFKA-12513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krzysztof Piecuch resolved KAFKA-12513.
---------------------------------------
    Resolution: Invalid

I've just read the docs; it looks like everything is fine on the Kafka and ZooKeeper side.

Sorry for the confusion.

> Kafka zookeeper client can't connect when the first zookeeper server is offline
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-12513
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12513
>             Project: Kafka
>          Issue Type: Bug
>          Components: zkclient
>    Affects Versions: 2.3.1, 2.4.1, 2.7.0
>         Environment: kafka_2.13-2.7.0, kernel 5.4.0-52-generic (Ubuntu), Scala 2.13.3-400
>            Reporter: Krzysztof Piecuch
>            Priority: Critical
>
> The Kafka ZooKeeper client library will not connect to any of the ZooKeeper servers in the "zookeeper string" when the first one is offline. This causes the cluster to crash hard, and to get the cluster back into a healthy state the first ZooKeeper node must be brought back online.
> The crash does not always happen immediately after zk0 goes offline, because Kafka might already have connections established to other ZooKeeper instances. When such a connection is dropped and Kafka needs to reconnect, everything crashes hard.
>  
> Demo:
> This works:
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c --describe --topic duma
> Topic: duma	PartitionCount: 6	ReplicationFactor: 3	Configs: compression.type=uncompressed,retention.bytes=322122547200
> 	Topic: duma	Partition: 0	Leader: 1	Replicas: 1,0,2	Isr: 1,0,2
> 	Topic: duma	Partition: 1	Leader: 2	Replicas: 2,1,0	Isr: 0,1,2
> 	Topic: duma	Partition: 2	Leader: 0	Replicas: 0,2,1	Isr: 0,1,2
> 	Topic: duma	Partition: 3	Leader: 1	Replicas: 1,2,0	Isr: 1,0,2
> 	Topic: duma	Partition: 4	Leader: 2	Replicas: 2,0,1	Isr: 1,0,2
> 	Topic: duma	Partition: 5	Leader: 0	Replicas: 0,1,2	Isr: 0,1,2
> {code}
> Now let's modify the ZooKeeper string and see how the ZooKeeper client reacts:
> Changing the last server in the ZooKeeper string works as expected: {{kafka-topics.sh}} connects to ZooKeeper but can't find the topic (because of the bogus ZooKeeper string):
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,1.1.1.1:2181/hex8c --describe --topic duma
> Error while executing topic command : Topic 'duma' does not exist as expected
> [2021-03-20 23:01:45,535] ERROR java.lang.IllegalArgumentException: Topic 'duma' does not exist as expected
> 	at kafka.admin.TopicCommand$.kafka$admin$TopicCommand$$ensureTopicExists(TopicCommand.scala:484)
> 	at kafka.admin.TopicCommand$ZookeeperTopicService.describeTopic(TopicCommand.scala:390)
> 	at kafka.admin.TopicCommand$.main(TopicCommand.scala:67)
> 	at kafka.admin.TopicCommand.main(TopicCommand.scala)
>  (kafka.admin.TopicCommand$) {code}
> However, if the first server in the ZooKeeper string is unavailable, the ZooKeeper client won't connect to any of the ZooKeeper servers:
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper 1.1.1.1:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c --describe --topic duma
> [2021-03-20 23:02:43,888] WARN Client session timed out, have not heard from server in 30012ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
> Exception in thread "main" kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
> 	at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
> 	at kafka.zookeeper.ZooKeeperClient$$Lambda$31.000000005D399170.apply$mcV$sp(Unknown Source)
> 	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> 	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> 	at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
> 	at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
> 	at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
> 	at kafka.admin.TopicCommand$ZookeeperTopicService$.apply(TopicCommand.scala:321)
> 	at kafka.admin.TopicCommand$.main(TopicCommand.scala:54)
> 	at kafka.admin.TopicCommand.main(TopicCommand.scala) {code}
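
The resolution only says the docs were read. One plausible reading, based on the documented form of the connection string (hostname1:port1,hostname2:port2,hostname3:port3/chroot/path, with the chroot given once after the last host), is that the ZooKeeper client treats everything from the first "/" onwards as the chroot path. Under that assumption, each string in the report contains exactly one host. The Java sketch below is a simplified, hypothetical model of that split (ConnectStringDemo, Parsed and parse are illustrative names, not Kafka or ZooKeeper APIs, and the real parser's path validation is omitted), applied to the strings from the report plus the documented form.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how a ZooKeeper connect string is split
// into a host list and a chroot path. Assumption: like ZooKeeper's own
// ConnectStringParser (as we understand it), everything from the FIRST '/'
// onwards is taken as the chroot. Path validation and port defaults are omitted.
public class ConnectStringDemo {

    // Hypothetical result type: the hosts the client would try, plus the chroot.
    public static final class Parsed {
        public final List<String> hosts;
        public final String chroot; // null when there is no chroot

        Parsed(List<String> hosts, String chroot) {
            this.hosts = hosts;
            this.chroot = chroot;
        }
    }

    public static Parsed parse(String connectString) {
        String hostPart = connectString;
        String chroot = null;
        int slash = connectString.indexOf('/');
        if (slash >= 0) {
            chroot = connectString.substring(slash);
            if (chroot.length() == 1) {
                chroot = null; // a bare "/" means "no chroot"
            }
            hostPart = connectString.substring(0, slash);
        }
        // Only the part before the first '/' is split into host:port entries.
        List<String> hosts = new ArrayList<>();
        for (String entry : hostPart.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                hosts.add(trimmed);
            }
        }
        return new Parsed(hosts, chroot);
    }

    public static void main(String[] args) {
        String[] examples = {
            // The three strings from the report (chroot repeated after every host).
            "zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c",
            "zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,1.1.1.1:2181/hex8c",
            "1.1.1.1:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c",
            // Documented form: chroot given once, after the last host.
            "zk0.gambit:2181,zk1.gambit:2181,zk2.gambit:2181/hex8c"
        };
        for (String s : examples) {
            Parsed p = parse(s);
            System.out.println("hosts=" + p.hosts + " chroot=" + p.chroot);
        }
    }
}
```

Under this model, the first two strings both resolve to the single host zk0.gambit:2181 but to different chroot paths, which would fit the "Topic 'duma' does not exist" error. The third resolves only to 1.1.1.1:2181, which would fit the connection timeout. Only the documented form yields all three hosts with the chroot /hex8c.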



--
This message was sent by Atlassian Jira
(v8.3.4#803005)