You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Manikumar (JIRA)" <ji...@apache.org> on 2018/07/21 09:13:00 UTC

[jira] [Commented] (KAFKA-7165) Error while creating ephemeral at /brokers/ids/BROKER_ID

    [ https://issues.apache.org/jira/browse/KAFKA-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551591#comment-16551591 ] 

Manikumar commented on KAFKA-7165:
----------------------------------

Looks like related to *KAFKA-6584*

> Error while creating ephemeral at /brokers/ids/BROKER_ID
> --------------------------------------------------------
>
>                 Key: KAFKA-7165
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7165
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, zkclient
>    Affects Versions: 1.1.0
>            Reporter: Jonathan Santilli
>            Priority: Major
>
> Kafka version: 1.1.0
> Zookeeper version: 3.4.12
> 4 Kafka Brokers
> 4 Zookeeper servers
>  
> In one of the 4 brokers of the cluster, we detect the following error:
> [2018-07-14 04:38:23,784] INFO Unable to read additional data from server sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,509] INFO Opening socket connection to server *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,510] INFO Socket connection established to *ZOOKEEPER_SERVER_1:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,513] INFO Unable to read additional data from server sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,287] INFO Opening socket connection to server *ZOOKEEPER_SERVER_2:PORT*. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,287] INFO Socket connection established to *ZOOKEEPER_SERVER_2:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,954] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* broker=1] Shrinking ISR from 1,3,4,2 to 1,4,2 (kafka.cluster.Partition)
>  [2018-07-14 04:38:26,444] WARN Unable to reconnect to ZooKeeper service, session 0x3000c2420cb458d has expired (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,444] INFO Unable to reconnect to ZooKeeper service, session 0x3000c2420cb458d has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,445] INFO EventThread shut down for session: 0x3000c2420cb458d (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,446] INFO [ZooKeeperClient] Session expired. (kafka.zookeeper.ZooKeeperClient)
>  [2018-07-14 04:38:26,459] INFO [ZooKeeperClient] Initializing a new session to *ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*. (kafka.zookeeper.ZooKeeperClient)
>  [2018-07-14 04:38:26,459] INFO Initiating client connection, connectString=*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT* sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@44821a96 (org.apache.zookeeper.ZooKeeper)
>  [2018-07-14 04:38:26,465] INFO Opening socket connection to server *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,477] INFO Socket connection established to *ZOOKEEPER_SERVER_1:PORT*, initiating session (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,484] INFO Session establishment complete on server *ZOOKEEPER_SERVER_1:PORT*, sessionid = 0x4005b59eb6a0000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,496] *INFO Creating /brokers/ids/1* (is it secure? false) (kafka.zk.KafkaZkClient)
>  [2018-07-14 04:38:26,500] INFO Processing notification(s) to /config/changes (kafka.common.ZkNodeChangeNotificationListener)
>  *[2018-07-14 04:38:26,547] ERROR Error while creating ephemeral at /brokers/ids/1, node already exists and owner '216186131422332301' does not match current session '288330817911521280' (kafka.zk.KafkaZkClient$CheckedEphemeral)*
>  [2018-07-14 04:38:26,547] *INFO Result of znode creation at /brokers/ids/1 is: NODEEXISTS* (kafka.zk.KafkaZkClient)
>  [2018-07-14 04:38:26,559] ERROR Uncaught exception in scheduled task 'isr-expiration' (kafka.utils.KafkaScheduler)
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /brokers/topics/*TOPIC_NAME*/partitions/*PARTITION-#*/state
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>  at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:465)
>  at kafka.zk.KafkaZkClient.conditionalUpdatePath(KafkaZkClient.scala:621)
>  at kafka.utils.ReplicationUtils$.updateLeaderAndIsr(ReplicationUtils.scala:33)
>  at kafka.cluster.Partition.kafka$cluster$Partition$$updateIsr(Partition.scala:669)
>  at kafka.cluster.Partition$$anonfun$4.apply$mcZ$sp(Partition.scala:513)
>  at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:504)
>  at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:504)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
>  at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:258)
>  at kafka.cluster.Partition.maybeShrinkIsr(Partition.scala:503)
>  at kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:1335)
>  at kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:1335)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at kafka.server.ReplicaManager.kafka$server$ReplicaManager$$maybeShrinkIsr(ReplicaManager.scala:1335)
>  at kafka.server.ReplicaManager$$anonfun$2.apply$mcV$sp(ReplicaManager.scala:322)
>  at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> [2018-07-14 04:38:45,938] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* broker=1] Shrinking ISR from 1,3,4,2 to 1 (kafka.cluster.Partition)
>  [2018-07-14 04:38:45,952] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* broker=1] Cached zkVersion [0] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
>  [2018-07-14 04:38:45,952] INFO [Partition __consumer_offsets-# broker=1|#* broker=1] Shrinking ISR from 1,2,3,4 to 1 (kafka.cluster.Partition)
>  [2018-07-14 04:38:45,992] INFO [Partition __consumer_offsets-# broker=1|#* broker=1] Cached zkVersion [139] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> ...
> ...
>  
> The previous exception continues showing constantly without stopping. Since the cluster was still up, I have left the error showing up for about 6 hours to check if the broker recovers itself.
> The only solution, to put back the broker into the cluster, was restarting.
> After the restart, the replicas sync each other and the broker was able to serve again.
> I imagine this is not a normal behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)