Posted to users@kafka.apache.org by "Yu, Libo " <li...@citi.com> on 2013/08/23 19:28:26 UTC

KeeperErrorCode = BadVersion

Hi team,

During normal operation, all of a sudden, we found many exceptions like the one below in the log.

It seems one thread's ZooKeeper data is being overwritten unexpectedly by some other thread.
Any insight would be appreciated.

[2013-08-23 13:17:00,622] INFO Partition [our.own.topic.one.default,0] on broker 1: Cached zkVersion [4] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2013-08-23 13:17:00,622] INFO Partition [our.own.topic.two.default,2] on broker 1: Shrinking ISR for partition [our.own.topic.two,2] from 1,2,3 to 1 (kafka.cluster.Partition)
[2013-08-23 13:17:00,623] ERROR Conditional update of path /brokers/topics/our.own.topic.two/partitions/2/state with data { "controller_epoch":81, "isr":[ 1 ], "leader":1, "leader_epoch":0, "version":1 } and expected version 10 failed (kafka.utils.ZkUtils$)
org.I0Itec.zkclient.exception.ZkBadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/our.own.topic.two/partitions/2/state
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:51)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
        at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:809)
        at kafka.utils.ZkUtils$.conditionalUpdatePersistentPath(ZkUtils.scala:330)
        at kafka.cluster.Partition.updateIsr(Partition.scala:347)
        at kafka.cluster.Partition.maybeShrinkIsr(Partition.scala:291)
        at kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:285)
        at kafka.server.ReplicaManager$$anonfun$kafka$server$ReplicaManager$$maybeShrinkIsr$2.apply(ReplicaManager.scala:285)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:61)
        at kafka.server.ReplicaManager.kafka$server$ReplicaManager$$maybeShrinkIsr(ReplicaManager.scala:285)
        at kafka.server.ReplicaManager$$anonfun$startup$1.apply$mcV$sp(ReplicaManager.scala:108)
        at kafka.utils.Utils$$anon$2.run(Utils.scala:67)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/our.own.topic.two/partitions/2/state
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1044)
        at org.I0Itec.zkclient.ZkConnection.writeData(ZkConnection.java:111)
        at org.I0Itec.zkclient.ZkClient$10.call(ZkClient.java:813)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        ... 18 more

Regards,

Libo
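
The ERROR above is a conditional write being rejected: the broker passes the zkVersion it last cached for the partition-state znode, and ZooKeeper refuses the setData because the znode's current dataVersion no longer matches that expected version. As a rough diagnostic, one could compare the znode's current version with the cached one using the plain ZooKeeper client; the sketch below is illustrative only (the connect string and object name are invented, and this is not Kafka code).

import org.apache.zookeeper.ZooKeeper

// Hypothetical one-off diagnostic: read the current dataVersion of the
// partition-state znode and compare it with the version the broker logged.
object CheckZnodeVersion {
  def main(args: Array[String]): Unit = {
    val path = "/brokers/topics/our.own.topic.two/partitions/2/state"
    val cachedVersion = 10                                // the "expected version" from the ERROR line
    val zk = new ZooKeeper("localhost:2181", 30000, null) // connect string is an assumption

    val stat = zk.exists(path, false)                     // null if the znode is gone
    if (stat == null)
      println(s"$path does not exist")
    else if (stat.getVersion != cachedVersion)
      println(s"stale cache: znode is at dataVersion ${stat.getVersion}, broker expected $cachedVersion")
    else
      println(s"versions match: ${stat.getVersion}")

    zk.close()
  }
}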


Re: KeeperErrorCode = BadVersion

Posted by Neha Narkhede <ne...@gmail.com>.
I see. Ideally, this error state is recoverable, since we re-read the new
value from ZooKeeper before retrying the conditional write. Can you
describe how to reproduce this issue? Did you delete any topics manually?
That might have somehow led to this.

Thanks,
Neha
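
A minimal sketch of the re-read-and-retry pattern described above, written against the plain ZooKeeper client rather than Kafka's actual ZkUtils.conditionalUpdatePersistentPath; the helper name and the buildData callback are invented for illustration.

import org.apache.zookeeper.{KeeperException, ZooKeeper}
import org.apache.zookeeper.data.Stat

object ConditionalUpdate {
  // Hypothetical helper: read the znode, let the caller build new data from
  // the fresh value and version, then attempt a conditional write. On
  // BadVersion (another writer got there first), re-read and try again,
  // a bounded number of times.
  def writeWithRetry(zk: ZooKeeper, path: String,
                     buildData: (Array[Byte], Int) => Array[Byte],
                     maxAttempts: Int = 3): Boolean = {
    var attempt = 0
    while (attempt < maxAttempts) {
      val stat = new Stat()
      val current = zk.getData(path, false, stat)          // fresh value + current dataVersion
      try {
        zk.setData(path, buildData(current, stat.getVersion), stat.getVersion)
        return true                                         // conditional write succeeded
      } catch {
        case _: KeeperException.BadVersionException =>
          attempt += 1                                       // someone wrote in between; loop and re-read
      }
    }
    false                                                    // gave up after maxAttempts
  }
}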


On Fri, Aug 23, 2013 at 10:51 AM, Yu, Libo <li...@citi.com> wrote:

> Hi Neha,
>
> This is not a logging issue. That broker's log is flooded with this exception.
> The exception first appeared early this morning and has been flooding the log since.
> The broker is not in any topic's ISR, so it will not be a leader.
>
> Regards,
>
> Libo
>
>
> -----Original Message-----
> From: Neha Narkhede [mailto:neha.narkhede@gmail.com]
> Sent: Friday, August 23, 2013 1:40 PM
> To: users@kafka.apache.org
> Subject: Re: KeeperErrorCode = BadVersion
>
> I think we can certainly improve the logging, but this is normal when the
> leader tries to shrink/expand the ISR while the controller is in the middle
> of shrinking the ISR or electing a new leader for the same partition.
>
> Could you please file a JIRA to improve the quality of logging in this
> case?
>
> Thanks,
> Neha
>
>
> On Fri, Aug 23, 2013 at 10:28 AM, Yu, Libo <li...@citi.com> wrote:
>
> > [quoted original message trimmed; see the first message in this thread]
>

RE: KeeperErrorCode = BadVersion

Posted by "Yu, Libo " <li...@citi.com>.
Hi Neha,

This is not a logging issue. That broker's log is flooded with this exception.
The exception first appeared early this morning and has been flooding the log since.
The broker is not in any topic's ISR, so it will not be a leader.

Regards,

Libo


-----Original Message-----
From: Neha Narkhede [mailto:neha.narkhede@gmail.com] 
Sent: Friday, August 23, 2013 1:40 PM
To: users@kafka.apache.org
Subject: Re: KeeperErrorCode = BadVersion

I think we can certainly improve the logging, but this is normal when the leader tries to shrink/expand the ISR while the controller is in the middle of shrinking the ISR or electing a new leader for the same partition.

Could you please file a JIRA to improve the quality of logging in this case?

Thanks,
Neha


On Fri, Aug 23, 2013 at 10:28 AM, Yu, Libo <li...@citi.com> wrote:

> [quoted original message trimmed; see the first message in this thread]

Re: KeeperErrorCode = BadVersion

Posted by Neha Narkhede <ne...@gmail.com>.
I think we can certainly improve the logging, but this is normal when the
leader tries to shrink/expand the ISR while the controller is in the middle
of shrinking the ISR or electing a new leader for the same partition.

Could you please file a JIRA to improve the quality of logging in this case?

Thanks,
Neha
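
To make that race concrete: two writers (think partition leader and controller) both hold the same version of a znode; whichever conditional setData lands first bumps the version, and the other write fails with exactly this BadVersion error. The sketch below is a standalone illustration against a scratch znode, not Kafka code; the path and connect string are invented.

import org.apache.zookeeper.{CreateMode, KeeperException, ZooDefs, ZooKeeper}

// Hypothetical illustration of the race: both clients read version N, the
// first setData moves the znode to N+1, and the second writer's expected
// version is now stale.
object BadVersionRace {
  def main(args: Array[String]): Unit = {
    val path = "/badversion-demo"                         // scratch znode, not a Kafka path
    val a = new ZooKeeper("localhost:2181", 30000, null)  // connect string is an assumption
    val b = new ZooKeeper("localhost:2181", 30000, null)

    if (a.exists(path, false) == null)
      a.create(path, Array[Byte](), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)

    val version = a.exists(path, false).getVersion        // both writers start from the same version
    a.setData(path, "written-by-a".getBytes, version)     // wins the race; version is bumped
    try {
      b.setData(path, "written-by-b".getBytes, version)   // expected version is now stale
    } catch {
      case _: KeeperException.BadVersionException =>
        println("second writer lost the race: KeeperErrorCode = BadVersion")
    }
    a.close(); b.close()
  }
}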


On Fri, Aug 23, 2013 at 10:28 AM, Yu, Libo <li...@citi.com> wrote:

> [quoted original message trimmed; see the first message in this thread]