Posted to users@kafka.apache.org by noah <ia...@gmail.com> on 2015/10/14 22:47:11 UTC

Strange ZK Error precedes frequent rebalances

A number of our developers are seeing errors like the one below in their
console when running a consumer on their laptop. The error is always
followed by logging indicating that the local consumer is rebalancing, and
in the meantime we are not making much progress.

I'm reading this as the consumer trying to read a ZK node for another
consumer in the same group (running on a different machine), but the node
is no longer there. I can't tell if that is triggering a rebalance, or if
it's just coincidental.

In our dev environment, we have a lot (hundreds) of consumers coming and
going from the same consumer group, but they are mostly subscribed to
different topics. Is this setup (sharing a consumer group across topics)
potentially causing more rebalances than we would otherwise need? Or is
something else entirely going on?

LOG:

INFO  [2015-10-14 20:32:49,138] kafka.consumer.ZookeeperConsumerConnector: [real-time-updates_Noahs-MacBook-Pro.local-1444853969114-7b52ecb5], exception during rebalance
! org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/real-time-updates/ids/real-time-updates_infra-buildagent-06-1444854764478-4dd4d6af
! at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761) ~[zkclient-0.3.jar:0.3]
! at kafka.utils.ZkUtils$.readData(ZkUtils.scala:443) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:61) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.utils.ZkUtils$$anonfun$getConsumersPerTopic$1.apply(ZkUtils.scala:665) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.utils.ZkUtils$$anonfun$getConsumersPerTopic$1.apply(ZkUtils.scala:664) ~[kafka_2.10-0.8.2.1.jar:na]
! at scala.collection.Iterator$class.foreach(Iterator.scala:727) ~[scala-library-2.10.4.jar:na]
! at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) ~[scala-library-2.10.4.jar:na]
! at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) ~[scala-library-2.10.4.jar:na]
! at scala.collection.AbstractIterable.foreach(Iterable.scala:54) ~[scala-library-2.10.4.jar:na]
! at kafka.utils.ZkUtils$.getConsumersPerTopic(ZkUtils.scala:664) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.AssignmentContext.<init>(PartitionAssignor.scala:52) ~[kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:659) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:608) ~[kafka_2.10-0.8.2.1.jar:na]
! at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) [scala-library-2.10.4.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:602) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:598) [kafka_2.10-0.8.2.1.jar:na]
! at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:551) [kafka_2.10-0.8.2.1.jar:na]
Caused by: ! org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/real-time-updates/ids/real-time-updates_infra-buildagent-06-1444854764478-4dd4d6af
! at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
! at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
! at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
! at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
! at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766) ~[zkclient-0.3.jar:0.3]
! at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) ~[zkclient-0.3.jar:0.3]
!... 21 common frames omitted
INFO  [2015-10-14 20:32:49,139] kafka.consumer.ZookeeperConsumerConnector: [real-time-updates_Noahs-MacBook-Pro.local-1444853969114-7b52ecb5], end rebalancing consumer real-time-updates_Noahs-MacBook-Pro.local-1444853969114-7b52ecb5 try #0

Re: Strange ZK Error precedes frequent rebalances

Posted by Gwen Shapira <gw...@confluent.io>.
Yes. The rebalance is triggered by membership changes among the consumers
in the group and does not take topics into account.
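
One possible way to act on this, sketched under assumptions that are not
from the thread: give each topic its own group.id, so that consumers of
unrelated topics no longer share group membership. The helper name
groupIdFor, its naming scheme, and the topic name "orders" are hypothetical.
Offsets are tracked per group, so a consumer moved to a new group.id has no
committed offsets there and starts wherever auto.offset.reset points.

```java
import java.util.Properties;

// Sketch only: groupIdFor and its naming scheme are hypothetical, not part of Kafka.
class GroupPerTopicSketch {

    // Derive a per-topic group id so membership changes stay scoped to one topic.
    static String groupIdFor(String baseGroup, String topic) {
        return baseGroup + "." + topic;
    }

    // Minimal properties for the 0.8.x ZooKeeper-based consumer.
    static Properties consumerProps(String zkConnect, String baseGroup, String topic) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zkConnect);
        props.put("group.id", groupIdFor(baseGroup, topic));
        return props;
    }

    public static void main(String[] args) {
        // "orders" is a made-up topic name used only for illustration.
        Properties props = consumerProps("localhost:2181", "real-time-updates", "orders");
        System.out.println(props.getProperty("group.id"));
    }
}
```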

Re: Strange ZK Error precedes frequent rebalances

Posted by noah <ia...@gmail.com>.
Thanks Gwen.

So am I right in deducing that any consumer in the same group dropping will
cause a rebalance, regardless of which topics they are subscribed to?

Re: Strange ZK Error precedes frequent rebalances

Posted by Gwen Shapira <gw...@confluent.io>.
This is not strange. It means that one of the consumers lost connectivity
to Zookeeper and its session timed out. That caused its ephemeral ZK nodes
(like
/consumers/real-time-updates/ids/real-time-updates_infra-buildagent-06-1444854764478-4dd4d6af)
to be removed, which ultimately caused the rebalance.

What you need is to make sure your consumers don't lose connectivity to
Zookeeper or that sessions don't time out. You do this by:
1. Tuning garbage collection on the consumer apps (G1 is recommended) to
avoid long GC pauses, a leading cause of session timeouts
2. Increasing the Zookeeper session timeout on the consumer

Gwen
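
As a rough illustration of point 2, here is a minimal sketch of consumer
properties for the 0.8.x ZooKeeper-based consumer. The property names
zookeeper.session.timeout.ms and zookeeper.connection.timeout.ms come from
the 0.8 consumer configuration; the 30000 ms values and the class and method
names are assumptions chosen for illustration, not recommendations from the
thread. The ZooKeeper server negotiates the session timeout and caps it at
its maxSessionTimeout (20 x tickTime unless configured otherwise), so a
larger client value may not take effect.

```java
import java.util.Properties;

// Sketch only: the timeout values are illustrative assumptions.
class ConsumerTimeoutSketch {

    static Properties consumerProps(String zkConnect, String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zkConnect);
        props.put("group.id", groupId);
        // A longer session timeout lets the consumer survive longer pauses
        // before ZooKeeper expires its session and removes its ephemeral node.
        props.put("zookeeper.session.timeout.ms", "30000");
        // How long the client waits while establishing the ZooKeeper connection.
        props.put("zookeeper.connection.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:2181", "real-time-updates");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

For point 1, G1 is enabled on the consumer JVM with the -XX:+UseG1GC flag.
The trade-off of a longer session timeout is that a consumer that has really
died is detected later, so its partitions sit unconsumed for longer.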
