Posted to users@kafka.apache.org by Peter Thygesen <pt...@gmail.com> on 2012/03/20 10:42:11 UTC

Shutdown/Ctrl-C and ConsumerRebalanceFailedException

When I shut down my consumer with Ctrl-C and try to restart it quickly
afterwards, I usually get a ConsumerRebalanceFailedException (see below). The
application then seems to hang, or at least I'm not sure whether it is still
running. If this exception is thrown, will the consumer then intelligently
wait for the rebalancing to complete and then resume consumption?

I found a page, https://cwiki.apache.org/KAFKA/consumer-co-ordinator.html, that
describes something called the Consumer Co-ordinator. According to it, the
consumer group remains in this state until the next rebalancing attempt is
triggered. But when is it triggered?

Could a shutdown hook that calls consumer.commitOffsets help?
Does consumer.shutdown implicitly commit offsets?


Exception in thread "main" kafka.common.ConsumerRebalanceFailedException:
contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f can't
rebalance after 4 retries
        at
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
        at
kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
        at
kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
        at
kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
        at
com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
        at
com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)


Brgds,
Peter Thygesen

BTW: Great work, very interesting project.

Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Chris Burroughs <ch...@gmail.com>.
Makes sense.  Should we have that expanded in documentation somewhere?
ConsumerConnector.scala just has "Shut down the connector" for shutdown().

On 03/25/2012 11:47 PM, Jun Rao wrote:
> Chris,
> 
> When ConsumerConnector.close is called, we close ZK connection, which
> should cause all ephemeral nodes to be deleted. Of course, the consumer app
> itself needs to add a shutdown hook that calls ConsumerConnector.close when
> the app is cleanly killed.
> 
> Thanks,
> 
> Jun
> 
> On Sun, Mar 25, 2012 at 7:10 PM, Chris Burroughs
> <ch...@gmail.com>wrote:
> 
>> On 03/25/2012 10:08 PM, Neha Narkhede wrote:
>>> If the consumer is shutdown, cleanly or not, zookeeper deletes the
>>> ephemeral nodes from its database.
>>> (unless you are using zk 3.3.3)
>>
>> Right but only after the timeout. I'm suggesting (and sorry if I'm
>> confusing by thinking we already did this) that we explicitly delete on
>> on clean shutdown so we don't force a n second service disruption on
>> every restart.
>>
> 


Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Chris Burroughs <ch...@gmail.com>.
On 03/25/2012 10:08 PM, Neha Narkhede wrote:
> If the consumer is shutdown, cleanly or not, zookeeper deletes the
> ephemeral nodes from its database.
> (unless you are using zk 3.3.3)

Right, but only after the timeout. I'm suggesting (and sorry if I'm
confusing things by thinking we already did this) that we explicitly delete
on clean shutdown so we don't force an n-second service disruption on
every restart.
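
Purely to illustrate that idea (the 0.7 client does not expose such a hook; the
I0Itec ZkClient calls below are an assumption, the /consumers/<group>/ids/<consumerId>
path follows the layout visible in the logs in this thread, and the host and group
values are made up), an explicit cleanup on clean shutdown could look roughly like:

    import org.I0Itec.zkclient.ZkClient;

    public class ExplicitEphemeralCleanup {
        public static void main(String[] args) {
            // Hypothetical values, for illustration only.
            String group = "my-consumer-group";
            String consumerId = "my-consumer-group_myhost-1332175323213-e6a3010f";

            ZkClient zkClient = new ZkClient("localhost:2181", 6000, 6000);
            try {
                // Deleting the ephemeral registration node up front lets the other
                // consumers in the group rebalance immediately, instead of waiting
                // for the dead session's timeout to expire.
                zkClient.delete("/consumers/" + group + "/ids/" + consumerId);
            } finally {
                zkClient.close();
            }
        }
    }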

Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Neha Narkhede <ne...@gmail.com>.
Peter,

If you are using zk namespaces on zk 3.3.3, you will most certainly
hit the ephemeral node issue.

Chris,

If the consumer is shutdown, cleanly or not, zookeeper deletes the
ephemeral nodes from its database.
(unless you are using zk 3.3.3)

Thanks,
Neha
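
A "zk namespace" here means a chrooted ZooKeeper connect string, so that everything
Kafka writes lives under a sub-path rather than the ZooKeeper root. A minimal sketch
of a consumer config using one (property names as used by the 0.7 consumer; hosts and
group are placeholders):

    import java.util.Properties;
    import kafka.consumer.ConsumerConfig;

    public class NamespacedConsumerConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The trailing /kafka is the namespace (chroot): Kafka's znodes, including
            // the /consumers/<group>/ids ephemeral nodes, live under /kafka.
            props.put("zk.connect", "zk1:2181,zk2:2181,zk3:2181/kafka");
            props.put("groupid", "my-consumer-group");
            ConsumerConfig config = new ConsumerConfig(props);
        }
    }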

On Sun, Mar 25, 2012 at 7:01 PM, Chris Burroughs
<ch...@gmail.com> wrote:
> If the consumer is cleanly shutdown (Control-C is kill, not kill -9,
> right?), shoudn't we remove the ephemeral ZNodes in our shutdown
> handler?  I thought this was already the case, at least for the broker.
>
> On 03/20/2012 12:21 PM, Jun Rao wrote:
>> Peter,
>>
>> When you kill your consumer, it takes sometime (by default 6 secs) for ZK
>> server to release the ephemeral nodes hold by the consumer. If you restart
>> your consumer quickly, the consumer may not be able to acquire the
>> necessary zk nodes and therefore can fail to rebalance. One way is to
>> implement a shutdown hook in your consumer application and make sure
>> consumer connector is shut down when the consumer is killed.
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt...@gmail.com>wrote:
>>
>>> When I shutdown my consumer with crtl-c and tries to restart it quickly
>>> afterwards, I usually get ConsumerRebalanceFailedException (see below). The
>>> application then seems to hang.. or at least I'm sure if it is running any
>>> more.. If this exception is thrown, will the consumer the intelligently
>>> wait for the rebalancing to complete? and then resume consumption?
>>>
>>> I found a page
>>> https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
>>> describes something about Consumer Co-ordinator.. according to this
>>> the consumer
>>> group remains in this state until the next rebalancing attempt is
>>> triggered. But when is it triggered?
>>>
>>> Could a shutdown hock with a consumer.commitOffsets help?
>>> Does the consumer.shutdown implicit commitOffsets?
>>>
>>>
>>> Exception in thread "main" kafka.common.ConsumerRebalanceFailedException:
>>> contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f can't
>>> rebalance after 4 retries
>>>        at
>>>
>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
>>>        at
>>>
>>> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
>>>        at
>>>
>>> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
>>>        at
>>>
>>> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
>>>        at
>>>
>>> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
>>>        at
>>>
>>> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
>>>
>>>
>>> Brgds,
>>> Peter Thygesen
>>>
>>> BTW: Great work, very interesting project.
>>>
>>
>

Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Chris Burroughs <ch...@gmail.com>.
If the consumer is cleanly shut down (Ctrl-C is kill, not kill -9,
right?), shouldn't we remove the ephemeral ZNodes in our shutdown
handler?  I thought this was already the case, at least for the broker.

On 03/20/2012 12:21 PM, Jun Rao wrote:
> Peter,
> 
> When you kill your consumer, it takes sometime (by default 6 secs) for ZK
> server to release the ephemeral nodes hold by the consumer. If you restart
> your consumer quickly, the consumer may not be able to acquire the
> necessary zk nodes and therefore can fail to rebalance. One way is to
> implement a shutdown hook in your consumer application and make sure
> consumer connector is shut down when the consumer is killed.
> 
> Thanks,
> 
> Jun
> 
> On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt...@gmail.com>wrote:
> 
>> When I shutdown my consumer with crtl-c and tries to restart it quickly
>> afterwards, I usually get ConsumerRebalanceFailedException (see below). The
>> application then seems to hang.. or at least I'm sure if it is running any
>> more.. If this exception is thrown, will the consumer the intelligently
>> wait for the rebalancing to complete? and then resume consumption?
>>
>> I found a page
>> https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
>> describes something about Consumer Co-ordinator.. according to this
>> the consumer
>> group remains in this state until the next rebalancing attempt is
>> triggered. But when is it triggered?
>>
>> Could a shutdown hock with a consumer.commitOffsets help?
>> Does the consumer.shutdown implicit commitOffsets?
>>
>>
>> Exception in thread "main" kafka.common.ConsumerRebalanceFailedException:
>> contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f can't
>> rebalance after 4 retries
>>        at
>>
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
>>        at
>>
>> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
>>        at
>>
>> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
>>        at
>>
>> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
>>        at
>>
>> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
>>        at
>>
>> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
>>
>>
>> Brgds,
>> Peter Thygesen
>>
>> BTW: Great work, very interesting project.
>>
> 


Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Jun Rao <ju...@gmail.com>.
Peter,

When you kill your consumer, it takes some time (by default 6 secs) for the ZK
server to release the ephemeral nodes held by the consumer. If you restart
your consumer quickly, the consumer may not be able to acquire the
necessary ZK nodes and can therefore fail to rebalance. One way to avoid this is to
implement a shutdown hook in your consumer application and make sure the
consumer connector is shut down when the consumer is killed.

Thanks,

Jun
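
A minimal sketch of the kind of shutdown hook Jun describes, assuming the 0.7
high-level consumer's Java API (kafka.javaapi.consumer.ConsumerConnector); the
connection details, group, and topic below are placeholders:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class GracefulConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zk.connect", "localhost:2181");      // placeholder
            props.put("groupid", "my-consumer-group");      // placeholder

            final ConsumerConnector consumer =
                    kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ctrl-C delivers SIGINT (a normal kill, not kill -9), so the JVM runs
            // shutdown hooks. Shutting the connector down here releases its ZooKeeper
            // ephemeral nodes right away instead of leaving them to time out.
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    consumer.commitOffsets(); // optional: checkpoint progress before exit
                    consumer.shutdown();
                }
            });

            Map<String, Integer> topicCount = new HashMap<String, Integer>();
            topicCount.put("my-topic", 1);                  // placeholder topic, one stream
            consumer.createMessageStreams(topicCount);
            // ... consume the returned streams ...
        }
    }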

On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt...@gmail.com>wrote:

> When I shutdown my consumer with crtl-c and tries to restart it quickly
> afterwards, I usually get ConsumerRebalanceFailedException (see below). The
> application then seems to hang.. or at least I'm sure if it is running any
> more.. If this exception is thrown, will the consumer the intelligently
> wait for the rebalancing to complete? and then resume consumption?
>
> I found a page
> https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
> describes something about Consumer Co-ordinator.. according to this
> the consumer
> group remains in this state until the next rebalancing attempt is
> triggered. But when is it triggered?
>
> Could a shutdown hock with a consumer.commitOffsets help?
> Does the consumer.shutdown implicit commitOffsets?
>
>
> Exception in thread "main" kafka.common.ConsumerRebalanceFailedException:
> contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f can't
> rebalance after 4 retries
>        at
>
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
>        at
>
> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
>        at
>
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
>        at
>
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
>        at
>
> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
>        at
>
> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
>
>
> Brgds,
> Peter Thygesen
>
> BTW: Great work, very interesting project.
>

Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Neha Narkhede <ne...@gmail.com>.
Sam,

That seems like a bug. If you can reproduce it with Kafka 0.7.1, would you
mind filing a bug and attaching a test case?

Thanks,
Neha

On Wed, Jul 18, 2012 at 12:04 PM, Sam William <sa...@stumbleupon.com> wrote:

> Neha,
>  Here is the full stack trace
>
> org.I0Itec.zkclient.exception.ZkNoNodeException:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for
> /consumers/sampd-rate2/ids/sampd-rate2_sv4r25s49-1342637842023-a4b442c4
>    at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>         at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:750)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:744)
>         at kafka.utils.ZkUtils$.readData(ZkUtils.scala:163)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$getTopicCount(ZookeeperConsumerConnector.scala:421)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:460)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:437)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:433)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.handleChildChange(ZookeeperConsumerConnector.scala:375)
>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /consumers/sampd-rate2/ids/sampd-rate2_sv4r25s49-1342637842023-a4b442c4
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
>         at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
>         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
>         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
>         at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>
>
>
> On Jul 17, 2012, at 3:03 PM, Neha Narkhede wrote:
>
> > Sam,
> >
> > Please could you send around the entire stack trace for that exception ?
> It
> > means that the consumer couldn't complete a rebalancing operation and it
> is
> > possible that the consumer is not pulling all the data for the requested
> > topics.
> >
> > Thanks,
> > Neha
> >
> > On Tue, Jul 17, 2012 at 1:29 PM, Sam William <sa...@stumbleupon.com>
> wrote:
> >
> >>
> >> On Mar 20, 2012, at 8:49 AM, Neha Narkhede wrote:
> >>
> >>> Peter,
> >>>
> >>>>> If this exception is thrown, will the consumer the intelligently wait
> >> for the rebalancing to complete? and then resume consumption?
> >>>
> >>> If this exception is thrown, it means that the consumer has failed the
> >>> current rebalancing attempt and will try only when one of the
> >>> following happens -
> >>>
> >>> 1. New partitions are added to the topic it is consuming
> >>> 2. Existing partitions become unavailable
> >>> 3. New consumer instances are brought up for the consumer group it
> >> belongs to
> >>> 4. Existing consumer instances die for the consumer group it belongs to
> >>>
> >>> Until that, the consumer is not fully functional. So, this particular
> >>> exception should be monitored and the consumer instance should be
> >>> restarted.
> >>>
> >>> Having said that, it is pretty rare for the consumer to run out of
> >>> rebalancing attempts. One of the common causes is using zookeeper
> >>> 3.3.3 which causes older ephemeral nodes to be retained.
> >>> Which version of Kafka are you using ?
> >>> Would you mind attaching the entire log for the consumer. It will help
> >>> us debug the cause of this exception and see if it is an actual bug.
> >>>
> >>> Thanks,
> >>> Neha
> >>>
> >>>
> >>
> >>
> >> Neha,
> >>   I see this exception
> >>
> >> 2012-07-17 12:58:12,238 ERROR
> >>
> [ZkClient-EventThread-17-11.zookeeper.,12.zookeeper.,13.zookeeper.,14.zookeeper.,16.zookeeper./kafka]
> >> zkclient.ZkEventThread Error handling event ZkEvent[Children of
> >> /consumers/live-event-sense-new8/ids changed sent to
> >> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener@6d9dd520
> ]
> >> java.lang.RuntimeException:
> >> live-event-sense-new8_sv4r25s49-1342554132312-c04abfef can't rebalance
> >> after 4 retires
> >>
> >>
> >> occurring very often.  I use ZK 3.4.3.    Im not handling/monitoring
> this
> >> exception . The consumer seems to continue just fine after this
> happens.  I
> >> do not see any on the 4 conditions you mentioned happening. Am I missing
> >> something ?
> >>
> >> Thanks,
> >> Sam
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>> On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt.activemq@gmail.com
> >
> >> wrote:
> >>>> When I shutdown my consumer with crtl-c and tries to restart it
> quickly
> >>>> afterwards, I usually get ConsumerRebalanceFailedException (see
> below).
> >> The
> >>>> application then seems to hang.. or at least I'm sure if it is running
> >> any
> >>>> more.. If this exception is thrown, will the consumer the
> intelligently
> >>>> wait for the rebalancing to complete? and then resume consumption?
> >>>>
> >>>> I found a page
> >> https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
> >>>> describes something about Consumer Co-ordinator.. according to this
> >>>> the consumer
> >>>> group remains in this state until the next rebalancing attempt is
> >>>> triggered. But when is it triggered?
> >>>>
> >>>> Could a shutdown hock with a consumer.commitOffsets help?
> >>>> Does the consumer.shutdown implicit commitOffsets?
> >>>>
> >>>>
> >>>> Exception in thread "main"
> >> kafka.common.ConsumerRebalanceFailedException:
> >>>> contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f
> >> can't
> >>>> rebalance after 4 retries
> >>>>       at
> >>>>
> >>
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
> >>>>       at
> >>>>
> >>
> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
> >>>>       at
> >>>>
> >>
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
> >>>>       at
> >>>>
> >>
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
> >>>>       at
> >>>>
> >>
> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
> >>>>       at
> >>>>
> >>
> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
> >>>>
> >>>>
> >>>> Brgds,
> >>>> Peter Thygesen
> >>>>
> >>>> BTW: Great work, very interesting project.
> >>
> >> Sam William
> >> sampd@stumbleupon.com
> >>
> >>
> >>
> >>
>
> Sam William
> sampd@stumbleupon.com
>
>
>
>

Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Sam William <sa...@stumbleupon.com>.
Neha,
 Here is the full stack trace 

org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/sampd-rate2/ids/sampd-rate2_sv4r25s49-1342637842023-a4b442c4
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:750)
        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:744)        
        at kafka.utils.ZkUtils$.readData(ZkUtils.scala:163)
        at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$getTopicCount(ZookeeperConsumerConnector.scala:421)        
        at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:460)
        at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:437)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
        at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:433)
        at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.handleChildChange(ZookeeperConsumerConnector.scala:375)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/sampd-rate2/ids/sampd-rate2_sv4r25s49-1342637842023-a4b442c4 
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
        at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
        at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
        at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)



On Jul 17, 2012, at 3:03 PM, Neha Narkhede wrote:

> Sam,
> 
> Please could you send around the entire stack trace for that exception ? It
> means that the consumer couldn't complete a rebalancing operation and it is
> possible that the consumer is not pulling all the data for the requested
> topics.
> 
> Thanks,
> Neha
> 
> On Tue, Jul 17, 2012 at 1:29 PM, Sam William <sa...@stumbleupon.com> wrote:
> 
>> 
>> On Mar 20, 2012, at 8:49 AM, Neha Narkhede wrote:
>> 
>>> Peter,
>>> 
>>>>> If this exception is thrown, will the consumer the intelligently wait
>> for the rebalancing to complete? and then resume consumption?
>>> 
>>> If this exception is thrown, it means that the consumer has failed the
>>> current rebalancing attempt and will try only when one of the
>>> following happens -
>>> 
>>> 1. New partitions are added to the topic it is consuming
>>> 2. Existing partitions become unavailable
>>> 3. New consumer instances are brought up for the consumer group it
>> belongs to
>>> 4. Existing consumer instances die for the consumer group it belongs to
>>> 
>>> Until that, the consumer is not fully functional. So, this particular
>>> exception should be monitored and the consumer instance should be
>>> restarted.
>>> 
>>> Having said that, it is pretty rare for the consumer to run out of
>>> rebalancing attempts. One of the common causes is using zookeeper
>>> 3.3.3 which causes older ephemeral nodes to be retained.
>>> Which version of Kafka are you using ?
>>> Would you mind attaching the entire log for the consumer. It will help
>>> us debug the cause of this exception and see if it is an actual bug.
>>> 
>>> Thanks,
>>> Neha
>>> 
>>> 
>> 
>> 
>> Neha,
>>   I see this exception
>> 
>> 2012-07-17 12:58:12,238 ERROR
>> [ZkClient-EventThread-17-11.zookeeper.,12.zookeeper.,13.zookeeper.,14.zookeeper.,16.zookeeper./kafka]
>> zkclient.ZkEventThread Error handling event ZkEvent[Children of
>> /consumers/live-event-sense-new8/ids changed sent to
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener@6d9dd520]
>> java.lang.RuntimeException:
>> live-event-sense-new8_sv4r25s49-1342554132312-c04abfef can't rebalance
>> after 4 retires
>> 
>> 
>> occurring very often.  I use ZK 3.4.3.    Im not handling/monitoring this
>> exception . The consumer seems to continue just fine after this happens.  I
>> do not see any on the 4 conditions you mentioned happening. Am I missing
>> something ?
>> 
>> Thanks,
>> Sam
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt...@gmail.com>
>> wrote:
>>>> When I shutdown my consumer with crtl-c and tries to restart it quickly
>>>> afterwards, I usually get ConsumerRebalanceFailedException (see below).
>> The
>>>> application then seems to hang.. or at least I'm sure if it is running
>> any
>>>> more.. If this exception is thrown, will the consumer the intelligently
>>>> wait for the rebalancing to complete? and then resume consumption?
>>>> 
>>>> I found a page
>> https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
>>>> describes something about Consumer Co-ordinator.. according to this
>>>> the consumer
>>>> group remains in this state until the next rebalancing attempt is
>>>> triggered. But when is it triggered?
>>>> 
>>>> Could a shutdown hock with a consumer.commitOffsets help?
>>>> Does the consumer.shutdown implicit commitOffsets?
>>>> 
>>>> 
>>>> Exception in thread "main"
>> kafka.common.ConsumerRebalanceFailedException:
>>>> contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f
>> can't
>>>> rebalance after 4 retries
>>>>       at
>>>> 
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
>>>>       at
>>>> 
>> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
>>>>       at
>>>> 
>> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
>>>>       at
>>>> 
>> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
>>>>       at
>>>> 
>> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
>>>>       at
>>>> 
>> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
>>>> 
>>>> 
>>>> Brgds,
>>>> Peter Thygesen
>>>> 
>>>> BTW: Great work, very interesting project.
>> 
>> Sam William
>> sampd@stumbleupon.com
>> 
>> 
>> 
>> 

Sam William
sampd@stumbleupon.com




Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Neha Narkhede <ne...@gmail.com>.
Sam,

Could you please send around the entire stack trace for that exception? It
means that the consumer couldn't complete a rebalancing operation, and it is
possible that the consumer is not pulling all the data for the requested
topics.

Thanks,
Neha

On Tue, Jul 17, 2012 at 1:29 PM, Sam William <sa...@stumbleupon.com> wrote:

>
> On Mar 20, 2012, at 8:49 AM, Neha Narkhede wrote:
>
> > Peter,
> >
> >>> If this exception is thrown, will the consumer the intelligently wait
> for the rebalancing to complete? and then resume consumption?
> >
> > If this exception is thrown, it means that the consumer has failed the
> > current rebalancing attempt and will try only when one of the
> > following happens -
> >
> > 1. New partitions are added to the topic it is consuming
> > 2. Existing partitions become unavailable
> > 3. New consumer instances are brought up for the consumer group it
> belongs to
> > 4. Existing consumer instances die for the consumer group it belongs to
> >
> > Until that, the consumer is not fully functional. So, this particular
> > exception should be monitored and the consumer instance should be
> > restarted.
> >
> > Having said that, it is pretty rare for the consumer to run out of
> > rebalancing attempts. One of the common causes is using zookeeper
> > 3.3.3 which causes older ephemeral nodes to be retained.
> > Which version of Kafka are you using ?
> > Would you mind attaching the entire log for the consumer. It will help
> > us debug the cause of this exception and see if it is an actual bug.
> >
> > Thanks,
> > Neha
> >
> >
>
>
>  Neha,
>    I see this exception
>
> 2012-07-17 12:58:12,238 ERROR
> [ZkClient-EventThread-17-11.zookeeper.,12.zookeeper.,13.zookeeper.,14.zookeeper.,16.zookeeper./kafka]
> zkclient.ZkEventThread Error handling event ZkEvent[Children of
> /consumers/live-event-sense-new8/ids changed sent to
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener@6d9dd520]
> java.lang.RuntimeException:
> live-event-sense-new8_sv4r25s49-1342554132312-c04abfef can't rebalance
> after 4 retires
>
>
>  occurring very often.  I use ZK 3.4.3.    Im not handling/monitoring this
> exception . The consumer seems to continue just fine after this happens.  I
> do not see any on the 4 conditions you mentioned happening. Am I missing
> something ?
>
> Thanks,
> Sam
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> > On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt...@gmail.com>
> wrote:
> >> When I shutdown my consumer with crtl-c and tries to restart it quickly
> >> afterwards, I usually get ConsumerRebalanceFailedException (see below).
> The
> >> application then seems to hang.. or at least I'm sure if it is running
> any
> >> more.. If this exception is thrown, will the consumer the intelligently
> >> wait for the rebalancing to complete? and then resume consumption?
> >>
> >> I found a page
> https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
> >> describes something about Consumer Co-ordinator.. according to this
> >> the consumer
> >> group remains in this state until the next rebalancing attempt is
> >> triggered. But when is it triggered?
> >>
> >> Could a shutdown hock with a consumer.commitOffsets help?
> >> Does the consumer.shutdown implicit commitOffsets?
> >>
> >>
> >> Exception in thread "main"
> kafka.common.ConsumerRebalanceFailedException:
> >> contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f
> can't
> >> rebalance after 4 retries
> >>        at
> >>
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
> >>        at
> >>
> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
> >>        at
> >>
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
> >>        at
> >>
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
> >>        at
> >>
> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
> >>        at
> >>
> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
> >>
> >>
> >> Brgds,
> >> Peter Thygesen
> >>
> >> BTW: Great work, very interesting project.
>
> Sam William
> sampd@stumbleupon.com
>
>
>
>

Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Sam William <sa...@stumbleupon.com>.
On Mar 20, 2012, at 8:49 AM, Neha Narkhede wrote:

> Peter,
> 
>>> If this exception is thrown, will the consumer the intelligently wait for the rebalancing to complete? and then resume consumption?
> 
> If this exception is thrown, it means that the consumer has failed the
> current rebalancing attempt and will try only when one of the
> following happens -
> 
> 1. New partitions are added to the topic it is consuming
> 2. Existing partitions become unavailable
> 3. New consumer instances are brought up for the consumer group it belongs to
> 4. Existing consumer instances die for the consumer group it belongs to
> 
> Until that, the consumer is not fully functional. So, this particular
> exception should be monitored and the consumer instance should be
> restarted.
> 
> Having said that, it is pretty rare for the consumer to run out of
> rebalancing attempts. One of the common causes is using zookeeper
> 3.3.3 which causes older ephemeral nodes to be retained.
> Which version of Kafka are you using ?
> Would you mind attaching the entire log for the consumer. It will help
> us debug the cause of this exception and see if it is an actual bug.
> 
> Thanks,
> Neha
> 
> 


 Neha,
   I see this exception

2012-07-17 12:58:12,238 ERROR [ZkClient-EventThread-17-11.zookeeper.,12.zookeeper.,13.zookeeper.,14.zookeeper.,16.zookeeper./kafka] zkclient.ZkEventThread Error handling event ZkEvent[Children of /consumers/live-event-sense-new8/ids changed sent to kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener@6d9dd520]
java.lang.RuntimeException: live-event-sense-new8_sv4r25s49-1342554132312-c04abfef can't rebalance after 4 retires


 occurring very often.  I use ZK 3.4.3.  I'm not handling/monitoring this exception. The consumer seems to continue just fine after this happens.  I do not see any of the 4 conditions you mentioned happening. Am I missing something?

Thanks,
Sam














> On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt...@gmail.com> wrote:
>> When I shutdown my consumer with crtl-c and tries to restart it quickly
>> afterwards, I usually get ConsumerRebalanceFailedException (see below). The
>> application then seems to hang.. or at least I'm sure if it is running any
>> more.. If this exception is thrown, will the consumer the intelligently
>> wait for the rebalancing to complete? and then resume consumption?
>> 
>> I found a page https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
>> describes something about Consumer Co-ordinator.. according to this
>> the consumer
>> group remains in this state until the next rebalancing attempt is
>> triggered. But when is it triggered?
>> 
>> Could a shutdown hock with a consumer.commitOffsets help?
>> Does the consumer.shutdown implicit commitOffsets?
>> 
>> 
>> Exception in thread "main" kafka.common.ConsumerRebalanceFailedException:
>> contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f can't
>> rebalance after 4 retries
>>        at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
>>        at
>> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
>>        at
>> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
>>        at
>> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
>>        at
>> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
>>        at
>> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
>> 
>> 
>> Brgds,
>> Peter Thygesen
>> 
>> BTW: Great work, very interesting project.

Sam William
sampd@stumbleupon.com




Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Peter Thygesen <pt...@gmail.com>.
I'm running kafka-0.7 on an HBase cluster where the ZooKeeper servers are from
Cloudera's CDH3 Update 2 = zookeeper-3.3.3.

Thank you for your time and support :)

I'll try to add a shutdown hook and close the consumer more "nicely".

/Peter

Den 20. mar. 2012 16.49 skrev Neha Narkhede <ne...@gmail.com>:

> Peter,
>
> >> If this exception is thrown, will the consumer the intelligently wait
> for the rebalancing to complete? and then resume consumption?
>
> If this exception is thrown, it means that the consumer has failed the
> current rebalancing attempt and will try only when one of the
> following happens -
>
> 1. New partitions are added to the topic it is consuming
> 2. Existing partitions become unavailable
> 3. New consumer instances are brought up for the consumer group it belongs
> to
> 4. Existing consumer instances die for the consumer group it belongs to
>
> Until that, the consumer is not fully functional. So, this particular
> exception should be monitored and the consumer instance should be
> restarted.
>
> Having said that, it is pretty rare for the consumer to run out of
> rebalancing attempts. One of the common causes is using zookeeper
> 3.3.3 which causes older ephemeral nodes to be retained.
> Which version of Kafka are you using ?
> Would you mind attaching the entire log for the consumer. It will help
> us debug the cause of this exception and see if it is an actual bug.
>
> Thanks,
> Neha
>
>
> On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt...@gmail.com>
> wrote:
> > When I shutdown my consumer with crtl-c and tries to restart it quickly
> > afterwards, I usually get ConsumerRebalanceFailedException (see below).
> The
> > application then seems to hang.. or at least I'm sure if it is running
> any
> > more.. If this exception is thrown, will the consumer the intelligently
> > wait for the rebalancing to complete? and then resume consumption?
> >
> > I found a page
> https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
> > describes something about Consumer Co-ordinator.. according to this
> > the consumer
> > group remains in this state until the next rebalancing attempt is
> > triggered. But when is it triggered?
> >
> > Could a shutdown hock with a consumer.commitOffsets help?
> > Does the consumer.shutdown implicit commitOffsets?
> >
> >
> > Exception in thread "main" kafka.common.ConsumerRebalanceFailedException:
> > contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f can't
> > rebalance after 4 retries
> >        at
> >
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
> >        at
> >
> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
> >        at
> >
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
> >        at
> >
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
> >        at
> >
> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
> >        at
> >
> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
> >
> >
> > Brgds,
> > Peter Thygesen
> >
> > BTW: Great work, very interesting project.
>

Re: Shutdown/Ctrl-C and ConsumerRebalanceFailedException

Posted by Neha Narkhede <ne...@gmail.com>.
Peter,

>> If this exception is thrown, will the consumer the intelligently wait for the rebalancing to complete? and then resume consumption?

If this exception is thrown, it means that the consumer has failed the
current rebalancing attempt and will try again only when one of the
following happens -

1. New partitions are added to the topic it is consuming
2. Existing partitions become unavailable
3. New consumer instances are brought up for the consumer group it belongs to
4. Existing consumer instances die for the consumer group it belongs to

Until then, the consumer is not fully functional. So, this particular
exception should be monitored and the consumer instance should be
restarted.

Having said that, it is pretty rare for the consumer to run out of
rebalancing attempts. One of the common causes is using zookeeper
3.3.3, which causes older ephemeral nodes to be retained.
Which version of Kafka are you using?
Would you mind attaching the entire log for the consumer? It will help
us debug the cause of this exception and see if it is an actual bug.

Thanks,
Neha
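
One hedged sketch of acting on that advice: catch the exception where
createMessageStreams throws it (as in the stack trace above), tear the connector
down, and retry. The config values, topic, and retry pause are placeholders:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    import kafka.common.ConsumerRebalanceFailedException;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class RestartOnRebalanceFailure {
        public static void main(String[] args) throws InterruptedException {
            Map<String, Integer> topicCount = new HashMap<String, Integer>();
            topicCount.put("my-topic", 1);                       // placeholder topic

            while (true) {
                ConsumerConnector consumer = null;
                try {
                    Properties props = new Properties();
                    props.put("zk.connect", "localhost:2181");   // placeholder
                    props.put("groupid", "my-consumer-group");   // placeholder
                    consumer = kafka.consumer.Consumer.createJavaConsumerConnector(
                            new ConsumerConfig(props));
                    consumer.createMessageStreams(topicCount);   // throws once the rebalance retries are exhausted
                    break;                                       // success: hand the streams to consuming threads here
                } catch (ConsumerRebalanceFailedException e) {
                    // Commonly caused by stale ephemeral nodes left by a previous instance;
                    // shut down, let the old ZooKeeper session expire, and try again.
                    if (consumer != null) consumer.shutdown();
                    Thread.sleep(10000);
                }
            }
        }
    }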


On Tue, Mar 20, 2012 at 2:42 AM, Peter Thygesen <pt...@gmail.com> wrote:
> When I shutdown my consumer with crtl-c and tries to restart it quickly
> afterwards, I usually get ConsumerRebalanceFailedException (see below). The
> application then seems to hang.. or at least I'm sure if it is running any
> more.. If this exception is thrown, will the consumer the intelligently
> wait for the rebalancing to complete? and then resume consumption?
>
> I found a page https://cwiki.apache.org/KAFKA/consumer-co-ordinator.htmlthat
> describes something about Consumer Co-ordinator.. according to this
> the consumer
> group remains in this state until the next rebalancing attempt is
> triggered. But when is it triggered?
>
> Could a shutdown hock with a consumer.commitOffsets help?
> Does the consumer.shutdown implicit commitOffsets?
>
>
> Exception in thread "main" kafka.common.ConsumerRebalanceFailedException:
> contentItem-consumer-group-1_cphhdfs01node09-1332175323213-e6a3010f can't
> rebalance after 4 retries
>        at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:467)
>        at
> kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:204)
>        at
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:75)
>        at
> kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:89)
>        at
> com.infopaq.research.repository.uima.ContentItemClient.consume(ContentItemClient.java:75)
>        at
> com.infopaq.research.repository.uima.ContentItemClient.main(ContentItemClient.java:111)
>
>
> Brgds,
> Peter Thygesen
>
> BTW: Great work, very interesting project.