Posted to users@kafka.apache.org by Jason Rosenberg <jb...@squareup.com> on 2014/08/07 08:18:19 UTC

consumer rebalance weirdness

We've noticed that some of our consumers are more likely to repeatedly
trigger rebalancing when the app is consuming messages more slowly (e.g.
persisting data to back-end systems, etc.).

If, on the other hand, we 'fast-forward' the consumer (which essentially
means we tell it to consume but do nothing with the messages until it is all
caught up), it will never decide to do a rebalance during this time.  So it
can go hours without rebalancing while fast-forwarding and consuming super
fast, while during normal processing, it might decide to rebalance every
minute or so.

Is there any simple explanation for this?

Usually the logged trigger for the rebalance is "topic info for path X
has changed to Y, triggering rebalance".

Thanks for any ideas.

We'd like to reduce the rebalancing, as it essentially slows down
consumption each time it happens.

Thanks

Jason

Re: consumer rebalance weirdness

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.

Turn on GC logging (with verbose timestamps) and see how long your pauses are.
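On the HotSpot JVMs of that era (Java 7/8), flags such as -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps and -XX:+PrintGCApplicationStoppedTime write per-pause details to the log. As a rough in-process complement, the sketch below (class and method names are hypothetical) samples the JVM's cumulative GC time through the standard GarbageCollectorMXBean API and reports how much of each window was spent collecting. For some collectors getCollectionTime() also counts concurrent phases, so treat this as a coarse signal and confirm any long pauses in the GC log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeSampler {
    // Total accumulated collection time (ms) across all collectors since JVM
    // start. Collectors that do not report a time return -1 and are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Returns the GC time accumulated in each of `samples` consecutive windows
    // of roughly `windowMs`. A window whose total is far below the ZooKeeper
    // session timeout is unlikely to contain a pause that long; large values
    // are worth cross-checking against the GC log.
    static long[] gcMillisPerWindow(int samples, long windowMs) {
        long[] deltas = new long[samples];
        long previous = totalGcMillis();
        for (int i = 0; i < samples; i++) {
            try {
                Thread.sleep(windowMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            long current = totalGcMillis();
            deltas[i] = current - previous;
            previous = current;
        }
        return deltas;
    }

    public static void main(String[] args) {
        long[] deltas = gcMillisPerWindow(5, 1000);
        for (int i = 0; i < deltas.length; i++) {
            System.out.println("window " + i + ": " + deltas[i] + " ms in GC");
        }
    }
}
```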

Sure, try increasing the timeout to see if it fixes the problem, but I would hesitate to make that change permanent until you understand the problem better.

You could also profile your consumer to see where it is spending its time. Perhaps you can make your message consumption quicker. I am sure the core committers would also have some ideas.
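Short of a full profiler, one cheap way to see whether individual messages ever take longer than the ZooKeeper session timeout is to time each call into the processing code. A minimal sketch, assuming processing can be wrapped as a java.util.function.Consumer; SlowMessageTimer and the 100 ms threshold are hypothetical, and the handler here just sleeps in place of a real back-end write.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class SlowMessageTimer {
    private final long warnThresholdMs;
    private long maxObservedMs = 0;

    SlowMessageTimer(long warnThresholdMs) {
        this.warnThresholdMs = warnThresholdMs;
    }

    // Runs the handler for one message, measures its wall-clock time, and
    // warns when a single message exceeds the threshold.
    <T> void timed(T message, Consumer<T> handler) {
        long start = System.nanoTime();
        handler.accept(message);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        maxObservedMs = Math.max(maxObservedMs, elapsedMs);
        if (elapsedMs > warnThresholdMs) {
            System.err.println("slow message (" + elapsedMs + " ms): " + message);
        }
    }

    long maxObservedMs() {
        return maxObservedMs;
    }

    public static void main(String[] args) {
        SlowMessageTimer timer = new SlowMessageTimer(100);
        // Stand-in for real processing: sleeps 10 ms per character.
        Consumer<String> persist = msg -> {
            try {
                Thread.sleep(msg.length() * 10L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (String msg : new String[] {"short", "a-much-longer-message-payload"}) {
            timer.timed(msg, persist);
        }
        System.out.println("max per-message time: " + timer.maxObservedMs() + " ms");
    }
}
```

In a real consumer, timed(...) would wrap the body of the loop over the ConsumerIterator.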

Philip

----------------------------------
http://www.philipotoole.com

> On Aug 7, 2014, at 3:06 PM, Jason Rosenberg <jb...@squareup.com> wrote:
> 
> Yeah, it's possible that's happening (but no smoking gun).  The main thing I'm seeing is that when it actually takes the time to process messages, it takes longer to get back to the ConsumerIterator for the next message.  That alone seems to be the problem (does that make any sense)?  I would have thought the zk listeners are in separate async threads (and that's what it looks like looking at the kafka consumer code).
> 
> Maybe I should increase the zk session timeout and see if that helps.

Re: consumer rebalance weirdness

Posted by Jason Rosenberg <jb...@squareup.com>.

Yeah, it's possible that's happening (but no smoking gun).  The main thing
I'm seeing is that when it actually takes the time to process messages, it
takes longer to get back to the ConsumerIterator for the next message.
That alone seems to be the problem (does that make any sense?).  I would
have thought the zk listeners run in separate async threads (and that's
what it looks like from reading the kafka consumer code).
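The threading argument here can be checked in isolation. In this minimal sketch (hypothetical names; a ScheduledExecutorService stands in for the ZooKeeper client's own background thread, and this is not Kafka or ZkClient code), the calling thread blocks in simulated slow processing while a separate thread keeps ticking. What would stop both is a stall of the whole JVM, for example a long stop-the-world GC pause.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BackgroundHeartbeatDemo {
    // Counts ticks of a background "heartbeat" thread while the calling
    // thread is blocked for busyMs, simulating slow message processing.
    static int ticksDuringBlockingWork(long tickMs, long busyMs) {
        AtomicInteger ticks = new AtomicInteger();
        ScheduledExecutorService heartbeat = Executors.newSingleThreadScheduledExecutor();
        heartbeat.scheduleAtFixedRate(ticks::incrementAndGet, 0, tickMs, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(busyMs); // stand-in for a slow back-end write
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            heartbeat.shutdownNow();
        }
        return ticks.get();
    }

    public static void main(String[] args) {
        int ticks = ticksDuringBlockingWork(100, 1000);
        System.out.println("heartbeat ticks while the main thread was blocked: " + ticks);
    }
}
```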

Maybe I should increase the zk session timeout and see if that helps.
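For experimenting with that, the 0.8.x high-level consumer reads its ZooKeeper session timeout from the zookeeper.session.timeout.ms property (default 6000 ms in 0.8). A minimal sketch of building the consumer properties; the ZooKeeper hosts, the group id and the 30000 ms value are placeholders, not recommendations.

```java
import java.util.Properties;

public class ConsumerTimeoutConfig {
    // Builds properties for the 0.8.x ZooKeeper-based high-level consumer.
    // Host names and group id are hypothetical placeholders.
    static Properties consumerProps(long sessionTimeoutMs) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1.example.com:2181,zk2.example.com:2181");
        props.put("group.id", "my-consumer-group");
        // How long ZooKeeper waits without hearing from the client before
        // expiring its session; expiry can lead to a consumer rebalance.
        props.put("zookeeper.session.timeout.ms", Long.toString(sessionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps(30000);
        System.out.println("zookeeper.session.timeout.ms = "
                + props.getProperty("zookeeper.session.timeout.ms"));
    }
}
```

With the Kafka 0.8 jar on the classpath, these properties would be passed to new kafka.consumer.ConsumerConfig(props) and then to kafka.consumer.Consumer.createJavaConsumerConnector(...). A longer timeout also means a consumer that has really died takes longer to be noticed, so the group rebalances later.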


On Thu, Aug 7, 2014 at 2:56 PM, Philip O'Toole <
philip.otoole@yahoo.com.invalid> wrote:

> A big GC pause in your application, for example, could do it.
>
> Philip
>
>
> -----------------------------------------
> http://www.philipotoole.com

Re: consumer rebalance weirdness

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.

A big GC pause in your application, for example, could do it.

Philip

 
-----------------------------------------
http://www.philipotoole.com 


On Thursday, August 7, 2014 11:56 AM, Philip O'Toole <ph...@yahoo.com> wrote:
 


I think the question is what in your consuming application could cause it not to check in with ZK for longer than the timeout.

 
Re: consumer rebalance weirdness

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.

I think the question is what in your consuming application could cause it not to check in with ZK for longer than the timeout.
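One way to look for such stalls without attaching a profiler is a pause detector: a thread that repeatedly sleeps for a fixed interval and measures how much later than requested it actually wakes up. If that overshoot approaches the session timeout, the whole JVM (including the ZooKeeper client's threads) was probably unable to run. A minimal sketch; the class name, the 100 ms interval and the 2000 ms report threshold are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class PauseDetector implements Runnable {
    private final long intervalMs;
    private final long reportThresholdMs;
    // Written only by the detector thread, read by others.
    private volatile long worstStallMs = 0;

    PauseDetector(long intervalMs, long reportThresholdMs) {
        this.intervalMs = intervalMs;
        this.reportThresholdMs = reportThresholdMs;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                return; // asked to stop
            }
            // Time beyond the requested sleep is time this thread could not run.
            long stallMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) - intervalMs;
            if (stallMs > worstStallMs) {
                worstStallMs = stallMs;
            }
            if (stallMs > reportThresholdMs) {
                System.err.println("JVM stalled for ~" + stallMs + " ms");
            }
        }
    }

    long worstStallMs() {
        return worstStallMs;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical threshold; choose one below zookeeper.session.timeout.ms.
        PauseDetector detector = new PauseDetector(100, 2000);
        Thread thread = new Thread(detector, "pause-detector");
        thread.setDaemon(true);
        thread.start();
        Thread.sleep(1000); // the consumer application would run here
        thread.interrupt();
        thread.join();
        System.out.println("worst stall observed: " + detector.worstStallMs() + " ms");
    }
}
```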

 
-----------------------------------------
http://www.philipotoole.com 


On Thursday, August 7, 2014 8:16 AM, Jason Rosenberg <jb...@squareup.com> wrote:
 


Well, it's possible that when processing, it might take longer than the
zookeeper timeout to process a message, intermittently.  Would that cause a
zookeeper timeout?

(btw I'm using 0.8.1.1).




Re: consumer rebalance weirdness

Posted by Jason Rosenberg <jb...@squareup.com>.

Well, it's possible that when processing, it might take longer than the
zookeeper timeout to process a message, intermittently.  Would that cause a
zookeeper timeout?

(btw I'm using 0.8.1.1).


On Thu, Aug 7, 2014 at 2:30 AM, Clark Haskins <chaskins@linkedin.com.invalid> wrote:

> Is your application possibly timing out its zookeeper connection during
> consumption while doing its processing, thus triggering the rebalance?
>
> -Clark

Re: consumer rebalance weirdness

Posted by Clark Haskins <ch...@linkedin.com.INVALID>.

Is your application possibly timing out its zookeeper connection during
consumption while doing its processing, thus triggering the rebalance?

-Clark

On 8/6/14, 11:18 PM, "Jason Rosenberg" <jb...@squareup.com> wrote:

>We've noticed that some of our consumers are more likely to repeatedly
>trigger rebalancing when the app is consuming messages more slowly (e.g.
>persisting data to back-end systems, etc.).
>
>If on the other hand we 'fast-forward' the consumer (which essentially
>means we tell it to consume but do nothing with the messages until all
>caught up), it will never decide to do a rebalance during this time.  So it
>can go hours without rebalancing while fast forwarding and consuming super
>fast, while during normal processing, it might decide to rebalance every
>minute or so.
>
>Is there any simple explanation for this?
>
>Usually the trigger for rebalance logged is that a "topic info for path X
>has changed to Y, triggering rebalance".
>
>Thanks for any ideas.
>
>We'd like to reduce the rebalancing, as it essentially slows down
>consumption each time it happens.
>
>Thanks
>
>Jason