
Posted to users@kafka.apache.org by 杨宝栓 <ya...@rcrai.com> on 2022/04/27 05:46:29 UTC

Unexpected Rebalances, any tips on APIs or debug techniques to figure out rebalance causes?

Hi,
We are seeing unexpected rebalances in our Go (golang) consumers. Details:
1. We have a topic with 36 partitions and one consumer (let's call it consumer1) consuming it.
2. Kafka runs in Docker with the default configuration.
3. The consumer is slow: processing one message takes about 1s.
4. All the consumers for topic A are in the same group.
5. The rebalances are intermittent and hard to reproduce. We see no obvious errors in the logs.
6. No matter how we change the settings that affect rebalancing, the group still rebalances.
The settings we changed are:
max.poll.interval.ms
max.poll.records
request.timeout.ms
session.timeout.ms
As far as I understand, a rebalance is triggered when the group coordinator considers a consumer dead, which happens in two cases:
A. The consumer is busy and sends no heartbeat to the group coordinator within the configured session timeout (session.timeout.ms).
B. The consumer is slow because of long-running processing, so the interval between poll() calls exceeds the configured max.poll.interval.ms.
Questions:
1. Is my understanding of B correct? I think it may be the main cause of the rebalances in my case, because consumption is slow.
2. I ran an experiment to verify B but could not reproduce a rebalance.
Result: neither a fast nor a slow consumer triggered it.
3. Any tips on APIs or debugging techniques to figure out what causes a rebalance?
4. How can I trigger a rebalance manually?
5. Is it a bad idea to have the same consumer group (same ID) consuming from multiple topics?

Re: Unexpected Rebalances, any tips on APIs or debug techniques to figure out rebalance causes?

Posted by sunil chaudhari <su...@gmail.com>.

Rebalancing happens mainly for these reasons:

You restart a consumer
The consumer host is not reachable
You stop a consumer

All of the above situations are fine when you have a sufficient number of
consumers (threads) to read from the available partitions and the consumers
are logically distributed across multiple consumer groups.

If you have grouped many consumers into the same group reading from
multiple topics, that may cause frequent rebalancing.
If even one consumer has an availability issue, the whole group will
rebalance.

We recently went through the same problem. What we did was split the one
large group into several smaller logical groups.
Each group is associated with a specific topic, and the consumers are
distributed across those groups.

Now, when a consumer fails, only that small group is rebalanced, not the
others.
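
The effect of splitting the group can be sketched with a small model in Go. This is only an illustration under the assumption that every remaining member of a group takes part in a rebalance when one member leaves; the type, function, consumer names and group names are all hypothetical.

```go
package main

import "fmt"

// member is a hypothetical description of one consumer and the group it joins.
type member struct {
	name, group string
}

// affectedBy returns the members that have to rebalance when the consumer
// named failed leaves: every other member of the same group.
func affectedBy(members []member, failed string) []string {
	group := ""
	for _, m := range members {
		if m.name == failed {
			group = m.group
		}
	}
	var out []string
	for _, m := range members {
		if group != "" && m.group == group && m.name != failed {
			out = append(out, m.name)
		}
	}
	return out
}

func main() {
	// One shared group for everything versus one group per topic.
	shared := []member{{"c1", "shared"}, {"c2", "shared"}, {"c3", "shared"}}
	split := []member{{"c1", "orders-group"}, {"c2", "orders-group"}, {"c3", "payments-group"}}

	fmt.Println(affectedBy(shared, "c1")) // c2 and c3 both rebalance
	fmt.Println(affectedBy(split, "c1"))  // only c2 rebalances
}
```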

I hope I have answered your question with my limited knowledge 😊

Corrections are welcome!


Regards,
Sunil.


On Thu, 28 Apr 2022 at 8:42 AM, Luke Chen <sh...@gmail.com> wrote:

> 1. Is my understanding of B correct? I think it may be the main cause of
> the rebalances in my case, because consumption is slow.
> => It looks like it, but we cannot confirm that without more information.
> You should check the consumer log to see why the consumer left the group.
>
> 2. I ran an experiment to verify B but could not reproduce a rebalance:
> neither a fast nor a slow consumer triggered it.
> => You should also adjust the heartbeat interval so that the heartbeat
> can detect the poll expiration.
> You can refer to this test:
>
> https://github.com/apache/kafka/blob/trunk/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L167
>
> 3. Any tips on APIs or debugging techniques to figure out what causes a
> rebalance?
> => On the server side, you can look for a log line like this:
> "Preparing to rebalance group xxx ... (reason: yyyyy)"
>
> 4. How can I trigger a rebalance manually?
> => Same as question 2.
>
> 5. Is it a bad idea to have the same consumer group (same ID) consuming
> from multiple topics?
> => It depends on your use case; it is neither good nor bad in itself.
>
>
> Thank you.
> Luke

Re: Unexpected Rebalances, any tips on APIs or debug techniques to figure out rebalance causes?

Posted by Luke Chen <sh...@gmail.com>.

1. Is my understanding of B correct? I think it may be the main cause of
the rebalances in my case, because consumption is slow.
=> It looks like it, but we cannot confirm that without more information.
You should check the consumer log to see why the consumer left the group.

2. I ran an experiment to verify B but could not reproduce a rebalance:
neither a fast nor a slow consumer triggered it.
=> You should also adjust the heartbeat interval so that the heartbeat can
detect the poll expiration.
You can refer to this test:
https://github.com/apache/kafka/blob/trunk/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L167

3. Any tips on APIs or debugging techniques to figure out what causes a
rebalance?
=> On the server side, you can look for a log line like this:
"Preparing to rebalance group xxx ... (reason: yyyyy)"
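
To collect these reasons from a broker log, something like the following Go sketch could be used. It assumes only the parts of the line quoted above (the "Preparing to rebalance group" prefix and the trailing "(reason: ...)"); the regular expression and the function name are my own, and the elided middle of the line is matched loosely rather than guessed.

```go
package main

import (
	"fmt"
	"regexp"
)

// rebalanceRe captures the group id and the reason. The text between them
// is skipped with ".*" because it is elided in the quoted line.
var rebalanceRe = regexp.MustCompile(`Preparing to rebalance group (\S+).*\(reason: (.*)\)`)

// rebalanceReason (hypothetical helper) extracts the group and the reason
// from one log line, or reports ok=false if the line does not match.
func rebalanceReason(line string) (group, reason string, ok bool) {
	m := rebalanceRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	// The placeholder line quoted above, not a real log entry.
	line := `"Preparing to rebalance group xxx ... (reason: yyyyy)"`
	if group, reason, ok := rebalanceReason(line); ok {
		fmt.Printf("group=%s reason=%s\n", group, reason)
	}
}
```

Feeding each line of the server log through rebalanceReason and counting the reasons per group should show which trigger dominates.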



4. How can I trigger a rebalance manually?
=> Same as question 2.

5. Is it a bad idea to have the same consumer group (same ID) consuming
from multiple topics?
=> It depends on your use case; it is neither good nor bad in itself.


Thank you.
Luke

