Posted to users@kafka.apache.org by dinesh kumar <di...@gmail.com> on 2014/11/05 13:50:34 UTC

How costly is Re balancing of partitions for a topic

Hello,

I am trying to come up with a design for consuming from Kafka. I am using
Kafka 0.8.1.1. I am thinking of designing a system where a consumer is
created every few seconds, consumes data from Kafka, processes it, and then
quits after committing its offsets to Kafka. At any point in time I expect
250 - 300 consumers to be active (running as thread pools on different
machines).

1. How and when does a rebalance of partitions happen?

2. How costly is rebalancing partitions among the consumers? I am
expecting a consumer to finish or join every few seconds in the same
consumer group, so I want to know the overhead and latency of a
rebalance operation.

3. Say consumer C1 has partitions P1, P2, P3 assigned to it and is
processing a message M1 from partition P1. Now consumer C2 joins the
group. How are the partitions divided between C1 and C2? Is there a
possibility that C1's commit for M1 (C1 might take some time to commit to
Kafka) will be rejected, so that M1 is treated as a fresh message and
delivered to someone else? (I know Kafka has an at-least-once delivery
model, but I wanted to confirm whether a rebalance can by any chance cause
redelivery of the same message.)


Thanks,
Dinesh

Re: How costly is Re balancing of partitions for a topic

Posted by Guozhang Wang <wa...@gmail.com>.
1. Since a rebalance among all the consumer members is triggered each time
the consumer group changes, it is usually recommended to have long-lived
consumers rather than short-lived ones. However, in the new consumer we are
working on optimizing the rebalance logic and removing its ZK dependency,
so in the new consumer (coming next spring) short-lived consumers coming
and going should also be OK.

2. I was not correct before: the cost depends on the total number of
partitions rather than the number of topics. Since your scenario has 500
partitions, it may still result in high latency in the current consumer.

3. Once a message is returned from the iterator it is considered
"consumed", i.e. the consumed offset is incremented. If auto offset commit
is turned on (the default), then before a rebalance happens the consumer
will force a commit; hence that offset will be written to ZK and the
message will not be exposed to others again.

4. If auto commit is turned off and the manual commit gets delayed somehow,
a rebalance will cause some duplicates.
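
To illustrate points 3 and 4, here is a toy model in Python. It is only a
sketch under stated assumptions and not Kafka client code: the function
names are hypothetical, the partition is modelled as a plain list, and the
committed offset is taken to mean "index of the next message to read".

```python
def messages_after_rebalance(log, committed_offset):
    """Messages the partition's new owner reads, starting at the committed offset."""
    return log[committed_offset:]


def duplicates(log, handed_to_old_owner, committed_offset):
    """Messages the old owner already received that the new owner receives again."""
    already_seen = set(log[:handed_to_old_owner])
    return [m for m in messages_after_rebalance(log, committed_offset)
            if m in already_seen]


log = ["M1", "M2", "M3"]

# Point 3: the offset was committed before the rebalance (1 = "M1 is done").
committed_in_time = duplicates(log, handed_to_old_owner=1, committed_offset=1)

# Point 4: the manual commit was delayed, so the stored offset is still 0.
commit_delayed = duplicates(log, handed_to_old_owner=1, committed_offset=0)

print(committed_in_time)  # []
print(commit_delayed)     # ['M1']
```

In this model the only thing the new owner knows about is the stored
offset, which is why a late commit shows up as a redelivered M1.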

Guozhang





-- 
-- Guozhang

Re: How costly is Re balancing of partitions for a topic

Posted by dinesh kumar <di...@gmail.com>.
Thanks for the answers. I have some follow-up questions.

Let me get a bit more specific.

In a scenario of 1 topic with 400 - 500 partitions:

1. Is it OK to have short-lived consumers? Or is it recommended to have
only long-running consumers?

2. You mentioned that rebalance latency depends on the number of consumers
and the number of topics. In the case of 1 topic and hundreds of consumers,
can we say the latency is in the tens of seconds, as you mentioned before?

3. You mentioned:


"Rebalance algorithm is deterministic (range-based), and before it kicks
in consumers will first commit their current offset and stop fetchers,
hence when M1 is already fetched by some consumer C1 before rebalance it
will not be re-send to another C2 after the rebalance."


Say a consumer fetches a message, spends 5 minutes processing it, and then
commits the offset. If the rebalance waits for all the consumers to commit
their offsets, will it wait for 5 minutes? Or is there a timeout here?

If the consumer does not commit after 5 minutes due to some exception, what
will happen?


Thanks,
Dinesh



Re: How costly is Re balancing of partitions for a topic

Posted by dinesh kumar <di...@gmail.com>.
Thanks for the answers. I have some follow-up questions.

Let me get a bit more specific.

In a scenario of 1 topic with 400 - 500 partitions:

1. Is it OK to have short-lived consumers? Or is it recommended to have
only long-running consumers?

2. You mentioned that rebalance latency depends on the number of consumers
and the number of topics. In the case of 1 topic and hundreds of consumers,
can we say the latency is in the tens of seconds, as you mentioned before?

3. You mentioned


Re: How costly is Re balancing of partitions for a topic

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Dinesh,

1. A rebalance is triggered when the consumers are notified, through ZK, of
a group membership change or a topic-partition change.

2. The cost of a rebalance is positively related to the number of consumers
in the group and the number of topics the group is consuming. The latency
of a rebalance can be as high as tens of seconds when you have a large
number of consumers fetching from a large number of topics.

3. The rebalance algorithm is deterministic (range-based), and before it
kicks in, consumers will first commit their current offsets and stop their
fetchers. Hence, when M1 has already been fetched by some consumer C1
before the rebalance, it will not be re-sent to another consumer C2 after
the rebalance.
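
A range-based assignment of this kind can be sketched in Python as follows.
This is a minimal illustration and not the consumer's actual source: the
function name `range_assign` is hypothetical, and it assumes that partitions
and consumers are each sorted and that the first consumers in sort order
take one extra partition when the division is uneven.

```python
def range_assign(partitions, consumers):
    """Give each consumer a contiguous range of the sorted partitions."""
    parts = sorted(partitions)
    members = sorted(consumers)
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    for i, member in enumerate(members):
        # The first `extra` consumers each take one additional partition.
        start = per_consumer * i + min(i, extra)
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
    return assignment


# The example from the question: C1 owns P1..P3, then C2 joins.
before = range_assign(["P1", "P2", "P3"], ["C1"])
after = range_assign(["P1", "P2", "P3"], ["C1", "C2"])
print(before)  # {'C1': ['P1', 'P2', 'P3']}
print(after)   # {'C1': ['P1', 'P2'], 'C2': ['P3']}

# A larger case: 500 partitions spread over 300 consumers.
large = range_assign(range(500), [f"c{i:03d}" for i in range(300)])
print(sum(len(p) for p in large.values()))  # 500
```

Because the result depends only on the sorted inputs, every consumer can
compute the same assignment independently, which is what "deterministic"
buys here. In the sketch, C2 joining moves only P3 away from C1.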

You can also read some FAQs here:

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-CanIpredicttheresultsoftheconsumerrebalance?

And in 0.9, we will release our new consumer client, which will reduce
rebalance latency compared to the current consumer.

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design


Guozhang









-- 
-- Guozhang