Posted to users@kafka.apache.org by Yonghui Zhao <zh...@gmail.com> on 2014/03/10 18:23:25 UTC

How Kafka assigns partitions to streams

In my environment, I have 2 brokers and only 1 topic; each broker has 10
partitions, so there are 20 partitions in total.
I have 4 consumers in one consumer group, and each consumer uses
createMessageStreams to create 10 streams, so there are 40 streams in total.

Since a partition cannot be split across streams, 20 of the streams are
empty.

We found that 2 machines have 10 valid streams each, but the other 2 have no
valid streams at all.
Is this expected? For load balancing, it would seem better for each consumer
to have 5 valid streams.

I would also like to know: if one of the machines with 10 valid streams dies,
will its partitions be rebalanced to the other machines dynamically, i.e.,
will some of the empty streams on the other machines start getting data?

Re: How Kafka assigns partitions to streams

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Yonghui,

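In 0.8, the consumer load-balancing logic is based on range partitioning
over the sorted list of consumer thread IDs {consumer-1-stream-1,
consumer-1-stream-2, ... consumer-1-stream-10, consumer-2-stream-1, ...}:
the partitions are assigned to this list in order, one contiguous range per
stream. With 20 partitions and 40 streams, each of the first 20 streams gets
one partition, and those streams all belong to consumers 1 and 2. So yes,
this behavior is expected. If you are going to stay with 20 partitions for
the near future, I would recommend reducing the number of streams per
consumer to 5, so that the 20 streams match the 20 partitions.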
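To make the range assignment concrete, here is a minimal Python sketch
modeled on the range logic described above: partitions and consumer threads
are both sorted, and each thread receives a contiguous slice. It is not
Kafka's actual code. The function names are hypothetical, and thread IDs are
modeled as (consumer, stream) tuples rather than the ID strings the real
consumer generates.

```python
def range_assign(partitions, thread_ids):
    """Range-assign partitions to consumer threads (simplified sketch).

    Both lists are sorted; each thread gets a contiguous slice of
    len(partitions) // len(threads) partitions, and the first
    len(partitions) % len(threads) threads get one extra.
    """
    parts = sorted(partitions)
    threads = sorted(thread_ids)
    per_thread, extra = divmod(len(parts), len(threads))
    assignment = {}
    for pos, tid in enumerate(threads):
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        assignment[tid] = parts[start:start + count]
    return assignment


def make_threads(consumers, streams_per_consumer):
    # Hypothetical thread IDs: (consumer number, stream number) tuples.
    return [(c, s) for c in consumers for s in range(1, streams_per_consumer + 1)]


def per_consumer(assignment):
    # Total number of partitions owned by each consumer across its streams.
    counts = {}
    for (consumer, _stream), parts in assignment.items():
        counts[consumer] = counts.get(consumer, 0) + len(parts)
    return counts


partitions = list(range(20))  # one topic, 2 brokers x 10 partitions

# Current setup: 4 consumers x 10 streams = 40 streams for 20 partitions.
before = range_assign(partitions, make_threads([1, 2, 3, 4], 10))
print(per_consumer(before))                      # {1: 10, 2: 10, 3: 0, 4: 0}
print(sum(1 for p in before.values() if not p))  # 20 empty streams

# Suggested setup: 4 consumers x 5 streams = 20 streams.
suggested = range_assign(partitions, make_threads([1, 2, 3, 4], 5))
print(per_consumer(suggested))                   # {1: 5, 2: 5, 3: 5, 4: 5}
```

With 40 streams there is less than one partition per stream, so only the
first 20 threads in sorted order get a partition, and all 20 of them belong
to consumers 1 and 2. With 5 streams per consumer, every stream gets exactly
one partition.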

To your second question: yes. The rebalance logic will be triggered again
(assuming consumer 1 is dead), but this time with the consumer list
{consumer-2-stream-1, ...}. So in your case, if, say, the first consumer
dies, only the third consumer will start getting data; the fourth may still
get no data.
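The failover case can be sketched the same way. This self-contained version
repeats a hypothetical range_assign helper (a simplified model of the range
logic, not Kafka's code) and reruns it with consumer 1 removed, as a
rebalance would. It assumes the same 20 partitions and 10 streams per
surviving consumer.

```python
def range_assign(partitions, thread_ids):
    # Simplified range assignment: sorted threads get contiguous slices,
    # and the first len(partitions) % len(threads) threads get one extra.
    parts, threads = sorted(partitions), sorted(thread_ids)
    per_thread, extra = divmod(len(parts), len(threads))
    assignment = {}
    for pos, tid in enumerate(threads):
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        assignment[tid] = parts[start:start + count]
    return assignment


partitions = list(range(20))
# Consumer 1 has died, so only consumers 2-4 (10 streams each) take part.
survivors = [(c, s) for c in (2, 3, 4) for s in range(1, 11)]
after = range_assign(partitions, survivors)

owned = {}
for (consumer, _stream), parts in after.items():
    owned[consumer] = owned.get(consumer, 0) + len(parts)
print(owned)  # {2: 10, 3: 10, 4: 0}
```

Consumer 2's streams now sort first, so consumers 2 and 3 take all 20
partitions and consumer 4's streams stay empty.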

Guozhang




