Posted to dev@kafka.apache.org by "Joel Koshy (JIRA)" <ji...@apache.org> on 2014/08/16 00:45:19 UTC

[jira] [Commented] (KAFKA-687) Rebalance algorithm should consider partitions from all topics

    [ https://issues.apache.org/jira/browse/KAFKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099304#comment-14099304 ] 

Joel Koshy commented on KAFKA-687:
----------------------------------

[~junrao] I was thinking over this a little more and I felt it is better not to design the new consumer's partition allocator API in this jira. There are a couple of reasons:
* The new consumer's allocator's interface requirements and desired implementations will be known precisely only when we get to it - i.e., when we are implementing the partition assignment in the new consumer. So we will most likely change it anyway when we implement the new consumer.
* The allocation code is not very complicated anyway so I don't think it is a lot of work to rewrite it in the new consumer implementation.
* With the "more general" API that we discussed, the range allocation can no longer be an exact copy (unlike in the original patch). I would prefer to avoid touching the range-partitioner in the existing consumer at this point since that is the default that most people use.

So what I would propose is the following: keep the partition allocation interface as in the original patch and provide only one more allocation implementation: roundrobin. This allocation scheme is legal only when using wildcards on all consumer instances and all the regexes are identical (although stream counts can be different).
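
For illustration only, here is a rough Python sketch of how such a roundrobin allocation might behave. It is not the code from the patch: the function name, the input shapes, the thread-id format and the use of re.fullmatch for wildcard matching are all assumptions made for this example. It rejects the assignment unless every consumer instance uses the same regex, allows different stream counts, and deals the partitions of all matched topics out to the consumer streams in turn.

```python
import re


def round_robin_assign(consumers, topic_partitions):
    """Hypothetical sketch of a round-robin allocation over all topics.

    consumers: dict of consumer id -> (wildcard regex, stream count)
    topic_partitions: dict of topic name -> number of partitions
    Returns a dict of stream id -> list of (topic, partition).
    """
    regexes = {regex for regex, _ in consumers.values()}
    if len(regexes) != 1:
        # The scheme is only legal when all regexes are identical.
        raise ValueError("roundrobin requires identical regexes on all consumers")
    pattern = re.compile(regexes.pop())

    # Stream counts may differ per consumer; each stream gets its own id.
    streams = sorted(
        f"{consumer}-{i}"
        for consumer, (_, count) in consumers.items()
        for i in range(count)
    )
    if not streams:
        raise ValueError("no consumer streams to assign to")

    # Pool the partitions of every matched topic (assumed matching rule).
    partitions = sorted(
        (topic, p)
        for topic, n in topic_partitions.items()
        if pattern.fullmatch(topic)
        for p in range(n)
    )

    assignment = {stream: [] for stream in streams}
    for i, tp in enumerate(partitions):
        assignment[streams[i % len(streams)]].append(tp)
    return assignment
```

Because the partitions from all topics go into one pool before being dealt out, the per-stream counts differ by at most one regardless of how many partitions each individual topic has.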


> Rebalance algorithm should consider partitions from all topics
> --------------------------------------------------------------
>
>                 Key: KAFKA-687
>                 URL: https://issues.apache.org/jira/browse/KAFKA-687
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Pablo Barrera
>            Assignee: Joel Koshy
>         Attachments: KAFKA-687.patch, KAFKA-687_2014-07-18_15:55:15.patch
>
>
> The current rebalance step, as stated in the original Kafka paper [1], splits the partitions per topic between all the consumers. So if you have 100 topics with 2 partitions each and 10 consumers, only two consumers will be used. That is, for each topic all partitions will be listed and shared between the consumers in the consumer group in order (not randomly).
> If the consumer group is reading from several topics at the same time it makes sense to split all the partitions from all topics between all the consumers. Following the example, we will have 200 partitions in total, 20 per consumer, using the 10 consumers.
> The load per topic could be different and the division should consider this. However even a random division should be better than the current algorithm while reading from several topics and should not harm reading from a few topics with several partitions.
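
The arithmetic in the description can be checked with a small Python sketch of a per-topic range split. This is a simplified model written for this example, not the consumer's actual rebalance code: the function name, the consumer ids and the "earlier consumers get the remainder" rule are assumptions.

```python
def range_assign_per_topic(topic_partitions, consumers):
    """Hypothetical per-topic range split.

    topic_partitions: dict of topic name -> number of partitions
    consumers: iterable of consumer ids
    Returns a dict of consumer id -> list of (topic, partition).
    """
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for topic in sorted(topic_partitions):
        n = topic_partitions[topic]
        # Each topic is divided on its own, always starting from the
        # first consumer in sorted order.
        per_consumer, extra = divmod(n, len(ordered))
        start = 0
        for i, c in enumerate(ordered):
            count = per_consumer + (1 if i < extra else 0)
            assignment[c].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment


# The example from the description: 100 topics, 2 partitions each, 10 consumers.
topics = {f"topic{i:03d}": 2 for i in range(100)}
consumers = [f"c{i:02d}" for i in range(10)]
assignment = range_assign_per_topic(topics, consumers)

busy = [c for c, parts in assignment.items() if parts]
total = sum(topics.values())
even_share = total // len(consumers)
```

In this model only the first two consumers in sorted order receive any partitions (100 each), whereas an even split of the 200 partitions over the 10 consumers would give each of them 20.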


