Posted to users@kafka.apache.org by 刘明敏 <di...@gmail.com> on 2012/05/15 12:00:57 UTC

How many partitions are preferred?

We are considering putting Kafka into production.

One thing we are not sure about is how many partitions per topic is
suitable.

I noticed that on the operations page
(https://cwiki.apache.org/confluence/display/KAFKA/Operations#kafka), LinkedIn
chose just one partition:

kafka.num.partitions=1


Although this has been explained in an earlier discussion, I still don't
quite understand why you chose only 1 partition:

Pierre-Yves Ritschard:
> One partition only? So the key here is that you start as many brokers
> as there are consumers?

Jay Kreps:
> Yeah technically that is not 100% correct. We have tuned about 10 topics by
> adding more partitions to add parallelism.


Regarding "tuned about 10 topics by adding more partitions to add parallelism":
does this mean you dispatch the same group of logs into 10 different topics,
so that you get more partitions on each broker (one partition per topic and
10 topics in total, thus 10 partitions on one broker), and thereby add
parallelism?

If so, why not just increase the number of partitions of a single topic?

And how many partitions would you advise assigning to a topic?

-- 
Best Regards

----------------------
刘明敏 | mmLiu

Re: How many partitions are preferred?

Posted by Jay Kreps <ja...@gmail.com>.
Basically, the total number of partitions across all brokers determines
the maximum parallelism of the consumer group. So if you want to have
12 consumer processes, then any total of 12 or more partitions will work.
That said, fewer files generally give better I/O efficiency, and more
than, say, 10k files per machine is probably unwise.
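To make the parallelism argument concrete, here is a minimal Python sketch. It uses a simplified range-style assignment (an assumption for illustration; it is not Kafka's actual consumer rebalancing code) and a hypothetical layout of 3 brokers with 4 partitions each, i.e. 12 partitions in total. The names `range_assign` and `busy_consumers` are illustrative only.

```python
def range_assign(partitions, consumers):
    """Give each consumer a contiguous slice of the sorted partition list.

    Simplified model only (assumption): not Kafka's actual rebalance code.
    If there are more consumers than partitions, the extras get nothing.
    """
    parts = sorted(partitions)
    cons = sorted(consumers)
    if not cons:
        return {}
    per_consumer, extra = divmod(len(parts), len(cons))
    assignment, start = {}, 0
    for i, consumer in enumerate(cons):
        # The first `extra` consumers take one additional partition each.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = parts[start:start + count]
        start += count
    return assignment


def busy_consumers(assignment):
    """Number of consumers that own at least one partition."""
    return sum(1 for owned in assignment.values() if owned)


# Hypothetical layout: 3 brokers, each hosting 4 partitions of the topic -> 12 in total.
partitions = [f"broker{b}-{p}" for b in range(3) for p in range(4)]

for n in (4, 12, 16):
    consumers = [f"c{i:02d}" for i in range(n)]
    print(n, "consumers ->", busy_consumers(range_assign(partitions, consumers)), "busy")
```

With 12 partitions in total, 4 consumers each own 3 partitions, 12 consumers own one each, and with 16 consumers four of them sit idle, so the script reports 4, 12 and 12 busy consumers: the partition count caps the useful parallelism.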

At LinkedIn we find that most topics are just medium-sized, so we
default to 1 partition and bump it up later for the handful of very
large topics that need it.
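The "default to 1, bump up the large topics" policy and the rough 10k-files-per-machine ceiling can be combined into a back-of-the-envelope check. The sketch below is hypothetical: the helper names, the simplification that a topic has the same number of partitions on every broker, and the assumed number of log segment files per partition are all illustrative assumptions, not Kafka settings or LinkedIn's actual numbers.

```python
import math

# Rough per-machine ceiling taken from the reply ("more than say 10k files per machine is probably unwise").
SOFT_FILE_LIMIT = 10_000


def partitions_per_broker(large_topic, target_consumers, num_brokers):
    """Hypothetical helper: default to 1 partition per broker, bump only large topics."""
    if not large_topic:
        return 1
    # Give the topic enough partitions in total (across all brokers)
    # that each of the target consumers can own at least one.
    return max(1, math.ceil(target_consumers / num_brokers))


def estimated_files_per_broker(per_broker_counts, segments_per_partition):
    """Assumption: every partition holds roughly `segments_per_partition` log segment files."""
    return sum(per_broker_counts) * segments_per_partition


num_brokers = 3
counts = [partitions_per_broker(False, 1, num_brokers) for _ in range(200)]  # 200 medium topics
counts += [partitions_per_broker(True, 24, num_brokers) for _ in range(5)]   # 5 large topics
files = estimated_files_per_broker(counts, segments_per_partition=20)
print(files, "files per broker; within limit:", files <= SOFT_FILE_LIMIT)
```

Under these made-up inputs each broker holds 200 + 5 × 8 = 240 partitions, or about 4,800 files, which stays under the rough 10k ceiling; raising every topic to many partitions would multiply the file count quickly, which is the trade-off described in the reply.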

-Jay
