Posted to users@kafka.apache.org by shalom sagges <sh...@gmail.com> on 2019/03/20 22:43:47 UTC

Partition Count Dilemma

Hi All,

I'm really new to Kafka and wanted to know if anyone can help me better
understand partition count in relation to the Kafka cluster (apologies in
advance for noob questions).

I was requested to increase a topic's partition count from 30 to 100 in
order to increase workers' parallelism (there are already other topics in
this cluster with 100-200 partition counts per topic).
The cluster is built of 4 physical servers. Each server has 132 GB RAM, 40
CPU cores, 6 SAS disks 1.1 TB each.

Is PartitionCount:100 considered a high number of partitions per topic in
relation to the cluster?
Is there a good way for me to predetermine what an optimal partition count
might be?

Thanks a lot!

Re: Partition Count Dilemma

Posted by "1095193290@qq.com" <10...@qq.com>.
Hi,
The number of partitions drives the parallelism of consumers. In general, the more partitions you have, the more parallel consumers you can add, and the more throughput you can achieve. In other words, if you have 10 partitions, the maximum number of consumers in a group is 10. So assume the throughput a single consumer can provide is C, and your target throughput is T. Then the minimum number of partitions (that is, the number of consumers) is T/C.
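That T/C rule of thumb can be sketched in a few lines; the throughput figures below are made up purely for illustration:

```python
import math

def min_partitions(target_throughput_mb_s: float, per_consumer_mb_s: float) -> int:
    """Minimum partition count so a consumer group can reach the target.

    Partitions are whole units, so round T/C up rather than truncating.
    """
    return math.ceil(target_throughput_mb_s / per_consumer_mb_s)

# Hypothetical figures: each consumer handles ~5 MB/s, target is 220 MB/s.
print(min_partitions(220, 5))  # -> 44
```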



1095193290@qq.com
 

Re: Partition Count Dilemma

Posted by Antony Stubbs <an...@confluent.io.INVALID>.
Hi there, sorry, just checking the user list for "parallel consumer" - you
can use Confluent's Parallel Consumer client library to increase your
concurrency far beyond the number of partitions you have available.

For example, in your case you could process thousands of messages in
parallel without touching the partition count.

Have a look here:
https://www.confluent.io/confluent-accelerators/#parallel-consumer

Latest talk here:
https://www.confluent.io/events/kafka-summit-europe-2021/introducing-confluent-labs-parallel-consumer-client/

Github here: https://github.com/confluentinc/parallel-consumer
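The core idea behind that library - processing many records concurrently within one partition while still keeping per-key ordering - can be sketched without the library itself. Everything below is an illustrative toy, not the Parallel Consumer API:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def process_with_key_ordering(records, handler, max_workers=8):
    """Process (key, value) records concurrently while keeping per-key order.

    Records sharing a key are grouped and handled sequentially within the
    group; distinct keys run in parallel on the thread pool.
    """
    by_key = defaultdict(list)
    for key, value in records:
        by_key[key].append(value)

    def run_group(values):
        for v in values:
            handler(v)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for values in by_key.values():
            pool.submit(run_group, values)

results = []
process_with_key_ordering(
    [("a", 1), ("b", 2), ("a", 3)],
    results.append,  # list.append is thread-safe in CPython
)
print(sorted(results))  # -> [1, 2, 3]
```

Note the ordering guarantee: 1 and 3 share key "a", so 1 is always handled before 3, even though key "b" runs concurrently.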



Re: Partition Count Dilemma

Posted by Jérémy Thulliez <je...@zenika.com>.
In your case,

Increasing the number of partitions on an existing topic will have an impact
on how messages are dispatched into partitions.

If you use the default partitioner and your messages have keys, the
algorithm is 'hash(key) % number of partitions'.

That means a message with a given key won't go to the same partition before
and after the change.

IMO, this is important to note in case you need ordering by key.

Jeremy
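That remapping is easy to see with a toy partitioner. Note that Kafka's default partitioner actually uses murmur2 on the key bytes; CRC-32 stands in here only so the example is self-contained and deterministic:

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Toy partitioner: hash(key) % number of partitions.

    Kafka's default partitioner uses murmur2 on the serialized key;
    CRC-32 is used here only to keep the sketch dependency-free.
    """
    return zlib.crc32(key.encode()) % num_partitions

keys = ["user-1", "user-2", "user-3", "user-4", "user-5"]
moved = [k for k in keys if partition_for(k, 30) != partition_for(k, 100)]
print(moved)  # keys that land on a different partition after 30 -> 100
```

Any key in `moved` would have its post-change records in a different partition than its pre-change ones, which breaks per-key ordering across the boundary.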

Re: Partition Count Dilemma

Posted by shalom sagges <sh...@gmail.com>.
Thanks a lot guys! That's really helpful!

So now I understand that the partition count affects the number of
consumers I can efficiently use. Does partition count affect producers as
well?

Thanks!


Re: Partition Count Dilemma

Posted by Vincent Maurin <vi...@gmail.com>.
Hi

100 partitions is not a high number for this cluster.
The downsides of having more partitions are:
- more open file descriptors; check that the limits for the user running
Kafka are high enough
- more work for the brokers and more memory used to keep the metadata about
the partitions (but going from 30 to 100 should be fine)
- if the cleanup strategy has not changed, you will use more disk space

So you have to weigh these cons against the benefit you get from the
parallelism
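One quick way to check the file-descriptor limits in effect for the user running the broker is from Python's standard library (a sketch; the `resource` module is POSIX-only):

```python
import resource

# Soft limit is what currently applies to the process; hard limit is the
# ceiling it could be raised to without extra privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft} hard={hard}")
```

Each partition's log segments and index files count against this limit, so a broker hosting many partitions can exhaust a default soft limit of 1024.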
