Posted to users@kafka.apache.org by François Méthot <fm...@gmail.com> on 2015/04/07 19:13:27 UTC

Number of Partitions and Performance

Hi,

  We initially configured our topics with 8 to 16 partitions
each on a cluster of 10 brokers (VMs with 2 cores, 16 MB RAM, and a few TB of
SAN disk).

Then I came across the rule-of-thumb formula *100 x b x r.*
(
http://blog.confluent.io/2015/03/12/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
)

100 x 10 brokers x 2 Replication = 2000 partitions.

We gave it a try, but our single-threaded Kafka producer performance
dropped by 80%.

What are the benefits of having that many partitions?

Is there any problem in the long run with using a topic with as few as 16
partitions?


Francois

Re: Number of Partitions and Performance

Posted by François Méthot <fm...@gmail.com>.
Thanks, guys, for the clarification about the "rule of thumb formula".

I will stick with a reasonably small number of partitions but add a few
to make the count a multiple of the number of brokers.

Todd, I read your post yesterday as well, very helpful.

On Tue, Apr 7, 2015 at 1:42 PM, Todd Palino <tp...@gmail.com> wrote:

> Going to stand with Jay here :)
>
> I just posted an email yesterday about how we size clusters and topics.
> Basically, have at least as many partitions as you have consumers in your
> consumer group (preferably a multiple). If you want to balance it across
> the cluster, also have it be a multiple of the number of brokers you have.
> We tend to ignore the second one on most clusters, but we will expand a
> topic (as long as it is not keyed) if the retention on disk exceeds 50 GB.
> That's just a guideline we have so it's easier to balance the traffic and
> move partitions around when needed.
>
> -Todd

Re: Number of Partitions and Performance

Posted by Todd Palino <tp...@gmail.com>.
Going to stand with Jay here :)

I just posted an email yesterday about how we size clusters and topics.
Basically, have at least as many partitions as you have consumers in your
consumer group (preferably a multiple). If you want to balance it across
the cluster, also have it be a multiple of the number of brokers you have.
We tend to ignore the second one on most clusters, but we will expand a
topic (as long as it is not keyed) if the retention on disk exceeds 50 GB.
That's just a guideline we have so it's easier to balance the traffic and
move partitions around when needed.

-Todd
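
A minimal sketch of the sizing guideline above, in Python: take at least as many partitions as consumers in the group, then round up to a multiple of the broker count. The function names and the consumer count of 6 are hypothetical and chosen only for illustration. The broker count (10) and the existing 16 partitions come from the original question.

```python
def round_up_to_multiple(value: int, base: int) -> int:
    """Return the smallest multiple of `base` that is >= `value`."""
    if base < 1:
        raise ValueError("base must be a positive integer")
    # Ceiling division written with floor division on the negated value.
    return -(-value // base) * base


def suggest_partitions(consumers: int, brokers: int, current: int = 0) -> int:
    """Hypothetical helper: partition count that is at least the number of
    consumers (and never below the current count, since partitions cannot
    be removed), rounded up to a multiple of the broker count."""
    if consumers < 1 or brokers < 1:
        raise ValueError("consumers and brokers must be positive integers")
    floor = max(consumers, current, 1)
    return round_up_to_multiple(floor, brokers)


if __name__ == "__main__":
    # 10 brokers and 16 existing partitions are from the thread;
    # 6 consumers is an assumed value.
    print(suggest_partitions(consumers=6, brokers=10, current=16))
```

This only covers the two counting rules. The 50 GB retention guideline and the restriction on expanding keyed topics are operational judgments that the sketch does not model.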


On Tue, Apr 7, 2015 at 10:28 AM, Jay Kreps <ja...@gmail.com> wrote:

> I think the blog post was giving that as an upper bound, not a recommended
> size. I think that blog goes through some of the trade-offs of having more
> or fewer partitions.
>
> -Jay

Re: Number of Partitions and Performance

Posted by Jay Kreps <ja...@gmail.com>.
I think the blog post was giving that as an upper bound, not a recommended
size. I think that blog goes through some of the trade-offs of having more
or fewer partitions.

-Jay
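
To make the "upper bound, not a recommended size" reading concrete, here is a small Python sketch that treats 100 x b x r as a ceiling to check against rather than a target to reach. The function names are hypothetical. The numbers (10 brokers, replication factor 2, 16 partitions) are the ones from the original question.

```python
def partition_upper_bound(brokers: int, replication: int, factor: int = 100) -> int:
    """Rule-of-thumb ceiling from the thread: factor x brokers x replication."""
    if brokers < 1 or replication < 1:
        raise ValueError("brokers and replication must be positive integers")
    return factor * brokers * replication


def within_bound(partitions: int, brokers: int, replication: int) -> bool:
    """True when a planned partition count stays at or under the ceiling."""
    return partitions <= partition_upper_bound(brokers, replication)


if __name__ == "__main__":
    limit = partition_upper_bound(brokers=10, replication=2)
    print(f"upper bound: {limit}")
    print(f"16 partitions within bound: {within_bound(16, 10, 2)}")
```

Under this reading, a 16-partition topic sits far below the ceiling, and nothing in the formula argues for raising it to 2000.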
