Posted to users@kafka.apache.org by BYEONG-GI KIM <bg...@bluedigm.com> on 2016/03/02 01:11:32 UTC

About the number of partitions

Hello.

I have a question about how many partitions are optimal when using Kafka.
As far as I know, even if multiple consumers belong to the same consumer
group, say *group_A*, only one of them can receive a given Kafka message
from a producer if the topic has a single partition. So multiple partitions
are required in order to distribute messages to all the consumers in
group_A, if I want every consumer to get messages.

Is it right?

I'm considering developing several Kafka consumer applications, e.g., a
message saver, a message analyzer, etc., so each message from a producer
must be consumed by every one of those applications.

Any advice and help would be really appreciated.

Thanks in advance!

Best regards

Kim

Re: About the number of partitions

Posted by BYEONG-GI KIM <bg...@bluedigm.com>.
Dear Jens

Thank you for the reply!

It's really hard to decide how many brokers/partitions are optimal for a
system. Are there any good reports or documents about that? I'd like to see
some examples of this kind of optimization, especially in production-level
environments.

Thank you in advance.

Best regards

Kim

2016-03-02 21:19 GMT+09:00 Jens Rantil <je...@tink.se>:

> Hi Kim,
>
> You are correct in that the number of partitions sets the upper limit on
> consumer parallelization. That is, a single consumer in a group can consume
> multiple partitions, but multiple consumers in the same group can't consume
> a single partition.
>
> Also, since partitions are spread across your brokers, really it's the
> ratio nPartitions/nBrokers that you want to optimize for.
>
> Given the above parallelization limit, it might seem to make sense to have
> a very large ratio. However, that has other implications:
>
>    - Your brokers will have a lot of smaller files that they will have to
>    flush periodically. This can incur a lot of overhead and introduce
>    latencies, especially on a spinning disk where seeks are expensive.
>    - Brokers are generally set to rotate their logs at a certain size. It
>    could be hard to tune rotation with many small files.
>
> Given this, really you need to benchmark for your use case with your
> message sizes etc.
>
> Side note: for autoscaling you will have to overprovision your partitions
> somewhat so that you don't hit the parallelization limit.
>
> Cheers,
> Jens

Re: About the number of partitions

Posted by Jens Rantil <je...@tink.se>.
Hi Kim,

You are correct in that the number of partitions sets the upper limit on
consumer parallelization. That is, a single consumer in a group can consume
multiple partitions, but multiple consumers in the same group can't consume
a single partition.
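
To make the limit concrete, here is a minimal Python sketch. It does not talk
to Kafka at all; it only imitates a round-robin style assignment of partitions
to the consumers of one group. The function name `assign_partitions` and the
consumer names are made up for illustration, and Kafka's real assignors differ
in detail.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition ids over the consumers of ONE group, round-robin.

    Each partition gets exactly one owner; a consumer may own several.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# One partition, three consumers in group_A: only one consumer gets work.
single = assign_partitions(1, ["c1", "c2", "c3"])
# Six partitions, three consumers: every consumer owns two partitions.
spread = assign_partitions(6, ["c1", "c2", "c3"])
# Two partitions, three consumers: the third consumer stays idle.
idle = assign_partitions(2, ["c1", "c2", "c3"])

print(single)
print(spread)
print(idle)
```

Note that this limit applies within one group. Applications that subscribe
with different group ids (say, a saver and an analyzer) each receive every
message, whatever the partition count is.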

Also, since partitions are spread across your brokers, really it's the
ratio nPartitions/nBrokers that you want to optimize for.

Given the above parallelization limit, it might seem to make sense to have
a very large ratio. However, that has other implications:

   - Your brokers will have a lot of smaller files that they will have to
   flush periodically. This can incur a lot of overhead and introduce
   latencies, especially on a spinning disk where seeks are expensive.
   - Brokers are generally set to rotate their logs at a certain size. It
   could be hard to tune rotation with many small files.
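
A rough back-of-the-envelope sketch of the second point, in Python. The
throughput figure and the 1 GiB segment size are assumptions chosen for
illustration (1 GiB is, as far as I know, the default of the broker setting
`log.segment.bytes`), and the helper name is made up.

```python
def hours_to_fill_segment(total_bytes_per_sec, num_partitions, segment_bytes):
    """Hours until one partition fills a log segment, assuming traffic is
    spread evenly over all partitions (an idealisation)."""
    per_partition_rate = total_bytes_per_sec / num_partitions
    return segment_bytes / per_partition_rate / 3600


SEGMENT_BYTES = 1024 ** 3  # assumed segment size: 1 GiB
TOPIC_RATE = 1024 ** 2     # assumed total topic throughput: 1 MiB/s

for partitions in (4, 64, 1024):
    hours = hours_to_fill_segment(TOPIC_RATE, partitions, SEGMENT_BYTES)
    print(f"{partitions:>5} partitions: ~{hours:.1f} h per segment")
```

With the same total traffic, every extra partition slows down size-based
rotation, so a size threshold that works well for a few partitions may rarely
trigger with many.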

Given this, really you need to benchmark for your use case with your
message sizes etc.

Side note: for autoscaling you will have to overprovision your partitions
somewhat so that you don't hit the parallelization limit.
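
One way to read the side note, as a small Python sketch. The headroom factor
and the rounding to a multiple of the broker count are my own assumptions for
illustration, not a Kafka rule, and the function name is hypothetical.

```python
import math


def partitions_for_autoscaling(max_consumers, num_brokers, headroom=1.5):
    """Pick a partition count that leaves room for consumers to scale out.

    max_consumers: largest consumer count the group is expected to reach.
    headroom: assumed safety factor on top of that (hypothetical default).
    The result is rounded up to a multiple of num_brokers so partitions
    can be spread evenly across brokers.
    """
    wanted = math.ceil(max_consumers * headroom)
    return math.ceil(wanted / num_brokers) * num_brokers


# Expecting up to 10 consumers on a 3-broker cluster.
print(partitions_for_autoscaling(10, 3))
```

Treat the result only as a starting point for the benchmarking mentioned
above.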

Cheers,
Jens


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se


Re: About the number of partitions

Posted by BYEONG-GI KIM <bg...@bluedigm.com>.
Dear James,

Thank you very much for the information!

It really helps me understand Kafka much more deeply.

Best regards

Kim

2016-03-03 3:29 GMT+09:00 James Cheng <jc...@tivo.com>:

> Kim,
>
> Here's a good blog post from Confluent with advice on how to choose the
> number of partitions.
>
>
> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>
> -James

Re: About the number of partitions

Posted by James Cheng <jc...@tivo.com>.
Kim,

Here's a good blog post from Confluent with advice on how to choose the number of partitions.

http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/

-James

