You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by Dotan Patrich <do...@fortscale.com> on 2015/02/03 07:00:37 UTC

defining topic partitions

Hi,

We're using Samza with Kafka and we would like to use multiple partitions
in our topics.

We've noticed that the number of partitions is defined in server.properties.
According to the Kafka documentation
<http://kafka.apache.org/07/configuration.html>, there are 2 options to
defined the number of partitions:

   - num.partitions - Specifies the default number of partitions per topic.
   - topic.partition.count.map - Override parameter to control the number
   of partitions for selected topics. E.g., topic1:10,topic2:20

We thought about using the first option (num.partitions) in order to avoid
the overhead of adding every new topic to a map. Our concern is how it
affects the metrics and checkpoint topics.

Can anyone share what is the best practice for using multiple partitions in
Kafka?


Thanks,
Dotan

Re: defining topic partitions

Posted by Chris Riccomini <cr...@apache.org>.
Hey Dotan,

Another way to create topics with specific partitions is to use the
bin/kafka-topics.sh tool in Kafka's binary distribution. This allows you to
create topics with a specific count.

> Our concern is how it affects the metrics and checkpoint topics.

Good question. In 0.7.0, the checkpoint topic must have the same number of
partitions as the input topic(s). Samza will use the max(partition count)
when multiple input topics exist (e.g. input topics with 4 and 8 partitions
would result in an 8 partition checkpoint topic). Samza will automatically
create a checkpoint topic of the appropriate size when the job first
executes.

If the input topic(s)'s partition count changes after the job has been
started, the checkpoint topic will have to be resized. In 0.8.0, the
checkpoint topic is a single partition, and this problem goes away.

Keep in mind that resizing a topic (checkpoint, or otherwise) if you're
depending on keyed messages has a big impact, since it will change the
partition that a key gets mapped to (key.hashCode % partitionCount =
partition; partition changes when partitionCount changes).

As for the metrics topic, there shouldn't be any issue.

Cheers,
Chris

On Mon, Feb 2, 2015 at 10:00 PM, Dotan Patrich <do...@fortscale.com> wrote:

> Hi,
>
> We're using Samza with Kafka and we would like to use multiple partitions
> in our topics.
>
> We've noticed that the number of partitions is defined in
> server.properties.
> According to the Kafka documentation
> <http://kafka.apache.org/07/configuration.html>, there are 2 options to
> defined the number of partitions:
>
>    - num.partitions - Specifies the default number of partitions per topic.
>    - topic.partition.count.map - Override parameter to control the number
>    of partitions for selected topics. E.g., topic1:10,topic2:20
>
> We thought about using the first option (num.partitions) in order to avoid
> the overhead of adding every new topic to a map. Our concern is how it
> affects the metrics and checkpoint topics.
>
> Can anyone share what is the best practice for using multiple partitions in
> Kafka?
>
>
> Thanks,
> Dotan
>