Posted to users@kafka.apache.org by BigData dev <bi...@gmail.com> on 2016/08/29 17:52:07 UTC

Reg: DefaultPartitioner in Kafka

Hi All,
In the DefaultPartitioner implementation, when the key is null, the
partition number is computed modulo the number of available partitions.
Below is the code snippet.

if (availablePartitions.size() > 0) {
    int part = Utils.toPositive(nextValue) % availablePartitions.size();
    return availablePartitions.get(part).partition();
}
Whereas when the key is not null, the partition number is computed modulo
the total number of partitions.

return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
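
For anyone following along, the two branches can be put side by side in a
small standalone sketch. This is not the Kafka source: the class and method
names are made up for illustration, available partitions are modelled as a
plain list of partition ids, Arrays.hashCode stands in for Utils.murmur2,
and the fallback used when no partition is available is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the two code paths quoted in this thread.
final class PartitionChoiceSketch {
    // Round-robin counter used only for records without a key.
    private static final AtomicInteger COUNTER = new AtomicInteger(0);

    // Clears the sign bit so the modulo below never yields a negative index.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Assumption: Arrays.hashCode replaces murmur2 so the sketch needs no Kafka jars.
    static int hash(byte[] keyBytes) {
        return Arrays.hashCode(keyBytes);
    }

    static int choose(byte[] keyBytes, int numPartitions, List<Integer> availablePartitions) {
        if (keyBytes == null) {
            int nextValue = COUNTER.getAndIncrement();
            if (!availablePartitions.isEmpty()) {
                // No key: spread records over the partitions that are currently available.
                int part = toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part);
            }
            // Assumed fallback: nothing is available, so pick among all partitions.
            return toPositive(nextValue) % numPartitions;
        }
        // Key present: the result depends only on the key and the total partition count.
        return toPositive(hash(keyBytes)) % numPartitions;
    }
}
```

With a non-null key, the list of available partitions never enters the
computation, which is the behaviour being asked about here.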

But if some partitions are not available, the producer will not be able
to publish keyed messages that hash to those partitions.

Shouldn't we do the same for keyed messages and consider only the
available partitions?

https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java#L67

Could anyone help clarify this?


Thanks,
Bharat

Re: Reg: DefaultPartitioner in Kafka

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.

When a key is available, you generally include it because you want all
messages with the same key to always end up in the same partition. This
allows all messages with the same key to be processed by the same consumer
(e.g. allowing you to aggregate all data for a single user if you key on
user ID). To accomplish this you always consider all partitions (not just
available partitions) and keep the # of partitions in a topic fixed.
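
A rough way to see this is to compare the two strategies while one
partition is down. The sketch below is only an illustration under stated
assumptions: the class and method names are hypothetical, partitions are
plain integers, and Arrays.hashCode stands in for the murmur2 hash used by
the real partitioner.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical comparison: hashing over all partitions vs. only available ones.
final class KeyStabilitySketch {
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Assumption: a stand-in hash, not Kafka's murmur2.
    static int hash(String key) {
        return Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
    }

    // Mapping over all partitions: depends only on the key and the partition count.
    static int overAll(String key, int numPartitions) {
        return toPositive(hash(key)) % numPartitions;
    }

    // Proposed alternative: index into whichever partitions are currently available.
    static int overAvailable(String key, List<Integer> available) {
        return available.get(toPositive(hash(key)) % available.size());
    }

    // Counts keys "user-0".."user-(n-1)" whose partition differs between the two mappings.
    static int countMoved(int numKeys, int numPartitions, List<Integer> available) {
        int moved = 0;
        for (int i = 0; i < numKeys; i++) {
            String key = "user-" + i;
            if (overAll(key, numPartitions) != overAvailable(key, available)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Four partitions, with partition 1 temporarily unavailable.
        List<Integer> available = Arrays.asList(0, 2, 3);
        System.out.println("keys that changed partition: " + countMoved(1000, 4, available));
    }
}
```

Any key that changes partition while a broker is down would have its
messages split across two partitions, so a single consumer would no longer
see all data for that key.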

The docs on Kafka's design, specifically some notes in the producer &
consumer sections, cover a bit of this:
http://kafka.apache.org/documentation.html#intro_producers

-Ewen

On Mon, Aug 29, 2016 at 10:52 AM, BigData dev <bi...@gmail.com>
wrote:

> Hi All,
> In DefaultPartitioner implementation, when key is null, we get the
> partition number by modulo of available partitions. Below is the code
> snippet.
>
> if (availablePartitions.size() > 0)
> { int part = Utils.toPositive(nextValue) % availablePartitions.size();
> return availablePartitions.get(part).partition();
> }
> Where as when key is not null, we get the partition number by modulo of
> total no og partitions.
>
> return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
>
> As if some partitions are not available,then the producer will not be able
> to publish message to that partition.
>
> Should n't we do the same as by considering only available partitions?
>
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java#L67
>
> Could any help to clarify on this issue.
>
>
> Thanks,
> Bharat
>

-- 
Thanks,
Ewen
