Posted to users@kafka.apache.org by Pushkar Deole <pd...@gmail.com> on 2021/11/20 06:54:19 UTC

uneven distribution of events across kafka topic partitions for small number of unique keys

Hi All,

We are seeing an uneven distribution of events across topic partitions for
a small set of unique keys. The details are as follows:

1. topic with 6 partitions
2. 8 unique keys used to produce events onto the topic

We used key-based partitioning while producing events onto the above topic.
Observation: only 3 of the 6 partitions were used for all the events
pertaining to those 8 unique keys.

Any idea how the load can be spread evenly across partitions while still
using a key-based partitioning strategy? Any help would be greatly appreciated.

Note: we cannot use round robin, since key-level ordering matters for us.

Re: uneven distribution of events across kafka topic partitions for small number of unique keys

Posted by Dave Klein <da...@usa.net>.
I’m sorry.  I misread your message.  I thought you were asking about increasing the number of partitions on a topic after there were keyed events in it.


Re: uneven distribution of events across kafka topic partitions for small number of unique keys

Posted by Pushkar Deole <pd...@gmail.com>.
Dave,

I am not sure I get your point. This is not about having too few partitions;
the issue is that the default partitioner may produce colliding hashes for two
different strings, which would land two different keys in the same
partition.


Re: uneven distribution of events across kafka topic partitions for small number of unique keys

Posted by Dave Klein <da...@usa.net>.
Another possibility, if you can pause processing, is to create a new topic with a higher number of partitions, then consume from the beginning of the old topic and produce to the new one. Then continue processing as normal, and all events will be in the correct partitions.
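
The reason a copy into a new topic comes up at all is that the partition chosen for a key depends on the partition count. A small Java illustration, using plain integers as stand-ins for key hashes (RepartitionDemo is a hypothetical name, and the real producer hashes the serialized key bytes rather than an int):

```java
/** Hypothetical demo, not a Kafka API: "hash % partitionCount" depends on the count. */
final class RepartitionDemo {

    /** Partition for a given hash; floorMod keeps the result non-negative. */
    static int partitionFor(int keyHash, int partitionCount) {
        return Math.floorMod(keyHash, partitionCount);
    }

    /** Counts the hashes in [0, hashes) whose partition differs between the two counts. */
    static int movedKeys(int hashes, int oldCount, int newCount) {
        int moved = 0;
        for (int h = 0; h < hashes; h++) {
            if (partitionFor(h, oldCount) != partitionFor(h, newCount)) {
                moved++;
            }
        }
        return moved;
    }
}
```

Going from 6 to 12 partitions, a key with hash 7 moves from partition 1 to partition 7, so its older events would sit in a different partition than its newer ones unless everything is re-produced.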

Regards,
Dave


Re: uneven distribution of events across kafka topic partitions for small number of unique keys

Posted by Pushkar Deole <pd...@gmail.com>.
Thanks Luke. I am sure many others have faced this problem before, so I
would like to know whether there are any existing custom algorithms that can
be reused.

Note that we also have a requirement to maintain key-level ordering, so the
custom partitioner should support that as well.


Re: uneven distribution of events across kafka topic partitions for small number of unique keys

Posted by Luke Chen <sh...@gmail.com>.
Hello Pushkar,
The default distribution algorithm is "hash(key) % partition_count", so with
only a handful of keys the uneven distribution you saw is quite possible.
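
To get a feel for how likely this is with 8 keys and 6 partitions, here is a rough Java sketch. It assumes an idealized hash that spreads keys uniformly and independently over the partitions, which the real partitioner only approximates. The class and method names are made up for this illustration and are not Kafka APIs.

```java
/** Hypothetical helper (not a Kafka API): occupancy odds under idealized uniform hashing. */
final class PartitionOccupancy {

    /** Exact binomial coefficient C(n, r) for small inputs. */
    static long choose(int n, int r) {
        long result = 1;
        for (int i = 1; i <= r; i++) {
            result = result * (n - r + i) / i; // stays an integer at every step
        }
        return result;
    }

    /** base^exp for small non-negative inputs (no overflow check). */
    static long power(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    /** Ways to spread `keys` distinct keys over exactly `used` partitions with none empty (inclusion-exclusion). */
    static long onto(int keys, int used) {
        long total = 0;
        for (int j = 0; j <= used; j++) {
            long term = choose(used, j) * power(used - j, keys);
            total += (j % 2 == 0) ? term : -term;
        }
        return total;
    }

    /** Probability that exactly `used` of `partitions` partitions receive at least one key. */
    static double probabilityExactlyUsed(int keys, int partitions, int used) {
        return (double) (choose(partitions, used) * onto(keys, used)) / power(partitions, keys);
    }

    /** Expected number of partitions that receive at least one key. */
    static double expectedUsed(int keys, int partitions) {
        return partitions * (1.0 - Math.pow((partitions - 1.0) / partitions, keys));
    }
}
```

For keys = 8 and partitions = 6, this works out to about 0.11 for all six partitions being used, about 0.07 for exactly three, and about 4.6 partitions used on average. Under that assumption, several keys sharing a partition is the normal case, not a sign of a broken hash.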

Yes, there is a way to solve your problem: a custom partitioner, set through
the partitioner.class producer config:
https://kafka.apache.org/documentation/#producerconfigs_partitioner.class

You can check the Partitioner javadoc here
<https://kafka.apache.org/30/javadoc/org/apache/kafka/clients/producer/Partitioner.html>
for reference. For examples, look at the built-in partitioners, e.g.
clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java.
Basically, you want to focus on the "partition" method and define your own
algorithm for distributing the keys, e.g. key-1 -> partition-1,
key-2 -> partition-2, etc.
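
A minimal sketch of such an algorithm in plain Java, assuming the set of keys is known up front and every producer is given the same list in the same order. FixedKeyAssignment and partitionFor are hypothetical names, not Kafka classes; in a real Partitioner implementation the partition(...) method would delegate to something like this. The fallback for unknown keys uses String.hashCode() purely as a stand-in for Kafka's own hash.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup table that a custom partitioner could consult. Not part of Kafka. */
final class FixedKeyAssignment {
    private final Map<String, Integer> partitionByKey = new HashMap<>();
    private final int numPartitions;

    /** Assigns the known keys to partitions round-robin, in list order. */
    FixedKeyAssignment(List<String> knownKeys, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
        int next = 0;
        for (String key : knownKeys) {
            if (!partitionByKey.containsKey(key)) { // ignore repeated keys in the list
                partitionByKey.put(key, next % numPartitions);
                next++;
            }
        }
    }

    /** The same key always yields the same partition. */
    int partitionFor(String key) {
        Integer assigned = partitionByKey.get(key);
        if (assigned != null) {
            return assigned;
        }
        // Unknown key: fall back to a hash (stand-in, not Kafka's murmur2 over key bytes).
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Because the lookup is a pure function of the key, all events for one key keep going to one partition, so key-level ordering is preserved as long as the key list and the partition count stay fixed. With 8 keys on 6 partitions, two partitions necessarily carry two keys each; no assignment can avoid that.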

Thank you.
Luke

