Posted to users@kafka.apache.org by SenthilKumar K <se...@gmail.com> on 2018/08/29 07:15:33 UTC

Kafka Producer Partition Key Selection

Hello Experts, we want to distribute data across partitions in a Kafka
cluster.
Option 1: Use a null partition key, which can distribute data across
partitions.
Option 2: Choose a key (random UUID?), which can help to distribute data
70-80%.

I have seen the side effect below on the Confluence page about sending null
keys to the producer. Is this still valid in newer versions of the Kafka
producer library?
Why is data not evenly distributed among partitions when a partitioning key
is not specified?

In Kafka producer, a partition key can be specified to indicate the
destination partition of the message. By default, a hashing-based
partitioner is used to determine the partition id given the key, and people
can use customized partitioners also.
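
To make the hashing step concrete, here is a minimal sketch of mapping a key to a partition id. It is an illustration only: the class and method names are hypothetical, and Arrays.hashCode is an assumed stand-in for whatever hash function the real partitioner applies to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch, not Kafka code. Arrays.hashCode is an assumed
// stand-in for the hash function a real partitioner would use.
final class HashPartitionSketch {
    private HashPartitionSketch() {}

    /** Maps a non-null key to a partition id in [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

The same key always lands on the same partition, so how evenly the data spreads depends on how varied the keys are.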

To reduce # of open sockets, in 0.8.0 (
https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
key is not specified or null, a producer will pick a random partition and
stick to it for some time (default is 10 mins) before switching to another
one. So, if there are fewer producers than partitions, at a given point of
time, some partitions may not receive any data. To alleviate this problem,
one can either reduce the metadata refresh interval or specify a message
key and a customized random partitioner. For more detail see this thread
http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E
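
The effect described in the FAQ text above can be sketched with a small simulation. This is not producer code: the class, method names, and seed parameter are hypothetical, and it models only one "sticky" window in which each producer picks a single random partition and sends everything there.

```java
import java.util.Random;

// Illustrative simulation of one sticky window, not Kafka code.
final class StickyNullKeySketch {
    private StickyNullKeySketch() {}

    /** Returns per-partition message counts for one sticky window. */
    static long[] stickyWindow(int producers, int partitions,
                               int messagesPerProducer, long seed) {
        Random random = new Random(seed);
        long[] counts = new long[partitions];
        for (int p = 0; p < producers; p++) {
            // Picked once per producer and reused for the whole window.
            int chosen = random.nextInt(partitions);
            counts[chosen] += messagesPerProducer;
        }
        return counts;
    }

    /** Counts how many partitions received at least one message. */
    static int partitionsWithData(long[] counts) {
        int withData = 0;
        for (long c : counts) {
            if (c > 0) {
                withData++;
            }
        }
        return withData;
    }
}
```

With 3 producers and 12 partitions, at most 3 partitions can receive data in a window, which is the "fewer producers than partitions" case the FAQ warns about.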

Please advise on choosing a partition key that does not have side effects.

--Senthil

Re: Kafka Producer Partition Key Selection

Posted by "M. Manna" <ma...@gmail.com>.

Why can't we extend the DefaultPartitioner and simply override the
partition() method, so that it distributes records across all partitions
in round-robin fashion?

Round-Robin partitioner and StickyAssignor (consumer) should work nicely
for any publish subscribe system.
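
A minimal sketch of the round-robin selection suggested above, using only the JDK. The class and method names are hypothetical. In a real producer, logic like this would sit inside the partition(...) method of a class implementing org.apache.kafka.clients.producer.Partitioner and be registered through the producer's partitioner.class setting; that wiring is left out here so the sketch runs without the Kafka client library.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of round-robin partition selection, not Kafka code.
final class RoundRobinSketch {
    // Shared counter so concurrent senders still rotate through partitions.
    private final AtomicInteger counter = new AtomicInteger(0);

    /** Returns the next partition id in [0, numPartitions). */
    int nextPartition(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int next = counter.getAndIncrement();
        // Mask the sign bit so the result stays non-negative after overflow.
        return (next & 0x7fffffff) % numPartitions;
    }
}
```

Every record advances the counter regardless of its key, so this gives up the per-key ordering that a hashing-based partitioner provides.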

On Wed, 29 Aug 2018 at 09:39, SenthilKumar K <se...@gmail.com> wrote:

> Thanks Gaurav.  Did you notice side effect mentioned in this page :
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
> ?
>
>
> --Senthil
>
> On Wed, Aug 29, 2018 at 2:02 PM Gaurav Bajaj <ga...@gmail.com>
> wrote:
>
> > Hello Senthil,
> >
> > In our case we use NULL as message Key to achieve even distribution in
> > producer.
> > With that we were able to achieve very even distribution with that.
> > Our Kafka client version is 0.10.1.0 and Kafka broker version is 1.1
> >
> >
> > Thanks,
> > Gaurav
> >
> > On Wed, Aug 29, 2018 at 9:15 AM, SenthilKumar K <se...@gmail.com>
> > wrote:
> >
> >> Hello Experts, We want to distribute data across partitions in Kafka
> >> Cluster.
> >>  Option 1 : Use Null Partition Key which can distribute data across
> >> paritions.
> >>  Option 2 :  Choose Key ( Random UUID ? ) which can help to distribute
> >> data
> >> 70-80%.
> >>
> >> I have seen below side effect on Confluence Page about sending null Keys
> >> to
> >> Producer. Is this still valid on newer version of Kafka Producer Lib ?
> >> Why is data not evenly distributed among partitions when a partitioning
> >> key
> >> is not specified?
> >>
> >> In Kafka producer, a partition key can be specified to indicate the
> >> destination partition of the message. By default, a hashing-based
> >> partitioner is used to determine the partition id given the key, and
> >> people
> >> can use customized partitioners also.
> >>
> >> To reduce # of open sockets, in 0.8.0 (
> >> https://issues.apache.org/jira/browse/KAFKA-1017), when the
> partitioning
> >> key is not specified or null, a producer will pick a random partition
> and
> >> stick to it for some time (default is 10 mins) before switching to
> another
> >> one. So, if there are fewer producers than partitions, at a given point
> of
> >> time, some partitions may not receive any data. To alleviate this
> problem,
> >> one can either reduce the metadata refresh interval or specify a message
> >> key and a customized random partitioner. For more detail see this thread
> >>
> >>
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E
> >>
> >> Pls advise on Choosing Partition Key which should not have side effects.
> >>
> >> --Senthil
> >>
> >
> >
>

Re: Kafka Producer Partition Key Selection

Posted by SenthilKumar K <se...@gmail.com>.

Thanks Gaurav. Did you notice the side effect mentioned on this page:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
?


--Senthil

On Wed, Aug 29, 2018 at 2:02 PM Gaurav Bajaj <ga...@gmail.com> wrote:

> Hello Senthil,
>
> In our case we use NULL as message Key to achieve even distribution in
> producer.
> With that we were able to achieve very even distribution with that.
> Our Kafka client version is 0.10.1.0 and Kafka broker version is 1.1
>
>
> Thanks,
> Gaurav
>
> On Wed, Aug 29, 2018 at 9:15 AM, SenthilKumar K <se...@gmail.com>
> wrote:
>
>> Hello Experts, We want to distribute data across partitions in Kafka
>> Cluster.
>>  Option 1 : Use Null Partition Key which can distribute data across
>> paritions.
>>  Option 2 :  Choose Key ( Random UUID ? ) which can help to distribute
>> data
>> 70-80%.
>>
>> I have seen below side effect on Confluence Page about sending null Keys
>> to
>> Producer. Is this still valid on newer version of Kafka Producer Lib ?
>> Why is data not evenly distributed among partitions when a partitioning
>> key
>> is not specified?
>>
>> In Kafka producer, a partition key can be specified to indicate the
>> destination partition of the message. By default, a hashing-based
>> partitioner is used to determine the partition id given the key, and
>> people
>> can use customized partitioners also.
>>
>> To reduce # of open sockets, in 0.8.0 (
>> https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
>> key is not specified or null, a producer will pick a random partition and
>> stick to it for some time (default is 10 mins) before switching to another
>> one. So, if there are fewer producers than partitions, at a given point of
>> time, some partitions may not receive any data. To alleviate this problem,
>> one can either reduce the metadata refresh interval or specify a message
>> key and a customized random partitioner. For more detail see this thread
>>
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E
>>
>> Pls advise on Choosing Partition Key which should not have side effects.
>>
>> --Senthil
>>
>
>

Re: Kafka Producer Partition Key Selection

Posted by Gaurav Bajaj <ga...@gmail.com>.

Hello Senthil,

In our case we use NULL as the message key to achieve even distribution in
the producer.
With that, we were able to achieve a very even distribution.
Our Kafka client version is 0.10.1.0 and the Kafka broker version is 1.1.


Thanks,
Gaurav

On Wed, Aug 29, 2018 at 9:15 AM, SenthilKumar K <se...@gmail.com>
wrote:

> Hello Experts, We want to distribute data across partitions in Kafka
> Cluster.
>  Option 1 : Use Null Partition Key which can distribute data across
> paritions.
>  Option 2 :  Choose Key ( Random UUID ? ) which can help to distribute data
> 70-80%.
>
> I have seen below side effect on Confluence Page about sending null Keys to
> Producer. Is this still valid on newer version of Kafka Producer Lib ?
> Why is data not evenly distributed among partitions when a partitioning key
> is not specified?
>
> In Kafka producer, a partition key can be specified to indicate the
> destination partition of the message. By default, a hashing-based
> partitioner is used to determine the partition id given the key, and people
> can use customized partitioners also.
>
> To reduce # of open sockets, in 0.8.0 (
> https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
> key is not specified or null, a producer will pick a random partition and
> stick to it for some time (default is 10 mins) before switching to another
> one. So, if there are fewer producers than partitions, at a given point of
> time, some partitions may not receive any data. To alleviate this problem,
> one can either reduce the metadata refresh interval or specify a message
> key and a customized random partitioner. For more detail see this thread
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E
>
> Pls advise on Choosing Partition Key which should not have side effects.
>
> --Senthil
>
