Posted to users@kafka.apache.org by Jack Huang <ja...@mz.com> on 2016/07/28 19:10:20 UTC

Same partition number of different Kafka topics

Hi all,

I have an application where I need to join events from two different
topics. Every event is identified by an id, which is used as the key for
the topic partition. After some experimentation, I observed that events
with the same id go into different partitions even though both topics have
the same number of partitions. I can't find any documentation on this point,
though. Does anyone know if this is indeed the case?


Thanks,
Jack

Re: Same partition number of different Kafka topics

Posted by Dana Powers <da...@gmail.com>.
kafka-python by default uses the same partitioning algorithm as the Java
client. If there are bugs, please let me know. I think the issue here is
with the default nodejs partitioner.

-Dana
On Aug 3, 2016 7:03 PM, "Jack Huang" <ja...@mz.com> wrote:

I see, thanks for the clarification.


Re: Same partition number of different Kafka topics

Posted by Jack Huang <ja...@mz.com>.
I see, thanks for the clarification.

On Tue, Aug 2, 2016 at 10:07 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> Jack,
>
> The partition is always selected by the client -- if it weren't the brokers
> would need to forward requests since different partitions are handled by
> different brokers. The only "default Kafka partitioner" is the one that you
> could consider "standardized" by the Java client implementation. Some
> client libraries will make this pluggable like the Java client does so you
> could use a compatible implementation.
>
> -Ewen

Re: Same partition number of different Kafka topics

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Jack,

The partition is always selected by the client -- if it weren't, the brokers
would need to forward requests, since different partitions are handled by
different brokers. The only "default Kafka partitioner" is the one that you
could consider "standardized" by the Java client implementation. Some
client libraries make this pluggable, like the Java client does, so you
could use a compatible implementation.
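
As a rough sketch of what "pluggable" can look like: kafka-python's
KafkaProducer documents a `partitioner` option that is called as
partitioner(key_bytes, all_partitions, available_partitions). The class
below follows that call shape but does not import any client.
`SharedKeyPartitioner` and the topic names are hypothetical, and
zlib.crc32 is only a stand-in hash; matching the Java client would
require its murmur2-based hash instead.

```python
import random
import zlib


class SharedKeyPartitioner:
    """Hypothetical partitioner that every producer in a pipeline could share."""

    def __call__(self, key_bytes, all_partitions, available_partitions):
        if key_bytes is None:
            # No key: spread messages over whatever is currently available.
            return random.choice(available_partitions or all_partitions)
        # Stand-in hash; swap in the hash your other clients use.
        index = zlib.crc32(key_bytes) % len(all_partitions)
        return all_partitions[index]


partitioner = SharedKeyPartitioner()
orders_partitions = list(range(6))    # hypothetical topic "orders"
payments_partitions = list(range(6))  # hypothetical topic "payments"

key = b"event-1234"
print(partitioner(key, orders_partitions, orders_partitions))
print(partitioner(key, payments_partitions, payments_partitions))
```

Because both producers run the same function over the same key bytes and
the same partition count, the two printed partitions match. Whether and
how a given client accepts such an object is something to check in that
client's documentation.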

-Ewen

On Fri, Jul 29, 2016 at 11:27 AM, Jack Huang <ja...@mz.com> wrote:

> Hi Gerard,
>
> After further digging, I found that the clients we are using also have
> different partitioners. The Python one uses murmur2 (
> https://github.com/dpkp/kafka-python/blob/master/kafka/partitioner/default.py),
> and the NodeJS one uses its own impl (
> https://github.com/SOHU-Co/kafka-node/blob/master/lib/partitioner.js). Does
> Kafka delegate the task of partitioning to the client? From their documentation
> it doesn't seem like they provide an option to select the "default Kafka
> partitioner".
>
> Thanks,
> Jack



-- 
Thanks,
Ewen

Re: Same partition number of different Kafka topics

Posted by Jack Huang <ja...@mz.com>.
Hi Gerard,

After further digging, I found that the clients we are using also have
different partitioners. The Python one uses murmur2 (
https://github.com/dpkp/kafka-python/blob/master/kafka/partitioner/default.py),
and the NodeJS one uses its own impl (
https://github.com/SOHU-Co/kafka-node/blob/master/lib/partitioner.js). Does
Kafka delegate the task of partitioning to the client? From their documentation
it doesn't seem like they provide an option to select the "default Kafka
partitioner".
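
To illustrate why two clients with different partitioners disagree even
when the partition counts match, here is a small sketch. The two
functions are hypothetical stand-ins (zlib.crc32 and zlib.adler32), not
the actual hashes used by kafka-python or kafka-node; the point is only
that different hash functions over the same key bytes give different
results.

```python
import zlib

NUM_PARTITIONS = 8  # both hypothetical topics have the same partition count


def client_a_partition(key_bytes: bytes) -> int:
    # Stand-in for one client's hash (not the real murmur2).
    return zlib.crc32(key_bytes) % NUM_PARTITIONS


def client_b_partition(key_bytes: bytes) -> int:
    # Stand-in for another client's home-grown hash.
    return zlib.adler32(key_bytes) % NUM_PARTITIONS


keys = [str(i).encode("utf-8") for i in range(100)]
mismatched = [k for k in keys if client_a_partition(k) != client_b_partition(k)]
print(f"{len(mismatched)} of {len(keys)} keys land in different partitions")
```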

Thanks,
Jack


On Fri, Jul 29, 2016 at 7:42 AM, Gerard Klijs <ge...@dizzit.com>
wrote:

> The default partitioner takes the key, hashes it, and applies a modulo
> operation to determine the partition it goes to. Some things which
> might cause it to end up different for different topics:
> - the partition counts are not the same (you already checked)
> - the key is not exactly the same, for example one might have a space after
> the id
> - the other topic is configured to use another partitioner
> - the serialiser for the key is different for the two topics, since the hash
> is computed from the bytes of the serialised key
> - all the topics use another partitioner (for example round robin)

Re: Same partition number of different Kafka topics

Posted by Gerard Klijs <ge...@dizzit.com>.
The default partitioner takes the key, hashes it, and applies a modulo
operation to determine the partition it goes to. Some things which
might cause it to end up different for different topics:
- the partition counts are not the same (you already checked)
- the key is not exactly the same, for example one might have a space after
the id
- the other topic is configured to use another partitioner
- the serialiser for the key is different for the two topics, since the hash
is computed from the bytes of the serialised key
- all the topics use another partitioner (for example round robin)
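
The hash-then-modulo scheme and the byte-level pitfalls in the list above
can be sketched in a few lines. This is an illustration only:
`partition_for` is a hypothetical helper, and zlib.crc32 is a stand-in
hash chosen because it ships with Python, not the hash any Kafka client
actually uses.

```python
import struct
import zlib


def partition_for(key_bytes: bytes, num_partitions: int) -> int:
    """Hash the serialized key, then take it modulo the partition count.

    zlib.crc32 is a stand-in hash for illustration; it is not the hash
    used by any particular Kafka client.
    """
    return zlib.crc32(key_bytes) % num_partitions


# The partitioner sees bytes, so anything that changes the bytes can
# change the partition, even when the logical id is "the same".
as_text = "42".encode("utf-8")      # id serialized as a string
with_space = "42 ".encode("utf-8")  # same id with a trailing space
as_int = struct.pack(">i", 42)      # same id serialized as a 4-byte integer

for label, key in [("text", as_text), ("space", with_space), ("int", as_int)]:
    print(label, key, partition_for(key, 8))
```

The three keys all represent id 42 but serialize to different bytes, so
nothing guarantees they hash to the same partition.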

On Thu, Jul 28, 2016 at 9:11 PM Jack Huang <ja...@mz.com> wrote:

> Hi all,
>
> I have an application where I need to join events from two different
> topics. Every event is identified by an id, which is used as the key for
> the topic partition. After doing some experiment, I observed that events
> will go into different partitions even if the number of partitions for both
> topics are the same. I can't find any documentation on this point though.
> Does anyone know if this is indeed the case?
>
>
> Thanks,
> Jack
>