Posted to users@kafka.apache.org by Maria Pilar <pi...@gmail.com> on 2018/01/25 13:47:17 UTC

Best practices Partition Key

Hi everyone,

I'm trying to understand the best practice for defining the partition key. I
have defined some topics that map to entities in my Cassandra data model. The
relationship is one-to-one (one entity, one topic) because I need to ensure
proper ordering of the events. I have also created a single partition for
each topic to ensure this.

If I use Kafka as a datastore and search through the records, I understand
that it could be a best practice to use the Cassandra partition key (e.g.
Customer ID) as the partition key in Kafka.

Any comments, please?

Thanks

Re: Best practices Partition Key

Posted by Maria Pilar <pi...@gmail.com>.

Yes, I'm capturing different events from the same entity/resource (create,
update and delete), and for that reason I've chosen that option. However, my
question is whether I can improve my solution, if I want to use Kafka as a
datastore, by using the Cassandra partition key of each entity as the Kafka
partition key.

On 25 January 2018 at 16:02, Dmitry Minkovsky <dm...@gmail.com> wrote:

> > one entity - one topic, because I need to ensure the properly ordering in
> the events.
>
> This is a great insight. I discovered that keeping entity-related things
> on one topic is much easier than splitting entity-related things onto
> multiple topics. If you have one topic, replaying that topic is trivial. If
> you have multiple topics, replaying those topics requires careful
> synchronization. In my case, I am doing event capture and I have
> entity-related events on multiple topics. For example, for a user entity I
> have topics `join-requests` and `settings-update-requests`. Having separate
> topics is superficially nicer in terms of consuming them with Kafka
> Streams: you can set up topic-specific serdes. But the benefit you get from
> this is dwarfed by the complexity of then having to synchronize these two
> streams if you want to replay them. Your situation seems simpler though
> because you are not even doing event capture, but just logging complete
> entities out of Cassandra.
>
> > If I will use kafka like a datastore and search through the records,
>
> Interactive Queries API makes this very nice.

Re: Best practices Partition Key

Posted by Dmitry Minkovsky <dm...@gmail.com>.

> I know that could be a best practice use the partition key of Cassandra
(e.g Customer ID) as a partition key in kafka

Yeah, the Kafka producer's default partitioner hashes that key with murmur2,
so all entities coming out of Cassandra with the same partition key will end
up on the same Kafka partition. Then you can use Kafka Streams Interactive
Queries to get the data.
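
For illustration, here is a rough Python sketch of how a key is mapped to a partition. It assumes the Java producer's default partitioner and a string key serialized as UTF-8. The function names `murmur2` and `partition_for` are made up for this example, and the constants are the ones I recall from the Java client, so treat it as a sketch and not as a reference implementation.

```python
MASK32 = 0xFFFFFFFF  # emulate Java's 32-bit int arithmetic


def murmur2(data: bytes) -> int:
    """32-bit murmur2 hash, modeled on the variant used by the Java client."""
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (seed ^ length) & MASK32

    # Mix the input four bytes at a time (little-endian).
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Mix in the remaining 1 to 3 bytes, if any.
    tail = length & ~3
    rem = length % 4
    if rem == 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for(key: str, num_partitions: int) -> int:
    """Hash the serialized key, clear the sign bit, take it modulo the partition count."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


# The same key always maps to the same partition.
p1 = partition_for("customer-42", 6)
p2 = partition_for("customer-42", 6)

# With a single-partition topic, every key lands on partition 0.
single = partition_for("customer-42", 1)
```

Note that with one partition per topic, as in the original setup, the key has no effect on placement. It starts to matter once the topic has more than one partition, where it keeps all records for a given Customer ID in order on a single partition.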

Re: Best practices Partition Key

Posted by Dmitry Minkovsky <dm...@gmail.com>.

> one entity - one topic, because I need to ensure the properly ordering in
the events.

This is a great insight. I discovered that keeping entity-related things
on one topic is much easier than splitting entity-related things onto
multiple topics. If you have one topic, replaying that topic is trivial. If
you have multiple topics, replaying those topics requires careful
synchronization. In my case, I am doing event capture and I have
entity-related events on multiple topics. For example, for a user entity I
have topics `join-requests` and `settings-update-requests`. Having separate
topics is superficially nicer in terms of consuming them with Kafka
Streams: you can set up topic-specific serdes. But the benefit you get from
this is dwarfed by the complexity of then having to synchronize these two
streams if you want to replay them. Your situation seems simpler though
because you are not even doing event capture, but just logging complete
entities out of Cassandra.

> If I will use kafka like a datastore and search through the records,

Interactive Queries API makes this very nice.
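
To make that a bit more concrete: Interactive Queries read from the state stores that Kafka Streams builds from a topic. A rough model of such a store, in plain Python and not the Streams API, is a map holding the latest value per key. The event shape below (Customer ID as key, `None` as the delete marker) and the name `materialize` are assumptions made up for this sketch.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical event shape: (customer_id, payload); a None payload marks a delete.
Event = Tuple[str, Optional[dict]]


def materialize(events: Iterable[Event]) -> Dict[str, dict]:
    """Fold an ordered stream of keyed events into the latest state per key."""
    store: Dict[str, dict] = {}
    for key, value in events:
        if value is None:
            store.pop(key, None)   # delete event: drop the key
        else:
            store[key] = value     # create/update event: keep the latest value
    return store


events = [
    ("customer-1", {"name": "Ana"}),        # create
    ("customer-2", {"name": "Luis"}),       # create
    ("customer-1", {"name": "Ana Maria"}),  # update
    ("customer-2", None),                   # delete
]
store = materialize(events)
```

A lookup by Customer ID is then just `store.get("customer-1")`. The result is only correct if events for the same key are applied in order, which is why keeping each entity's events on one partition matters.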
