Posted to users@kafka.apache.org by Mark van Leeuwen <ma...@vl.id.au> on 2016/03/28 16:53:52 UTC

Event sourcing and topic partitions

Hi all,

When using Kafka for event sourcing in a CQRS style app, what approach 
do you recommend for mapping DDD aggregates to topic partitions?

Assigning a partition to each aggregate seems at first to be the right 
approach: events can be replayed in correct order for each aggregate and 
there is no mixing of events for different aggregates.

But this page
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster
has the recommendation to "limit the number of partitions per broker to 
two to four thousand and the total number of partitions in the cluster 
to low tens of thousand".

One partition per aggregate will far exceed that number.

Thanks,
Mark
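
A minimal sketch of the alternative the replies below converge on: key every
event by its aggregate ID and let the default partitioner hash many aggregates
onto a fixed set of partitions. All names here (topic, aggregate ID, payloads)
are made up for illustration, not taken from the thread:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AggregateKeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Every event for one aggregate carries the same key, so the
            // default partitioner routes them all to the same partition and
            // they stay totally ordered relative to each other.
            String aggregateId = "order-42"; // hypothetical aggregate
            producer.send(new ProducerRecord<>("order.events", aggregateId, "OrderCreated"));
            producer.send(new ProducerRecord<>("order.events", aggregateId, "OrderShipped"));
        }
    }
}

With this scheme the partition count stays small and fixed, while each
aggregate's events remain in order within their partition.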

Re: Event sourcing and topic partitions

Posted by Mark van Leeuwen <ma...@vl.id.au>.
Thanks. Apparently there is nothing wrong with that :-)

I came to the same conclusion in an earlier post.

It would be good if someone with experience using Kafka for event sourcing 
corrected this Stack Overflow answer (i.e. update 2):
http://stackoverflow.com/questions/17708489/using-kafka-as-a-cqrs-eventstore-good-idea/17813930#17813930


On 30/03/16 05:32, Cees de Groot wrote:
> What's wrong with multiple aggregates per partition? You'll still process
> all events for each aggregate in order. If you want to just replay for a
> single aggregate somewhere, Kafka can spit out events fast enough to allow
> you to quickly skip through all the stuff you don't need...
>
> On Mon, Mar 28, 2016 at 10:53 AM, Mark van Leeuwen <ma...@vl.id.au> wrote:
>
>> Hi all,
>>
>> When using Kafka for event sourcing in a CQRS style app, what approach do
>> you recommend for mapping DDD aggregates to topic partitions?
>>
>> Assigning a partition to each aggregate seems at first to be the right
>> approach: events can be replayed in correct order for each aggregate and
>> there is no mixing of events for different aggregates.
>>
>> But this page
>>
>> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster
>> has the recommendation to "limit the number of partitions per broker to
>> two to four thousand and the total number of partitions in the cluster to
>> low tens of thousand".
>>
>> One partition per aggregate will far exceed that number.
>>
>> Thanks,
>> Mark
>>
>
>


Re: Event sourcing and topic partitions

Posted by Cees de Groot <ce...@pagerduty.com>.
What's wrong with multiple aggregates per partition? You'll still process
all events for each aggregate in order. If you want to just replay for a
single aggregate somewhere, Kafka can spit out events fast enough to allow
you to quickly skip through all the stuff you don't need...

On Mon, Mar 28, 2016 at 10:53 AM, Mark van Leeuwen <ma...@vl.id.au> wrote:

> Hi all,
>
> When using Kafka for event sourcing in a CQRS style app, what approach do
> you recommend for mapping DDD aggregates to topic partitions?
>
> Assigning a partition to each aggregate seems at first to be the right
> approach: events can be replayed in correct order for each aggregate and
> there is no mixing of events for different aggregates.
>
> But this page
>
> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster
> has the recommendation to "limit the number of partitions per broker to
> two to four thousand and the total number of partitions in the cluster to
> low tens of thousand".
>
> One partition per aggregate will far exceed that number.
>
> Thanks,
> Mark
>



-- 
Cees de Groot
Principal Software Engineer
PagerDuty, Inc.
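
A sketch of the replay-and-skip approach Cees describes, assuming events were
keyed by aggregate ID and a reasonably recent Java client; the topic name,
partition number, and the apply() helper are hypothetical:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SingleAggregateReplay {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // assign() (not subscribe()) the one partition the aggregate's
            // key hashes to, rewind to the start, and skip everything else.
            TopicPartition tp = new TopicPartition("order.events", 3);
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp));

            long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
            while (consumer.position(tp) < end) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    if ("order-42".equals(r.key())) {
                        apply(r.value()); // hypothetical: fold event into aggregate state
                    } // all other aggregates' events are skipped cheaply
                }
            }
        }
    }

    private static void apply(String event) { /* rebuild state here */ }
}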

Re: Event sourcing and topic partitions

Posted by Daniel Schierbeck <da...@zendesk.com.INVALID>.
Have you looked into using a relational database as the primary store, with
something like Maxwell or Bottled Water as a broadcast mechanism?
On Mon, 28 Mar 2016 at 17:28 Daniel Schierbeck <da...@zendesk.com> wrote:

> I ended up abandoning the use of Kafka as a primary event store, for
> several reasons. One is the partition granularity issue; another is the
> lack of a way to guarantee exclusive write access, i.e. ensure that only a
> single process can commit an event for an aggregate at any one time.
> On Mon, 28 Mar 2016 at 16:54 Mark van Leeuwen <ma...@vl.id.au> wrote:
>
>> Hi all,
>>
>> When using Kafka for event sourcing in a CQRS style app, what approach
>> do you recommend for mapping DDD aggregates to topic partitions?
>>
>> Assigning a partition to each aggregate seems at first to be the right
>> approach: events can be replayed in correct order for each aggregate and
>> there is no mixing of events for different aggregates.
>>
>> But this page
>>
>> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster
>> has the recommendation to "limit the number of partitions per broker to
>> two to four thousand and the total number of partitions in the cluster
>> to low tens of thousand".
>>
>> One partition per aggregate will far exceed that number.
>>
>> Thanks,
>> Mark
>>
>
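
A sketch of why the relational-primary-store route helps, under an assumed
events table with UNIQUE (aggregate_id, version): the constraint supplies the
conditional append Kafka lacks, and Maxwell or Bottled Water then streams the
committed rows into Kafka. The connection string and schema are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RdbmsEventAppend {
    // Assumed schema: events(aggregate_id TEXT, version BIGINT, payload TEXT,
    //                        UNIQUE (aggregate_id, version))
    public static void append(String aggregateId, long expectedVersion, String eventJson)
            throws SQLException {
        try (Connection c = DriverManager.getConnection("jdbc:postgresql://localhost/es");
             PreparedStatement ps = c.prepareStatement(
                     "INSERT INTO events (aggregate_id, version, payload) VALUES (?, ?, ?)")) {
            ps.setString(1, aggregateId);
            ps.setLong(2, expectedVersion + 1); // the version we believe is next
            ps.setString(3, eventJson);
            ps.executeUpdate(); // throws on a duplicate key: another writer
                                // won the race, so re-read state and retry
        }
    }
}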

Re: Event sourcing and topic partitions

Posted by Mark van Leeuwen <ma...@vl.id.au>.
Thanks for your detailed reply. Viewing each partition as an ordered 
event bus is helpful.

The problem then becomes working out a strategy for mapping individual 
DDD aggregates to partitions that distributes load across the cluster and 
also allows the cluster to grow.
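
For reference, a sketch of the mapping the default Java partitioner applies to
a keyed record, which is also why simply adding partitions later remaps
aggregates and breaks per-aggregate ordering across the resize (the key and
partition count are made up):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class DefaultMapping {
    public static void main(String[] args) {
        byte[] keyBytes = "order-42".getBytes(StandardCharsets.UTF_8);
        int numPartitions = 50;
        // murmur2 of the serialized key, modulo the partition count
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println("order-42 -> partition " + partition);
        // If numPartitions changes, the same key maps elsewhere. A custom
        // Partitioner with a stable scheme (e.g. consistent hashing over a
        // fixed ring) is one way to leave room for cluster growth.
    }
}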


On 30/03/16 01:07, Helleren, Erik wrote:
> Well, if a partition is too large of a unit of order for your tastes, you
> can add publisher IDs to some metadata, or force partition mapping and
> use the key as an extra level of partitioning.  And, pick a topicName that
> describes all the traffic on that topic. An example:
>
> topicName="ad.click.events"; // let's say we are a web advertising company
> and want to aggregate click events
>
> publisherId="ad.click.handling.server.1234"; // this is the name of the
> entity publishing to the topic, server #1234
> targetPartition=config.targetPartition; // this needs to be configured
> somehow, and known to all consumers
> usingPublisherIDWithPartition = new ProducerRecord<>(topicName,
> targetPartition, publisherId, payload);
>
> Then, on consumers, you can use the Kafka API to listen to a targeted
> partition, and then filter to just the publisher ID you are interested in.
> You still get an "event sourced" bus that would let you reliably replay
> those messages in the same order to any number of consumers.
>
> If you are willing to embrace Kafka's consumer API to scale consumers,
> your life becomes even easier. Example:
>
> topicName="ad.click.events";
> publisherId="ad.click.handling.server.1234";
>
> usingOnlyPublisherID = new ProducerRecord<>(topicName, publisherId,
> payload);
>
> Kafka will ensure that all messages with the same publisher ID are sent to
> the same partition. And from there, all messages with the same publisher
> ID are sent in the same order to all consumers.  That is, provided there
> are no mid-execution changes to the number of partitions, which is a
> manual operation.  Also, the number of publisherIds should be much larger
> than (>>) the number of partitions within the topic.
>
> The thing to keep in mind is that, while you would decrease any broker's
> overhead, you are implicitly limiting your throughput.  And feel free to
> replace "publisherID" with "TargetConsumerID" or "OrderedEventBusID".
>
>
> On 3/29/16, 7:38 AM, "Mark van Leeuwen" <ma...@vl.id.au> wrote:
>
>> Thanks for sharing your experience.
>>
>> I'm surprised no one else has responded. Maybe there are few people
>> using Kafka for event sourcing.
>>
>> I did find one answer to my question
>> http://stackoverflow.com/questions/26060535/kafka-as-event-store-in-event-sourced-system?rq=1
>>
>> I guess using a single topic partition as the event source for multiple
>> aggregates is similar to using a single table shard for that purpose in
>> a relational DB.
>>
>>
>> On 29/03/16 02:28, Daniel Schierbeck wrote:
>>> I ended up abandoning the use of Kafka as a primary event store, for
>>> several reasons. One is the partition granularity issue; another is the
>>> lack of a way to guarantee exclusive write access, i.e. ensure that
>>> only a
>>> single process can commit an event for an aggregate at any one time.
>>> On Mon, 28 Mar 2016 at 16:54 Mark van Leeuwen <ma...@vl.id.au> wrote:
>>>
>>>> Hi all,
>>>>
>>>> When using Kafka for event sourcing in a CQRS style app, what approach
>>>> do you recommend for mapping DDD aggregates to topic partitions?
>>>>
>>>> Assigning a partition to each aggregate seems at first to be the right
>>>> approach: events can be replayed in correct order for each aggregate
>>>> and
>>>> there is no mixing of events for different aggregates.
>>>>
>>>> But this page
>>>>
>>>>
>>>> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster
>>>> has the recommendation to "limit the number of partitions per broker to
>>>> two to four thousand and the total number of partitions in the cluster
>>>> to low tens of thousand".
>>>>
>>>> One partition per aggregate will far exceed that number.
>>>>
>>>> Thanks,
>>>> Mark
>>>>


Re: Event sourcing and topic partitions

Posted by "Helleren, Erik" <Er...@cmegroup.com>.
Well, if a partition is too large of a unit of order for your tastes, you
can add publisher IDs to some metadata, or force partition mapping and
use the key as an extra level of partitioning.  And, pick a topicName that
describes all the traffic on that topic. An example:

topicName="ad.click.events"; // let's say we are a web advertising company
and want to aggregate click events

publisherId="ad.click.handling.server.1234"; // this is the name of the
entity publishing to the topic, server #1234
targetPartition=config.targetPartition; // this needs to be configured
somehow, and known to all consumers
usingPublisherIDWithPartition = new ProducerRecord<>(topicName,
targetPartition, publisherId, payload);

Then, on consumers, you can use the Kafka API to listen to a targeted
partition, and then filter to just the publisher ID you are interested in.
You still get an "event sourced" bus that would let you reliably replay
those messages in the same order to any number of consumers.

If you are willing to embrace Kafka's consumer API to scale consumers,
your life becomes even easier. Example:

topicName="ad.click.events";
publisherId="ad.click.handling.server.1234";

usingOnlyPublisherID = new ProducerRecord<>(topicName, publisherId,
payload);

Kafka will ensure that all messages with the same publisher ID are sent to
the same partition. And from there, all messages with the same publisher
ID are sent in the same order to all consumers.  That is, provided there
are no mid-execution changes to the number of partitions, which is a
manual operation.  Also, the number of publisherIds should be much larger
than (>>) the number of partitions within the topic.

The thing to keep in mind is that, while you would decrease any broker's
overhead, you are implicitly limiting your throughput.  And feel free to
replace "publisherID" with "TargetConsumerID" or "OrderedEventBusID".
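
Filled out with the declarations the fragments above leave implicit, the two
records would compile along these lines (the payload, partition number, and
generic types are placeholders, not from the original mail):

import org.apache.kafka.clients.producer.ProducerRecord;

public class PublisherIdRecords {
    public static void main(String[] args) {
        String topicName = "ad.click.events";
        String publisherId = "ad.click.handling.server.1234";
        String payload = "{\"clickId\":1}";   // placeholder payload
        Integer targetPartition = 7;          // stands in for config.targetPartition

        // Variant 1: pin the partition explicitly; publisherId rides along as the key.
        ProducerRecord<String, String> usingPublisherIDWithPartition =
                new ProducerRecord<>(topicName, targetPartition, publisherId, payload);

        // Variant 2: omit the partition and let the default partitioner hash
        // publisherId, keeping all of one publisher's messages in order.
        ProducerRecord<String, String> usingOnlyPublisherID =
                new ProducerRecord<>(topicName, publisherId, payload);

        System.out.println(usingPublisherIDWithPartition);
        System.out.println(usingOnlyPublisherID);
    }
}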


On 3/29/16, 7:38 AM, "Mark van Leeuwen" <ma...@vl.id.au> wrote:

>Thanks for sharing your experience.
>
>I'm surprised no one else has responded. Maybe there are few people
>using Kafka for event sourcing.
>
>I did find one answer to my question
>http://stackoverflow.com/questions/26060535/kafka-as-event-store-in-event-sourced-system?rq=1
>
>I guess using a single topic partition as the event source for multiple
>aggregates is similar to using a single table shard for that purpose in
>a relational DB.
>
>
>On 29/03/16 02:28, Daniel Schierbeck wrote:
>> I ended up abandoning the use of Kafka as a primary event store, for
>> several reasons. One is the partition granularity issue; another is the
>> lack of a way to guarantee exclusive write access, i.e. ensure that
>>only a
>> single process can commit an event for an aggregate at any one time.
>> On Mon, 28 Mar 2016 at 16:54 Mark van Leeuwen <ma...@vl.id.au> wrote:
>>
>>> Hi all,
>>>
>>> When using Kafka for event sourcing in a CQRS style app, what approach
>>> do you recommend for mapping DDD aggregates to topic partitions?
>>>
>>> Assigning a partition to each aggregate seems at first to be the right
>>> approach: events can be replayed in correct order for each aggregate
>>>and
>>> there is no mixing of events for different aggregates.
>>>
>>> But this page
>>>
>>>
>>> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster
>>> has the recommendation to "limit the number of partitions per broker to
>>> two to four thousand and the total number of partitions in the cluster
>>> to low tens of thousand".
>>>
>>> One partition per aggregate will far exceed that number.
>>>
>>> Thanks,
>>> Mark
>>>
>


Re: Event sourcing and topic partitions

Posted by Mark van Leeuwen <ma...@vl.id.au>.
Thanks for sharing your experience.

I'm surprised no one else has responded. Maybe there are few people 
using Kafka for event sourcing.

I did find one answer to my question
http://stackoverflow.com/questions/26060535/kafka-as-event-store-in-event-sourced-system?rq=1

I guess using a single topic partition as the event source for multiple 
aggregates is similar to using a single table shard for that purpose in 
a relational DB.


On 29/03/16 02:28, Daniel Schierbeck wrote:
> I ended up abandoning the use of Kafka as a primary event store, for
> several reasons. One is the partition granularity issue; another is the
> lack of a way to guarantee exclusive write access, i.e. ensure that only a
> single process can commit an event for an aggregate at any one time.
> On Mon, 28 Mar 2016 at 16:54 Mark van Leeuwen <ma...@vl.id.au> wrote:
>
>> Hi all,
>>
>> When using Kafka for event sourcing in a CQRS style app, what approach
>> do you recommend for mapping DDD aggregates to topic partitions?
>>
>> Assigning a partition to each aggregate seems at first to be the right
>> approach: events can be replayed in correct order for each aggregate and
>> there is no mixing of events for different aggregates.
>>
>> But this page
>>
>> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster
>> has the recommendation to "limit the number of partitions per broker to
>> two to four thousand and the total number of partitions in the cluster
>> to low tens of thousand".
>>
>> One partition per aggregate will far exceed that number.
>>
>> Thanks,
>> Mark
>>


Re: Event sourcing and topic partitions

Posted by Daniel Schierbeck <da...@zendesk.com.INVALID>.
I ended up abandoning the use of Kafka as a primary event store, for
several reasons. One is the partition granularity issue; another is the
lack of a way to guarantee exclusive write access, i.e. ensure that only a
single process can commit an event for an aggregate at any one time.
On Mon, 28 Mar 2016 at 16:54 Mark van Leeuwen <ma...@vl.id.au> wrote:

> Hi all,
>
> When using Kafka for event sourcing in a CQRS style app, what approach
> do you recommend for mapping DDD aggregates to topic partitions?
>
> Assigning a partition to each aggregate seems at first to be the right
> approach: events can be replayed in correct order for each aggregate and
> there is no mixing of events for different aggregates.
>
> But this page
>
> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster
> has the recommendation to "limit the number of partitions per broker to
> two to four thousand and the total number of partitions in the cluster
> to low tens of thousand".
>
> One partition per aggregate will far exceed that number.
>
> Thanks,
> Mark
>
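
For context, the exclusive-write gap Daniel describes amounts to the lack of a
conditional append. A hypothetical signature of what an event store needs
here, which the Kafka producer of that era (0.9/0.10) does not offer:

/**
 * Hypothetical interface, for illustration only: optimistic concurrency
 * needs an append that fails when the expected version has moved on.
 * Kafka's producer has no such primitive, so two processes can both
 * successfully append conflicting events for the same aggregate.
 */
public interface EventStore {
    /** Appends iff the aggregate's current version equals expectedVersion;
     *  otherwise throws, and the caller must re-read state and retry. */
    long append(String aggregateId, long expectedVersion, byte[] event)
            throws java.util.ConcurrentModificationException;
}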