Posted to users@kafka.apache.org by Yann Simon <ya...@gmail.com> on 2015/01/11 17:34:17 UTC

Using Kafka for Event Sourcing

Hi,

after having read
http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying,
I am considering Kafka for an application built around CQRS and Event
Sourcing.

Disclaimer: I read the documentation but do not have any experience with
Kafka at this time.

In that setup, the queried state is built by applying every event
from the beginning.
It is also important:
- that all events are ordered, at least per entity
- that all events are stored (no deletion) OR that events are compacted in
such a way that the final state stays the same

Questions:
- I read that Kafka can delete events based on time or on disk usage. Is it
possible to completely deactivate event deletion? (Without using log
compaction; that is my next question.)

Kafka can also compact logs (
https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction and
http://kafka.apache.org/documentation.html#compaction).
- How can we structure all events so that the final state stays the same?

For example, if I have the following events:
- create user 456
- for user 456, set email "email1@dns"
- for user 456, set email "email2@dns"

Log compaction should keep the user creation and the last email setting.
Should I key the events like this:
- id "user-456-creation": create user 456
- id "user-456-email-set": for user 456, set email "email1@dns"
- id "user-456-email-set": for user 456, set email "email2@dns"
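The keying scheme above can be sketched with a small in-memory stand-in for a compacted topic (this is not Kafka itself; compaction keeps only the last record per key, and the keys and payloads below are the hypothetical ones from my example):

```python
# Sketch of log-compaction semantics: keep only the last record per key.
# Keys/payloads are hypothetical; real Kafka compaction works per partition.
events = [
    ("user-456-creation",  "create user 456"),
    ("user-456-email-set", 'set email "email1@dns"'),
    ("user-456-email-set", 'set email "email2@dns"'),
]

def compact(log):
    """Keep the last record for each key; later records overwrite earlier ones."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return list(latest.items())

# With this keying, compaction keeps the creation event and only the
# most recent email-set event.
compacted = compact(events)
```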

- Can we provide a custom log compaction logic?

If somebody is using Kafka for this purpose, I'd be glad to hear some
return of experience.

Cheers,
Yann

Re: Using Kafka for Event Sourcing

Posted by Yann Simon <ya...@gmail.com>.
Bumping this, as I am not sure you received my email.


Re: Using Kafka for Event Sourcing

Posted by Jay Kreps <ja...@confluent.io>.
Hey Yann,

Yes, you can just make the retention infinite, which will disable any
deletion.
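As a sketch, this can be set per topic at creation time (topic name is hypothetical; the exact CLI flags vary by Kafka version, but `retention.ms=-1` disables time-based deletion and `retention.bytes=-1` disables size-based deletion):

```shell
# Create a topic whose log segments are never deleted.
# retention.ms=-1    -> no time-based deletion
# retention.bytes=-1 -> no size-based deletion
kafka-topics.sh --create --topic user-events \
  --bootstrap-server localhost:9092 \
  --config retention.ms=-1 \
  --config retention.bytes=-1
```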

What you describe with compaction might work, but wasn't exactly the
intention.

This type of event logging can work two ways: you can log the "command" or
you can log the result of the command. In databases this is sometimes
referred to as logical and physical logging.

The intention of Kafka's compaction feature is to support physical logging
where the last update contains the full aggregated state so far. So in your
example the events we would expect would be
  456 => {"id": 456}
  456 => {"id":456, "email":"email1@dns"}
  456 => {"id":456, "email":"email2@dns"}
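The point of this physical-logging style can be sketched with an in-memory stand-in for a compacted topic: because each record's value is the full aggregated state, keeping only the last record per key preserves the final state.

```python
# Sketch (not Kafka itself): physical logging keyed by entity id.
# Each value is the full state so far, so "last record per key" is enough.
log = [
    (456, {"id": 456}),
    (456, {"id": 456, "email": "email1@dns"}),
    (456, {"id": 456, "email": "email2@dns"}),
]

compacted = {}
for key, state in log:
    compacted[key] = state  # compaction keeps only the latest record per key

# The surviving record is the complete final state of user 456.
final_state = compacted[456]
```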

A more verbose approach could even log the prior state, the current state,
and the command (as some databases do).

Why do it this way? Kafka doesn't allow customizing the compaction. There
are embeddable event stores, like Event Store (http://geteventstore.com/),
that allow plugging in customized business logic for compacting events.
However, Kafka is built to run as a central service, not per-application,
so in that model deploying business logic into the central Kafka cluster
every time you need to change your compaction logic is a non-starter.

There were a number of data systems at LinkedIn that worked off the log
like this, but I can't give a good comparison to other CQRS systems since I
haven't used any of them.

-Jay
