Posted to users@kafka.apache.org by Vjeran Marcinko <vj...@email.t-com.hr> on 2014/01/27 11:44:06 UTC

Event data modelling best practices?

Hi,

I guess this is not just a question for Kafka but for all event-driven
systems. Since most people here deal with events, I would like to hear some
basic suggestions for modelling event messages, or, even better, a pointer to
some relevant literature or website that deals with this.

Anyway, my questions are something like this:

Suppose I have an event that is spawned when some request is processed, such as:

BankAccountService.credit(long bankAccountId, long amount);
and the event that is then triggered is (in some pseudo data structure):

BankAccountCredited {
	long bankAccountId;
	long amount;
}

1. If I leave just these pieces of data in the event, wouldn't a consumer be
unable to reconstruct the state of the bank account (the account's balance
being the piece of state that changed) unless the same logic is present in
the event accumulator? That is especially problematic once code versioning
is involved, which is practically always.
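
To make point 1 concrete, here is a minimal Java sketch (assuming Java 16+
records; the class and method names are hypothetical, and the credit rule is
assumed to be plain addition). It shows a consumer that only receives the thin
event above and therefore has to carry its own copy of the producer's logic to
rebuild the balance.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for point 1: a consumer that only receives "thin"
// BankAccountCredited events must carry its own copy of the credit logic.
final class ThinEventReplay {

    // Mirrors the pseudo structure BankAccountCredited { bankAccountId; amount; }
    record BankAccountCredited(long bankAccountId, long amount) {}

    // Duplicate of the producer's business rule (assumed here to be plain
    // addition). If the producer's rule changes, this copy must change too,
    // and older events may still need the old rule when replayed.
    static long applyCredit(long balance, BankAccountCredited event) {
        return balance + event.amount();
    }

    // Folds the event stream into a balance per account, starting from zero.
    static Map<Long, Long> replay(List<BankAccountCredited> events) {
        Map<Long, Long> balances = new HashMap<>();
        for (BankAccountCredited e : events) {
            long current = balances.getOrDefault(e.bankAccountId(), 0L);
            balances.put(e.bankAccountId(), applyCredit(current, e));
        }
        return balances;
    }
}
```

The applyCredit copy is exactly the versioning problem: the consumer's result is
only correct as long as its rule matches the one the producer used at the time.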

2. Because reconstructing state would otherwise require that code/logic, I
guess it would be wise to include the piece of account state that changed,
for example by adding the balance after the credit request has executed:
BankAccountCredited {
	long bankAccountId;
	long amount;
	long balance;
}
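
A minimal Java sketch of point 2 (assuming Java 16+ records; names are
hypothetical). Because the event carries the resulting balance, the consumer
needs no business rule and simply keeps the latest balance per account. It does
assume events for one account are consumed in the order they were produced, for
example by keying them by account id so they land in the same Kafka partition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for point 2: the event carries the resulting balance,
// so the consumer just records the latest value it has seen per account.
final class StateCarryingReplay {

    // Mirrors BankAccountCredited { bankAccountId; amount; balance; }
    record BankAccountCredited(long bankAccountId, long amount, long balance) {}

    // Assumes per-account ordering is preserved between producer and consumer.
    static Map<Long, Long> latestBalances(List<BankAccountCredited> events) {
        Map<Long, Long> balances = new HashMap<>();
        for (BankAccountCredited e : events) {
            // No credit logic here: the producer already computed the balance.
            balances.put(e.bankAccountId(), e.balance());
        }
        return balances;
    }
}
```

The amount is still in the event for consumers that care about the delta
itself, while state-tracking consumers can ignore it.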

3. Another option, which may seem better considering that many different
events will want to report the state of the account after the action, is a
nested BankAccount data structure, right?

BankAccountCredited {
	long amount;
	BankAccount {
		long id;
		long balance;
		boolean active;
	}
}
In this case some fields of the account entity (active) are also present
even though the credit action did not directly affect them. We have them
here because the BankAccount data structure contains all of its fields.
That is OK, right?
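
A minimal Java sketch of point 3 (assuming Java 16+ records and pattern-matching
instanceof; names are hypothetical, and BankAccountDebited is the counterpart
event mentioned in point 4). One BankAccount snapshot type is reused by several
event types, so a consumer that only maintains an account read model can take
the snapshot without knowing which action produced it.

```java
// Hypothetical sketch for point 3: a shared snapshot type embedded in
// several events, each reporting the full account state after the action.
final class SnapshotEvents {

    // Full account state after the action, including fields (active)
    // that the action itself did not change.
    record BankAccount(long id, long balance, boolean active) {}

    record BankAccountCredited(long amount, BankAccount account) {}

    record BankAccountDebited(long amount, BankAccount account) {}

    // Extracts the snapshot from any event type that embeds one.
    static BankAccount snapshotOf(Object event) {
        if (event instanceof BankAccountCredited c) return c.account();
        if (event instanceof BankAccountDebited d) return d.account();
        throw new IllegalArgumentException("unknown event: " + event);
    }
}
```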

4. What if some downstream consumers are interested in all events (a
"category") that change the account's balance? That is, a consumer may not
care whether an event is BankAccountCredited or BankAccountDebited, because it
is interested in the category of events that can be described as
"BankAccountBalanceChanged". Since popular serialization libraries (Avro,
Thrift...) usually have no "supertyping", how do you implement this? Do you
subscribe the consumer individually to all topics that contain events that
change the bank account balance, or do you create one topic that contains all
events of that particular category? (The latter approach would not work in
general, because categories don't have to be so straightforward: many events
have a many-to-many relationship to various categories. In Java this would
simply be implemented by using interfaces to mark categories, but here we
don't have that option.)
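
One possible workaround for point 4, sketched in Java (assuming Java 16+; the
Envelope type, its fields and the category strings are hypothetical, not part
of Avro, Thrift or Kafka). Categories are carried as plain data in an event
envelope, which allows a many-to-many mapping, and a consumer filters on the tag
it cares about.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch for point 4: categories as data instead of interfaces.
final class CategorizedEvents {

    // Generic envelope: event type name, the categories it belongs to
    // (an event may have many), and the serialized body, left opaque here.
    record Envelope(String eventType, Set<String> categories, byte[] payload) {}

    static final String BALANCE_CHANGED = "BankAccountBalanceChanged";

    // Keeps only the envelopes tagged with the requested category.
    static List<Envelope> inCategory(List<Envelope> events, String category) {
        return events.stream()
                .filter(e -> e.categories().contains(category))
                .toList();
    }
}
```

The trade-off is that filtering happens on the consumer side, so a consumer still
reads every event in the topics it subscribes to. The alternative of the producer
publishing a copy to one topic per category is not shown here.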

5. What if some action mutates several entities when it is processed? Do you
spawn two events from the application layer, or do you publish just one,
which a real-time processor subsequently splits into two different events,
one for each affected entity?
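
A minimal Java sketch of the second option in point 5 (assuming Java 16+; the
FundsTransferred event and all names are hypothetical). A transfer mutates two
accounts, so the application publishes one coarse event and a downstream
processor derives one per-entity event for each affected account. Only the pure
splitting step is shown; consuming and publishing are left out.

```java
import java.util.List;

// Hypothetical sketch for point 5: split one coarse event into
// per-entity events for each account it affected.
final class TransferSplitter {

    record FundsTransferred(long fromAccountId, long toAccountId, long amount,
                            long fromBalanceAfter, long toBalanceAfter) {}

    interface AccountEvent {}

    record BankAccountDebited(long bankAccountId, long amount, long balance)
            implements AccountEvent {}

    record BankAccountCredited(long bankAccountId, long amount, long balance)
            implements AccountEvent {}

    // Pure function from the coarse event to the derived per-entity events.
    static List<AccountEvent> split(FundsTransferred t) {
        return List.of(
                new BankAccountDebited(t.fromAccountId(), t.amount(), t.fromBalanceAfter()),
                new BankAccountCredited(t.toAccountId(), t.amount(), t.toBalanceAfter()));
    }
}
```

One argument for this option is that the application writes a single event, so
it cannot crash halfway through publishing two. The per-entity events are derived
and can be regenerated by replaying the original.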

I could probably think of some other questions, but you get the point of what
I'm interested in.

Best regards,
Vjeran