Posted to users@kafka.apache.org by Mark Tinsley <ma...@gmail.com> on 2019/12/14 21:58:25 UTC

Reset topology

Hi,

I've got a question, or more of a request for confirmation, and I hope this is
the right place to post (please point me elsewhere if not). My question is
about replaying events through a topology to rebuild the topology's state (by
state I mean the KTables, state stores, etc.).

To give a bit more grounding I'll use the example of a stock system. In this
system I have stock and locations. Stock can be moved to other locations, and
other systems can tell us where stock is.

Simple case moving stock in the system:

In this case the Kafka topology consumes allocate commands, location events
and item events. The topology builds up KTables from the events to understand
the current location of each item, in order to validate whether an allocate
command can proceed.

Building up this KTable from events is a form of event sourcing, to my
knowledge (please correct me if wrong).
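To make the event-sourcing idea concrete, here is a minimal sketch in plain
Java (not the Streams API; the event shape and field names are invented for
illustration). Materializing a KTable is essentially a left fold over the
event log, where the latest event per key wins:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ItemStateFold {
    // A made-up item event: the item id and the location it was seen at.
    record ItemEvent(String itemId, String location) {}

    // Folding the event log into a map is the essence of how a KTable
    // is materialized: the latest event per key overwrites earlier ones.
    static Map<String, String> fold(List<ItemEvent> events) {
        Map<String, String> table = new HashMap<>();
        for (ItemEvent e : events) {
            table.put(e.itemId(), e.location());
        }
        return table;
    }

    public static void main(String[] args) {
        List<ItemEvent> log = List.of(
                new ItemEvent("item-1", "loc-A"),
                new ItemEvent("item-1", "loc-B"), // later event wins
                new ItemEvent("item-2", "loc-A"));
        System.out.println(fold(log));
    }
}
```

Because the state is just a fold over the log, replaying the log through a
new fold function (e.g. one that also keeps a cold-storage flag) rebuilds the
state with the new fields.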

Now let’s say we have another condition for allocating stock: only certain
locations have cold storage, and items which require cold storage can only be
put into locations that have it.

The information is present in the original events, but it is not captured in
the state built up in the KTable. That means we need to recreate the KTable's
state by re-ingesting the events and building up the new state.

To do this we need to do two things: stop consuming allocate commands, and
run the ./bin/kafka-streams-application-reset.sh script with the item topic
and location topic as input topics. We stop consuming allocate commands to
ensure the state is fully rebuilt before new commands are processed
(otherwise they may give incorrect results). We do not list the allocate
command topic as an input topic so that previous commands are not
reprocessed.
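A sketch of what that reset invocation might look like (the application id,
broker address and topic names here are assumptions for this example, not
from the original post):

```shell
# Stop all instances of the Streams application first; the reset tool
# must be run while the application is not running.
./bin/kafka-streams-application-reset.sh \
  --application-id stock-allocation-app \
  --bootstrap-servers localhost:9092 \
  --input-topics item-events,location-events
# The allocate-commands topic is deliberately NOT listed as an input
# topic, so previously processed commands are not replayed.
```

Note that the tool resets committed offsets and internal topics; local state
stores on each instance also need clearing (for example via
KafkaStreams#cleanUp() before restarting) so the state is rebuilt from
scratch.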

Does this sound correct?

So that’s reprocessing in the command-driven world; the other side is the
event-driven world. For this example, imagine we are ingesting events from a
third-party system about where items are located. The information does not
explicitly state the location; we need to work it out from the information
given. Let’s say the rule is: all items that require a cold store are located
in the cold store linked to where the stock comes from. So if the stock was
made in location X, it will be stored in location X.


One possible approach would be to listen for the item-created event in the
topology outlined above and allocate on it. But this causes a problem when
resetting the application topology: we are building state from the
item-created events and allocating on them at the same time. Resetting the
allocation topology with the item topic as a source would cause the
allocations to run again, and this re-run, driven by the item-created events,
would override any command-based allocations. Hope this makes sense; feel
free to ask if I've not explained it clearly enough. The key takeaway is:
don't event-source and do event-driven processing on the same event in the
same topology.

So to solve the above, I think you need two topologies: one that creates the
aggregate of the item state and produces it to a topic, and another that
performs the allocation. A reset on the first would use the item and location
topics as input topics as before, while a reset on the second (the allocation
topology) would be against the aggregate topic.
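The split can be sketched in plain Java (again not the Streams API; the
record shapes and names are invented). The first stage only event-sources:
it folds raw events into aggregates and emits them. The second stage only
event-drives: it allocates from aggregates, never from raw item events, so
resetting stage one against the raw topics cannot re-trigger allocations:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoStageSketch {
    record ItemEvent(String itemId, String location, boolean needsColdStore) {}
    record ItemAggregate(String itemId, String location, boolean needsColdStore) {}

    // Stage 1: event-source only. Fold raw events into per-item aggregates
    // and emit them to the "aggregate topic" (modelled here as a list).
    static List<ItemAggregate> aggregateStage(List<ItemEvent> events) {
        Map<String, ItemAggregate> state = new HashMap<>();
        for (ItemEvent e : events) {
            state.put(e.itemId(),
                    new ItemAggregate(e.itemId(), e.location(), e.needsColdStore()));
        }
        return new ArrayList<>(state.values());
    }

    // Stage 2: event-driven only. Allocation consumes aggregates, never the
    // raw item events, so a replay of stage 1 never re-runs allocations.
    static List<String> allocationStage(List<ItemAggregate> aggregates,
                                        Map<String, Boolean> hasColdStore) {
        List<String> allocations = new ArrayList<>();
        for (ItemAggregate a : aggregates) {
            boolean ok = !a.needsColdStore()
                    || hasColdStore.getOrDefault(a.location(), false);
            if (ok) {
                allocations.add(a.itemId() + " -> " + a.location());
            }
        }
        return allocations;
    }
}
```

With this split, the reset boundary matches the topology boundary: stage one
is reset against the raw item and location topics, stage two against the
aggregate topic only.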

Does this sound correct?


I understand the above might not be entirely clear; I do tend to use DDD
terminology quite freely. Please feel free to ask for more detail.

I’m looking for confirmation on the above or corrections, both welcome.

Thanks,

Mark