Posted to users@kafka.apache.org by Guillermo Lammers Corral <gu...@tecsisa.com> on 2016/04/19 19:16:55 UTC

Kafka Streams State Store and RocksDB

Hello,

I read in the docs that Kafka Streams stores the computed aggregations in a
local embedded key-value store (RocksDB by default), i.e., Kafka Streams
provides so-called state stores. I'm wondering about the relationship
between each state store and its replicated changelog Kafka topic.

Using the WordCountDemo example, I would like to understand the sequence
of events behind both concepts when we run something like this:

I send the words "hello", "world", "hello" to the input topic.

Changelog topic messages:

"hello" => 1
"world" => 1
"hello" => 2

It's ok.

Q1) What is the status of RocksDB at this moment?

Q2) If I delete all data in the changelog Kafka topic and send a new
"hello", I can see in the changelog topic:

"hello" => 3

What is happening? Is countByKey counting from the data already stored in
RocksDB before it writes to the changelog again?

Q3) Could someone explain to me in depth, step by step, the process quoted
below? What happens if the changelog Kafka topic has been deleted, as I did
in the question above?

"If tasks run on a machine that fails and are restarted on another machine,
Kafka Streams guarantees to restore their associated state stores to the
content before the failure by replaying the corresponding changelog topics
prior to resuming the processing on the newly started tasks."

Q4) How does this differ for operations with no associated changelog Kafka
topic, such as joins between two KTables, when a machine fails and we need
fault tolerance?

Thanks!

Re: Kafka Streams State Store and RocksDB

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Guillermo,

1). It will have two rows: {"hello" => 2} and {"world" => 1}.

2). That is correct. Note that the changelog only needs to keep the most
recent value for each key, so even if you had not deleted the data, the new
"hello" => 3 record would make the previous "hello" => 1 and "hello" => 2
records obsolete.

3). Changelog data is used for restoration only. If you delete the topic
and no failure ever happens, everything is OK; but if a failure does
happen, the state cannot be restored because the changelog data is no
longer available.

4). Any operation that has an associated state store will have a changelog
topic by default; a KTable join will also have a changelog for its
materialized data.
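
As a rough illustration of points 1) to 3), here is a toy model in plain
Java. It does not use the Kafka Streams API: a map stands in for RocksDB
and a list stands in for the changelog topic, and the names
ChangelogSketch, process and restore are made up for this sketch. It only
assumes that every store update is also appended to the changelog and that
restoration replays the changelog with the last value per key winning.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a map stands in for RocksDB, a list for the changelog topic. */
public class ChangelogSketch {
    // Local state store (RocksDB in this thread), keyed by word.
    final Map<String, Long> store = new LinkedHashMap<>();
    // Changelog topic: one record is appended for every store update.
    final List<Map.Entry<String, Long>> changelog = new ArrayList<>();

    /** Count one word: update the local store first, then log the new value. */
    void process(String word) {
        long count = store.merge(word, 1L, Long::sum);
        changelog.add(Map.entry(word, count));
    }

    /** Rebuild a store by replaying a changelog; the last value per key wins. */
    static Map<String, Long> restore(List<Map.Entry<String, Long>> log) {
        Map<String, Long> rebuilt = new LinkedHashMap<>();
        for (Map.Entry<String, Long> record : log) {
            rebuilt.put(record.getKey(), record.getValue());
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        ChangelogSketch task = new ChangelogSketch();
        for (String word : List.of("hello", "world", "hello")) {
            task.process(word);
        }
        System.out.println("store:     " + task.store);
        System.out.println("changelog: " + task.changelog);

        // Q2: wipe the changelog but keep the local store, then send "hello".
        task.changelog.clear();
        task.process("hello");
        System.out.println("changelog after wipe: " + task.changelog);

        // Q3: a task restarted elsewhere only sees what the changelog still holds.
        System.out.println("restored:  " + restore(task.changelog));
    }
}
```

In this model the count continues from the local store, so the wiped
changelog receives "hello" => 3, while a store restored from that changelog
contains only "hello" and has lost "world".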


Guozhang


