Posted to users@kafka.apache.org by "Alex Leung (BLOOMBERG/ SAN FRAN)" <al...@bloomberg.net> on 2019/10/16 23:16:25 UTC

Kafka Streams processing.guarantee behavior

We're using the Kafka Streams processor API and directly performing get() and put() on two different state stores (KeyValueStore) from a single processor. If two puts are performed from one processor, e.g.:

1. store1.put(...)
2. store2.put(...)

my understanding is that if processing.guarantee="at_least_once", it is possible that the commit from 1. succeeds but the commit from 2. fails (or vice versa). And if we need to guarantee that either both or neither succeed, we need to enable processing.guarantee="exactly_once". I believe the relevant code is here: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L547
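For reference, the guarantee is selected via the `processing.guarantee` config. A minimal sketch of the relevant properties (the application id and bootstrap servers here are placeholders, not from the thread):

```java
import java.util.Properties;

public class EosConfigSketch {
    public static Properties streamsProps() {
        Properties props = new Properties();
        // Placeholder application id and broker address.
        props.setProperty("application.id", "two-store-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // "exactly_once" makes the two puts commit atomically;
        // the default is "at_least_once".
        props.setProperty("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("processing.guarantee"));
    }
}
```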

Is my understanding correct?

Thanks,
Alex

Re: Kafka Streams processing.guarantee behavior

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Sounds about right.

On commit, both stores would be flushed to local disk and the producer
would be flushed to ensure all writes to the changelog topics are done.
Only afterwards would the input topic offsets be committed.
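That commit sequence can be sketched as follows; the interfaces and method names below are illustrative stand-ins, not the actual StreamTask internals:

```java
import java.util.ArrayList;
import java.util.List;

public class CommitOrderSketch {
    // Illustrative stand-ins for the real components.
    interface StateStore { void flush(); }
    interface RecordProducer { void flush(); }
    interface OffsetCommitter { void commit(); }

    static final List<String> LOG = new ArrayList<>();

    // Sketch of the commit order: flush stores, flush producer,
    // and only then commit the input topic offsets.
    static void commit(List<StateStore> stores, RecordProducer producer,
                       OffsetCommitter committer) {
        // 1. Flush each store to local disk (in no particular order).
        for (StateStore store : stores) {
            store.flush();
        }
        // 2. Flush the producer so all changelog writes are acknowledged.
        producer.flush();
        // 3. Only afterwards commit the input topic offsets.
        committer.commit();
    }

    public static void main(String[] args) {
        List<StateStore> stores = List.of(
                () -> LOG.add("flush store1"),
                () -> LOG.add("flush store2"));
        commit(stores, () -> LOG.add("flush producer"),
                () -> LOG.add("commit offsets"));
        System.out.println(LOG);
    }
}
```

A failure anywhere before step 3 leaves the offsets uncommitted, which is why the input records are reprocessed after a restart.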

Because the stores are flushed one after another (and in no particular
order), it can happen that one flush succeeds while an error occurs
before the second one does. Similarly, at this point the producer may
or may not have written to the changelog topics successfully.

Of course, the input data would be reprocessed after restart and hence,
if your store updates are idempotent, you might be fine with the
"at_least_once" guarantee. If your store updates are not idempotent,
using "exactly_once" would ensure that "partial/dirty" writes to the
stores are rolled back before the input data is reprocessed.
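The idempotency point can be illustrated with plain maps standing in for the two state stores (a toy simulation, not Kafka Streams code): under at-least-once, a record may be processed twice after a crash; an absolute put converges to the same value on replay, while a read-modify-write increment double-counts.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaySketch {
    // Plain maps as stand-ins for the two KeyValueStores.
    static final Map<String, Long> absoluteStore = new HashMap<>();
    static final Map<String, Long> counterStore = new HashMap<>();

    // Idempotent: stores the value carried by the record itself.
    static void putAbsolute(String key, long value) {
        absoluteStore.put(key, value);
    }

    // Not idempotent: read-modify-write increments on every (re)processing.
    static void increment(String key) {
        counterStore.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        // First attempt: the record is processed, but a crash happens
        // before the offsets are committed, so it is replayed on restart.
        putAbsolute("k", 42L);
        increment("k");

        // Replay of the same record under at-least-once.
        putAbsolute("k", 42L);
        increment("k");

        System.out.println(absoluteStore.get("k")); // still 42
        System.out.println(counterStore.get("k"));  // 2: double-counted
    }
}
```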


-Matthias
