Posted to dev@samza.apache.org by Pedro Silvestre <pm...@gmail.com> on 2019/11/14 21:42:08 UTC

Details on read-your-writes changelog

Hello all,

I was reading through the Samza paper (
http://www.vldb.org/pvldb/vol10/p1634-noghabi.pdf, very nicely written by
the way), and in the section on fault-tolerance I noticed that the
changelog is implemented with read-your-writes guarantees. Knowing that
this changelog is a Kafka stream, I cannot find any information on whether
Kafka provides read-your-writes guarantees.

Intuitively, since producers and consumers are separate entities I would
expect this guarantee to not exist: a process acting as both a producer and
consumer, which executes a produce() followed by a poll() is not guaranteed
to read the produced record immediately.

So, how is the read-your-writes changelog implemented?

Regards,

Pedro Silvestre

Re: Details on read-your-writes changelog

Posted by rayman preet <ra...@gmail.com>.
Hi Pedro,

Thanks for reaching out.
A Samza app using KVStores can expect a read-your-writes guarantee because:
a. in the failure-free case, RocksDB (the local store) provides that, and
b. in case of failures, Samza resumes input (and state) consumption from the
last complete checkpoint, at which point everything written to the store is
guaranteed to have been persisted/flushed to Kafka's changelog.
At checkpoint, Samza simply flushes the local RocksDB store and the Kafka
changelog producer, and then proceeds to checkpoint the input only if both
flushes succeed.
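
The ordering described above can be illustrated with a small toy model. This
is only a sketch, not Samza's actual code: the class names (ChangelogProducer,
Task), the dict standing in for RocksDB, and the list standing in for the
Kafka changelog topic are all hypothetical, and the model assumes that records
which were sent but not flushed are simply lost on a crash.

```python
class ChangelogProducer:
    """Hypothetical stand-in for the Kafka changelog producer."""

    def __init__(self):
        self.buffered = []  # sent but not yet flushed
        self.durable = []   # what the changelog topic holds (assumed model)

    def send(self, key, value):
        self.buffered.append((key, value))

    def flush(self):
        self.durable.extend(self.buffered)
        self.buffered.clear()


class Task:
    """Hypothetical task with a local store backed by a changelog."""

    def __init__(self, changelog, checkpointed_offset=0):
        self.changelog = changelog
        self.checkpointed_offset = checkpointed_offset
        self.offset = checkpointed_offset  # next input offset to process
        # Restore local state by replaying the durable changelog.
        self.store = {}  # dict standing in for the local RocksDB store
        for key, value in changelog.durable:
            self.store[key] = value

    def process(self, key, value):
        self.store[key] = value          # write to the local store first
        self.changelog.send(key, value)  # then log the write (buffered)
        self.offset += 1

    def get(self, key):
        # Reads only ever hit the local store, never the changelog.
        return self.store.get(key)

    def flush_store(self):
        pass  # a dict has nothing to flush; a real store would sync here

    def commit(self):
        # Flush the store, then the changelog, then checkpoint the input.
        # If either flush raised, the checkpoint line would not be reached.
        self.flush_store()
        self.changelog.flush()
        self.checkpointed_offset = self.offset


def demo():
    inputs = [("a", 1), ("b", 2), ("a", 3)]
    changelog = ChangelogProducer()
    task = Task(changelog)

    task.process(*inputs[0])
    task.process(*inputs[1])
    read_before_commit = task.get("a")  # served by the local store
    task.commit()
    task.process(*inputs[2])  # written locally, but never committed

    # Simulated crash: local state and unflushed sends are gone.
    changelog.buffered.clear()
    recovered = Task(changelog, task.checkpointed_offset)
    restored = recovered.get("a")  # state as of the last checkpoint

    # Input resumes from the checkpointed offset, so the lost write is redone.
    for key, value in inputs[recovered.offset:]:
        recovered.process(key, value)
    return read_before_commit, restored, recovered.get("a")
```

In this model, demo() returns (1, 1, 3): the write is visible immediately from
the local store, the restored state matches the last checkpoint, and replaying
the input from the checkpointed offset reapplies the uncommitted write. A
produce() followed by a poll() on the changelog never appears on the read path.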

--
thanks
rayman



