Posted to users@kafka.apache.org by Guozhang Wang <wa...@gmail.com> on 2016/07/07 22:55:41 UTC

Re: Kafka Streams change log behavior

Hi Manu,

Just realized you have a question in the previous email that I missed.
Sorry!

The local log offset is actually only used to shorten the recovery time
upon resuming: it acts as an indicator that the task was cleanly shut down
previously, and hence that the state stores are in a consistent state and
can be reused; otherwise we need to restore them from scratch.
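A rough sketch of that clean-shutdown indicator in plain Java (the checkpoint
file name and format here are invented for illustration; Kafka Streams'
actual on-disk layout differs):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch: on clean shutdown a task records its changelog offset
// in a local checkpoint file; on resume, a present checkpoint means the local
// state store is consistent and can be reused from that offset, while a
// missing one means the previous run crashed and a full restore is needed.
public class CheckpointSketch {
    static final String CHECKPOINT_FILE = ".checkpoint"; // invented name

    // Written only on a clean close.
    static void writeCheckpoint(Path stateDir, long changelogOffset) throws IOException {
        Files.write(stateDir.resolve(CHECKPOINT_FILE),
                    Long.toString(changelogOffset).getBytes());
    }

    // Read (and consumed) on startup; Optional.empty() => restore from scratch.
    static Optional<Long> readCheckpoint(Path stateDir) throws IOException {
        Path f = stateDir.resolve(CHECKPOINT_FILE);
        if (!Files.exists(f)) return Optional.empty();
        long offset = Long.parseLong(new String(Files.readAllBytes(f)).trim());
        Files.delete(f); // so a later crash shows up as a missing checkpoint
        return Optional.of(offset);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("state");
        System.out.println(readCheckpoint(dir)); // Optional.empty -> restore from scratch
        writeCheckpoint(dir, 42L);
        System.out.println(readCheckpoint(dir)); // Optional[42] -> resume from offset 42
    }
}
```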

We are still working on a better re-processing user experience in Kafka
Streams. The main goal is that Streams users can "rewind to a specific time
in the past and restart from there", potentially with changed processing
logic.
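To make the "rewind to a time" idea concrete, here is a toy timestamp-to-offset
lookup in plain Java (newer Kafka consumer clients expose this lookup directly
as KafkaConsumer#offsetsForTimes; the TreeMap below merely stands in for that
index):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in for rewinding to a point in time: given an index of
// record timestamps -> offsets, find the first offset whose timestamp is at
// or after the requested rewind point, then restart consumption from there.
public class RewindSketch {
    static Long offsetForTime(TreeMap<Long, Long> index, long timestampMs) {
        Map.Entry<Long, Long> e = index.ceilingEntry(timestampMs);
        return e == null ? null : e.getValue(); // null: no data at/after that time
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> index = new TreeMap<>();
        index.put(1000L, 0L); // record at t=1000ms sits at offset 0
        index.put(2000L, 5L);
        index.put(3000L, 9L);
        System.out.println(offsetForTime(index, 1500L)); // 5: first record at/after t=1500
    }
}
```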


Guozhang

On Mon, May 23, 2016 at 3:19 PM, Manu Zhang <ow...@gmail.com> wrote:

> Well, that's something our user is asking for, so I'd like to learn how
> Kafka Streams has done it. FYI, Gearpump checkpoints the mapping of the
> event-time clock to Kafka offsets for replay.
>
> We are thinking about allowing users to configure a topic to recover from
> after application upgrade.
>
> Meanwhile, we are exploring ways to make recovery in the application (where
> there could be duplicate checkpoints) more efficient. I know that Kafka
> Streams checkpoints the change log offset to the local file system before a
> StreamTask closes. How does that work out in practice?
>
> Thanks,
> Manu
>
> On Tue, May 24, 2016 at 12:14 AM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hi Manu,
> >
> > That is a good question. Currently, if users change their application
> > logic while upgrading, the intermediate state stores may no longer be
> > valid, and hence neither is the change log. In this case we need to wipe
> > out the invalid internal data and restart from scratch.
> >
> > We are working on a better re-processing user experience now and once
> there
> > is a proposal we will present it to the community.
> >
> >
> > How's everything on the Gearpump side? Looking forward to hearing from
> > you about any news :)
> >
> > Guozhang
> >
> >
> > On Mon, May 23, 2016 at 8:50 AM, Manu Zhang <ow...@gmail.com>
> > wrote:
> >
> > > That's one case. Well, if I get it right, the change log's topic name
> > > is bound to the applicationId, so what I want to ask is how the change
> > > log works for scenarios like application upgrade (with a new
> > > applicationId, I guess). Does the system handle that for the user?
> > >
> > > On Mon, May 23, 2016 at 9:21 PM Matthias J. Sax <matthias@confluent.io>
> > > wrote:
> > >
> > > > If I understand correctly, you want to have a non-Kafka-Streams
> > > > consumer to read a change-log topic that was written by a
> > > > Kafka-Streams application?
> > > >
> > > > That is certainly possible. Kafka is agnostic to Kafka Streams, i.e.,
> > > > all topics are regular topics and can be read by any consumer.
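For example, assuming the usual Streams naming convention for changelog
topics (applicationId, then store name, then a "-changelog" suffix), a plain
consumer only needs to compute the topic name to subscribe:

```java
// Any plain Kafka consumer can read a Streams changelog topic; the only
// Streams-specific detail is the topic name, which by convention is
// <applicationId>-<storeName>-changelog.
public class ChangelogTopicName {
    static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    public static void main(String[] args) {
        // e.g. pass this name to a regular KafkaConsumer subscribe/assign call
        System.out.println(changelogTopic("my-app", "counts-store"));
        // my-app-counts-store-changelog
    }
}
```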
> > > >
> > > >
> > > > -Matthias
> > > >
> > > > On 05/23/2016 10:48 AM, Manu Zhang wrote:
> > > > > Thanks Matthias. Is there a way to allow users to read change logs
> > > from a
> > > > > previous application ?
> > > > >
> > > > > On Mon, May 23, 2016 at 3:57 PM Matthias J. Sax <matthias@confluent.io>
> > > > > wrote:
> > > > >
> > > > >> Hi Manu,
> > > > >>
> > > > >> Yes. If a StreamTask recovers, it will write to the same change
> > > > >> log's topic partition. Log compaction is enabled by default for
> > > > >> those topics. You might still see some duplicates in your output.
> > > > >> Currently, Kafka Streams guarantees at-least-once processing
> > > > >> (exactly-once processing is on the roadmap, though).
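As a toy illustration of why compaction makes those duplicates harmless:
compaction keeps only the latest value per key, so replayed writes converge
to the same state (a plain-Java sketch, not Kafka's actual log cleaner):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of log compaction: collapse a changelog (a sequence of key/value
// writes) down to the latest value per key. A duplicate write replayed after
// a restart produces the same compacted state as a single write.
public class CompactionSketch {
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] kv : log) latest.put(kv[0], kv[1]); // later write wins
        return latest;
    }

    public static void main(String[] args) {
        // "k1" written twice, e.g. once before a crash and once after restart
        List<String[]> log = List.of(
            new String[]{"k1", "v1"},
            new String[]{"k2", "v2"},
            new String[]{"k1", "v1"}); // duplicate from at-least-once replay
        System.out.println(compact(log)); // {k1=v1, k2=v2}
    }
}
```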
> > > > >>
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >> On 05/23/2016 06:10 AM, Manu Zhang wrote:
> > > > >>> Hi All,
> > > > >>>
> > > > >>> I'm new to Kafka Streams and have a question on change log.
> > > > >>> If a StreamTask fails and is restarted, will the change log be
> > > > >>> written to the old change log's topic partition? Is it possible
> > > > >>> for some change log topic partition to have duplicate records, so
> > > > >>> that log compaction is required?
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Manu Zhang
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang