Posted to users@kafka.apache.org by Tim Ward <ti...@origamienergy.com> on 2019/08/09 14:09:13 UTC

How to start a stream from only new records?

With a real-time application, nobody is interested in old data, and in particular they're not interested in paying to spend time processing it only to throw it away, thereby delaying up-to-date data.

How do I tell StreamsBuilder.stream() to read only new data from its topic, and not process any backlog?

Tim Ward
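
For reference, a minimal sketch of the kind of topology the question is about; the topic name, serdes and processing logic are placeholders. Note that where consumption starts is decided by the consumer group's committed offsets and the `auto.offset.reset` setting, not by StreamsBuilder.stream() itself:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KStream;

    public class RealTimeApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "realtime-app");      // also the consumer group id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            StreamsBuilder builder = new StreamsBuilder();
            // stream() has no "skip the backlog" switch of its own; the starting
            // point comes from committed offsets (if any) or the reset policy.
            KStream<String, String> readings =
                    builder.stream("readings", Consumed.with(Serdes.String(), Serdes.String()));
            readings.foreach((key, value) -> System.out.println(key + " -> " + value));

            new KafkaStreams(builder.build(), props).start();
        }
    }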


Re: How to start a stream from only new records?

Posted by Patrik Kleindl <pk...@gmail.com>.
Hi

Our requirement is related: we want our Streams application to only process
messages from the last x weeks.
On new deployments this requires starting the application first, stopping
it, and then resetting the offsets.
I have created https://issues.apache.org/jira/browse/KAFKA-8766 with the
idea of allowing a custom offset selection mechanism to be provided.
No KIP has been filed yet, but maybe it helps to aggregate similar cases.

best regards

Patrik
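
In the meantime, one workaround is to pre-set the committed offsets for the application's consumer group (its group id is the application.id) to "now minus x weeks" while the application is stopped, so the next start resumes from there. A rough sketch with a plain consumer; the bootstrap servers, group id, input topic and two-week window are placeholders:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class ResetToTimestamp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-streams-app");  // must equal application.id (placeholder)
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);

            long cutoff = Instant.now().minus(Duration.ofDays(14)).toEpochMilli();  // "last x weeks"

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                List<TopicPartition> partitions = consumer.partitionsFor("input-topic").stream()
                        .map(p -> new TopicPartition(p.topic(), p.partition()))
                        .collect(Collectors.toList());
                consumer.assign(partitions);

                // Look up the earliest offset at or after the cutoff timestamp, per partition.
                Map<TopicPartition, Long> query = new HashMap<>();
                partitions.forEach(tp -> query.put(tp, cutoff));

                Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
                consumer.offsetsForTimes(query).forEach((tp, offsetAndTs) -> {
                    if (offsetAndTs != null) {  // null if no record exists at/after the timestamp
                        toCommit.put(tp, new OffsetAndMetadata(offsetAndTs.offset()));
                    }
                });

                // Commit so the Streams application picks these offsets up on its next start.
                consumer.commitSync(toCommit);
            }
        }
    }

The bundled kafka-streams-application-reset tool covers much the same ground with its --to-datetime and --by-duration options for input topics, if shelling out is preferable.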

On Tue, 13 Aug 2019 at 09:26, Matthias J. Sax <ma...@confluent.io> wrote:

> You would need to delete committed offsets for the application.id and
> set `auto.offset.reset="latest"` to get the behavior you want.
>
>
> -Matthias
>
> On 8/12/19 1:20 AM, Tim Ward wrote:
> > I believe not, because that only causes the application to start reading
> from latest when there is no recorded offset at application start, no?
> >
> > What I need is to be able to specify, by topic, that when the
> application starts it doesn't want to see anything other than new data,
> regardless of what offset it committed last time it ran.
> >
> > Tim Ward
> >
> > -----Original Message-----
> > From: Boyang Chen <re...@gmail.com>
> > Sent: 09 August 2019 17:23
> > To: users@kafka.apache.org
> > Subject: Re: How to start a stream from only new records?
> >
> > Hey Tim,
> >
> > If you are talking about avoiding re-processing data and starting consumption
> > from latest, you could set `auto.offset.reset` to latest.
> >
> > Let me know if this answers your question.
> >
> > On Fri, Aug 9, 2019 at 7:09 AM Tim Ward <ti...@origamienergy.com>
> wrote:
> >
> >> With a real time application, nobody is interested in old data, and in
> >> particular they're not interested in paying to spend time processing it
> >> only to throw it away, thereby delaying up to date data.
> >>
> >> How do I tell StreamsBuilder.stream() to read only new data from its
> >> topic, and not process any backlog?
> >>
> >> Tim Ward
> >>
> >
>
>

Re: How to start a stream from only new records?

Posted by "Matthias J. Sax" <ma...@confluent.io>.
You would need to delete committed offsets for the application.id and
set `auto.offset.reset="latest"` to get the behavior you want.


-Matthias
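
In practice that is two steps: with the application stopped, clear or advance the committed offsets for the group named by the application.id (the bundled kafka-streams-application-reset tool can do this for the input topics), and make sure the reset policy in the Streams configuration is "latest" so that partitions without a committed offset start from the end of the log. A sketch of the configuration side, with placeholder values:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class LatestOnlyConfig {
        public static Properties config() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "realtime-app");       // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
            // Passed through to the embedded consumers; only takes effect for
            // partitions that have no committed offset (or whose offset is out of range).
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
            return props;
        }
    }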

On 8/12/19 1:20 AM, Tim Ward wrote:
> I believe not, because that only causes the application to start reading from latest when there is no recorded offset at application start, no?
> 
> What I need is to be able to specify, by topic, that when the application starts it doesn't want to see anything other than new data, regardless of what offset it committed last time it ran.
> 
> Tim Ward
> 
> -----Original Message-----
> From: Boyang Chen <re...@gmail.com>
> Sent: 09 August 2019 17:23
> To: users@kafka.apache.org
> Subject: Re: How to start a stream from only new records?
> 
> Hey Tim,
> 
> If you are talking about avoiding re-processing data and starting consumption
> from latest, you could set `auto.offset.reset` to latest.
> 
> Let me know if this answers your question.
> 
> On Fri, Aug 9, 2019 at 7:09 AM Tim Ward <ti...@origamienergy.com> wrote:
> 
>> With a real time application, nobody is interested in old data, and in
>> particular they're not interested in paying to spend time processing it
>> only to throw it away, thereby delaying up to date data.
>>
>> How do I tell StreamsBuilder.stream() to read only new data from its
>> topic, and not process any backlog?
>>
>> Tim Ward
>>
> 


RE: How to start a stream from only new records?

Posted by Tim Ward <ti...@origamienergy.com.INVALID>.
I believe not, because that only causes the application to start reading from latest when there is no recorded offset at application start, no?

What I need is to be able to specify, by topic, that when the application starts it doesn't want to see anything other than new data, regardless of what offset it committed last time it ran.

Tim Ward
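
The closest per-topic knob in the current API is the reset policy that can be attached to an individual source via Consumed, sketched below with a placeholder topic and serdes; the caveat is exactly the one above: it is only consulted when the group has no committed offset for the partition, so it does not ignore a previously committed position.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KStream;

    public class PerTopicReset {
        public static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            // Per-topic reset policy: used only when there is no committed offset,
            // e.g. on the very first deployment or after the offsets have been deleted.
            KStream<String, String> live = builder.stream(
                    "readings",
                    Consumed.with(Serdes.String(), Serdes.String())
                            .withOffsetResetPolicy(Topology.AutoOffsetReset.LATEST));
            live.foreach((key, value) -> { /* real-time processing here */ });
            return builder.build();
        }
    }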

-----Original Message-----
From: Boyang Chen <re...@gmail.com>
Sent: 09 August 2019 17:23
To: users@kafka.apache.org
Subject: Re: How to start a stream from only new records?

Hey Tim,

If you are talking about avoiding re-processing data and starting consumption
from latest, you could set `auto.offset.reset` to latest.

Let me know if this answers your question.

On Fri, Aug 9, 2019 at 7:09 AM Tim Ward <ti...@origamienergy.com> wrote:

> With a real time application, nobody is interested in old data, and in
> particular they're not interested in paying to spend time processing it
> only to throw it away, thereby delaying up to date data.
>
> How do I tell StreamsBuilder.stream() to read only new data from its
> topic, and not process any backlog?
>
> Tim Ward
>

Re: How to start a stream from only new records?

Posted by Boyang Chen <re...@gmail.com>.
Hey Tim,

If you are talking about avoiding re-processing data and starting consumption
from latest, you could set `auto.offset.reset` to latest.

Let me know if this answers your question.
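
To spell out the semantics: `auto.offset.reset=latest` only determines where to start when the group has no committed offset for a partition (or the committed offset is out of range); with a valid committed offset the consumer resumes from it, backlog included, which is what the follow-up in this thread turns on. A minimal consumer-side sketch, with placeholder values:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class LatestResetConsumerConfig {
        public static Properties config() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");        // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            // "latest": start from the end of the log only when no committed offset exists.
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
            return props;
        }
    }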

On Fri, Aug 9, 2019 at 7:09 AM Tim Ward <ti...@origamienergy.com> wrote:

> With a real time application, nobody is interested in old data, and in
> particular they're not interested in paying to spend time processing it
> only to throw it away, thereby delaying up to date data.
>
> How do I tell StreamsBuilder.stream() to read only new data from its
> topic, and not process any backlog?
>
> Tim Ward
>