Posted to users@kafka.apache.org by Baki Hayat <ba...@gmail.com> on 2020/05/10 10:29:50 UTC

Kafka Streams, WindowBy, Grace Period, Late events, Suppress operation

Hello Friends,

I posted this on Stack Overflow, but I am also asking here.

I have a couple of questions about window operations, grace periods, and
late events.

Could you please look at my problem: grouping by with a time field added to
the key, versus windowing by and grouping by without the time field?

Here is the detailed explanation:

https://stackoverflow.com/questions/61680407/kafka-streams-groupby-late-event-persistentwindowstore-windowby-with-gra

Re: Kafka Streams, WindowBy, Grace Period, Late events, Suppress operation

Posted by Baki Hayat <ba...@gmail.com>.
Hello John,

Thank you for your response,

I am using a custom timestamp extractor. In the final stage I persist the
streamed data into a time-series database, and when I double-checked there,
I confirmed that the time calculation looks correct.

What about the warning message I mentioned? How is it possible that I am
getting that warning? I could not find the root cause. Or should I just
ignore the message without taking any action?

On 11 May 2020 Mon at 18:33 John Roesler <vv...@apache.org> wrote:

> Hello Baki,
>
> It looks like option 2 is really what you want. The purpose of the time
> window stores is to allow deleting old data when you need to group by a
> time dimension, which naturally results in an infinite key space.
>
> If you don’t want to wait for the final result, can you just not add the
> suppression? It’s only purpose is to _not_ emit any data until _after_ the
> grace period expires. Without it, streams will still respect the grace
> period by updating the result whenever there is late arriving data.
>
> Lastly, that is a check for overflow. The timestamp is supposed to be a
> timestamp in milliseconds since the epoch. If you’re getting an overflow,
> it means your time stamps are from the far future. You might want to
> manually inspect them.
>
> I hope this helps,
> John
>
>
> On Sun, May 10, 2020, at 05:29, Baki Hayat wrote:
> > Hello Friends,
> >
> > I wrote into stackoverflow but also i am writing here,
> >
> > I have couple of questions about window operation, grace period and late
> > events.
> >
> > Could you please check my problem about group by with adding time field
> as
> > a key or window by and group by without time field ?
> >
> > Here is detail explanation...
> >
> >
> https://stackoverflow.com/questions/61680407/kafka-streams-groupby-late-event-persistentwindowstore-windowby-with-gra
> >
>

Re: Kafka Streams, WindowBy, Grace Period, Late events, Suppress operation

Posted by Baki Hayat <ba...@gmail.com>.
Hello Scott,

Thank you for your response,

Actually, I was not aware that I should provide a window deserializer; my
code did not ask me to add one.

In which part should I provide the deserializer? I am only providing the
window size in my code.

BR

On 11 May 2020 Mon at 18:48 Scott Reynolds <sr...@twilio.com.invalid>
wrote:

> Baki,
>
> You can get this message "o.a.k.s.state.internals.WindowKeySchema :
> Warning: window end time was truncated to Long.MAX"" when your
> TimeWindowDeserializer is created without a windowSize. There are two
> constructors for a TimeWindowDeserializer, are you using the one with
> WindowSize?
>
>
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindowedDeserializer.java#L46-L55
>
> It calls WindowKeySchema with a Long.MAX_VALUE
>
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindowedDeserializer.java#L84-L90
>
> On Mon, May 11, 2020 at 8:33 AM John Roesler <vv...@apache.org> wrote:
>
> > Hello Baki,
> >
> > It looks like option 2 is really what you want. The purpose of the time
> > window stores is to allow deleting old data when you need to group by a
> > time dimension, which naturally results in an infinite key space.
> >
> > If you don’t want to wait for the final result, can you just not add the
> > suppression? It’s only purpose is to _not_ emit any data until _after_
> the
> > grace period expires. Without it, streams will still respect the grace
> > period by updating the result whenever there is late arriving data.
> >
> > Lastly, that is a check for overflow. The timestamp is supposed to be a
> > timestamp in milliseconds since the epoch. If you’re getting an overflow,
> > it means your time stamps are from the far future. You might want to
> > manually inspect them.
> >
> > I hope this helps,
> > John
> >
> >
> > On Sun, May 10, 2020, at 05:29, Baki Hayat wrote:
> > > Hello Friends,
> > >
> > > I wrote into stackoverflow but also i am writing here,
> > >
> > > I have couple of questions about window operation, grace period and
> late
> > > events.
> > >
> > > Could you please check my problem about group by with adding time field
> > as
> > > a key or window by and group by without time field ?
> > >
> > > Here is detail explanation...
> > >
> > >
> >
> https://stackoverflow.com/questions/61680407/kafka-streams-groupby-late-event-persistentwindowstore-windowby-with-gra
> > >
> >
>

Re: Kafka Streams, WindowBy, Grace Period, Late events, Suppress operation

Posted by Scott Reynolds <sr...@twilio.com.INVALID>.
Baki,

You can get the message "o.a.k.s.state.internals.WindowKeySchema :
Warning: window end time was truncated to Long.MAX" when your
TimeWindowedDeserializer is created without a windowSize. There are two
constructors for TimeWindowedDeserializer; are you using the one that
takes a window size?

https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindowedDeserializer.java#L46-L55

Without a window size, the deserializer calls WindowKeySchema with Long.MAX_VALUE:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindowedDeserializer.java#L84-L90
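
As a sketch, the two constructors differ like this (the String types and the
60-second window size here are assumptions for illustration, not taken from
Baki's code):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.TimeWindowedDeserializer;

public class DeserializerSketch {
    public static void main(String[] args) {
        // Without a window size: window end times fall back to Long.MAX_VALUE,
        // which triggers the truncation warning.
        TimeWindowedDeserializer<String> noSize =
                new TimeWindowedDeserializer<>(Serdes.String().deserializer());

        // With an explicit window size in milliseconds: end = start + 60_000.
        TimeWindowedDeserializer<String> withSize =
                new TimeWindowedDeserializer<>(Serdes.String().deserializer(), 60_000L);
    }
}
```

If you consume the windowed output from another application, building the
serde via `WindowedSerdes.timeWindowedSerdeFrom(String.class, 60_000L)` passes
the window size through in the same way.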

On Mon, May 11, 2020 at 8:33 AM John Roesler <vv...@apache.org> wrote:

> Hello Baki,
>
> It looks like option 2 is really what you want. The purpose of the time
> window stores is to allow deleting old data when you need to group by a
> time dimension, which naturally results in an infinite key space.
>
> If you don’t want to wait for the final result, can you just not add the
> suppression? It’s only purpose is to _not_ emit any data until _after_ the
> grace period expires. Without it, streams will still respect the grace
> period by updating the result whenever there is late arriving data.
>
> Lastly, that is a check for overflow. The timestamp is supposed to be a
> timestamp in milliseconds since the epoch. If you’re getting an overflow,
> it means your time stamps are from the far future. You might want to
> manually inspect them.
>
> I hope this helps,
> John
>
>
> On Sun, May 10, 2020, at 05:29, Baki Hayat wrote:
> > Hello Friends,
> >
> > I wrote into stackoverflow but also i am writing here,
> >
> > I have couple of questions about window operation, grace period and late
> > events.
> >
> > Could you please check my problem about group by with adding time field
> as
> > a key or window by and group by without time field ?
> >
> > Here is detail explanation...
> >
> >
> https://stackoverflow.com/questions/61680407/kafka-streams-groupby-late-event-persistentwindowstore-windowby-with-gra
> >
>

Re: Kafka Streams, WindowBy, Grace Period, Late events, Suppress operation

Posted by John Roesler <vv...@apache.org>.
Hello Baki,

It looks like option 2 is really what you want. The purpose of the time window stores is to allow deleting old data when you need to group by a time dimension, which naturally results in an infinite key space. 

If you don’t want to wait for the final result, can you just not add the suppression? Its only purpose is to _not_ emit any data until _after_ the grace period expires. Without it, Streams will still respect the grace period by updating the result whenever late data arrives.
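
For illustration, a minimal sketch of this in the Streams DSL (the topic
names and the 5-minute window / 1-minute grace period are assumptions, not
taken from Baki's code). Removing the `.suppress(...)` line gives early,
continuously updated results while still honoring the grace period:

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class SuppressSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<Windowed<String>, Long> counts =
            builder.<String, String>stream("inputTopic")
                   .groupByKey()
                   // 5-minute windows; accept events up to 1 minute late
                   .windowedBy(TimeWindows.of(Duration.ofMinutes(5))
                                          .grace(Duration.ofMinutes(1)))
                   .count()
                   // emit only the final result per window, after the grace
                   // period expires; delete this line for incremental updates
                   .suppress(Suppressed.untilWindowCloses(
                           Suppressed.BufferConfig.unbounded()))
                   .toStream();
        // counts can then be written out with a windowed key serde
    }
}
```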

Lastly, that is a check for overflow. The timestamp is supposed to be milliseconds since the epoch, so if you’re getting an overflow, it means your timestamps are from the far future. You might want to inspect them manually.
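
The overflow check amounts to the following clamping logic in plain Java (a
sketch; the method name is mine, not Kafka's):

```java
public class WindowEndDemo {
    // If start + size overflows a long, clamp the window end to
    // Long.MAX_VALUE -- exactly the condition behind the warning
    // "window end time was truncated to Long.MAX".
    static long windowEnd(long startMs, long windowSizeMs) {
        long end = startMs + windowSizeMs;
        return end < startMs ? Long.MAX_VALUE : end;
    }

    public static void main(String[] args) {
        System.out.println(windowEnd(0L, 60_000L));                  // normal: 60000
        System.out.println(windowEnd(Long.MAX_VALUE - 10, 60_000L)); // overflow: clamped
    }
}
```

So a timestamp near `Long.MAX_VALUE` (far future) is what makes the addition
wrap around and the end time get truncated.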

I hope this helps,
John


On Sun, May 10, 2020, at 05:29, Baki Hayat wrote:
> Hello Friends,
> 
> I wrote into stackoverflow but also i am writing here,
> 
> I have couple of questions about window operation, grace period and late
> events.
> 
> Could you please check my problem about group by with adding time field as
> a key or window by and group by without time field ?
> 
> Here is detail explanation...
> 
> https://stackoverflow.com/questions/61680407/kafka-streams-groupby-late-event-persistentwindowstore-windowby-with-gra
>