Posted to user@flink.apache.org by Bart van Deenen <ba...@fastmail.fm> on 2016/03/29 09:35:51 UTC

window limits ?

Hi all

I'm doing a fold on a sliding window, using
TimeCharacteristic.EventTime. For output I'm picking the timestamp of
the most recent event in the window and using it to name the output
file.

My question is: will a second run of Flink on the same set of data (from
Kafka) put the same events in a window, or are the limits of a window
somehow dependent on the wall-clock time of the run?
The windows I'm using are two sliding timeWindows and one timeWindowAll.

Thanks for any answers

Bart van Deenen

Re: window limits ?

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
Which version of Flink are you using, and do you have a custom timestamp
extractor/watermark extractor? The semantics of this changed between 0.10
and 1.0, and I just want to make sure that you get the correct behavior.
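For context on the question about extractors: a watermark tells the system how far event time has progressed, and an element that arrives after the watermark has passed its timestamp counts as late. The sketch below simulates one common watermark rule, a fixed bound on out-of-orderness (watermark = largest timestamp seen so far minus the bound). It is plain Java, not Flink's extractor interface; the class name, its methods and the 1500 ms bound are hypothetical and chosen only for illustration.

```java
public class BoundedOutOfOrdernessSketch {

    // Hypothetical bound: how far (in ms) events may trail the newest one seen.
    final long maxOutOfOrderness;
    // Start low enough that (maxTimestampSeen - maxOutOfOrderness) cannot underflow.
    long maxTimestampSeen;

    BoundedOutOfOrdernessSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        this.maxTimestampSeen = Long.MIN_VALUE + maxOutOfOrderness;
    }

    // Record an event timestamp; it comes from the event, not from the clock.
    void onEvent(long eventTimestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
    }

    // Watermark: no more events with a timestamp <= this value are expected.
    long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrderness;
    }

    // An event whose timestamp is at or behind the watermark is late.
    boolean isLate(long eventTimestamp) {
        return eventTimestamp <= currentWatermark();
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch wm = new BoundedOutOfOrdernessSketch(1500L);
        for (long ts : new long[] {1000L, 3000L, 2000L, 1200L}) {
            // Check lateness against the watermark before this event advances it.
            System.out.println(ts + " late? " + wm.isLate(ts));
            wm.onEvent(ts);
        }
        System.out.println("watermark = " + wm.currentWatermark());
    }
}
```

In this sketch the event at 1200 is late only because it arrives after the event at 3000; had it arrived first, it would not be. So it is the arrival order of the input, not the wall-clock time of the run, that could make lateness decisions differ between two runs.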

Cheers,
Aljoscha

On Tue, 29 Mar 2016 at 10:13 Bart van Deenen <ba...@fastmail.fm>
wrote:

> Great!
>
> I'm actually taking the max of the timestamps, so I should be fine.
>
> Thanks
>
> Bart

Re: window limits ?

Posted by Bart van Deenen <ba...@fastmail.fm>.
Great!

I'm actually taking the max of the timestamps, so I should be fine.

Thanks

Bart

On Tue, Mar 29, 2016, at 09:48, Matthias J. Sax wrote:
> If you use event time, a second run will put exactly the same tuples into
> the windows (event time implies that the timestamp is encoded in the
> tuple itself; thus, window assignment is independent of wall-clock time).
> 
> However, be aware that the order of tuples *within a window* might
> change!
> 
> Thus, the timestamp of the "most recent event in the window" might
> change...
> 
> 
> -Matthias

Re: window limits ?

Posted by "Matthias J. Sax" <mj...@apache.org>.
If you use event time, a second run will put exactly the same tuples into
the windows (event time implies that the timestamp is encoded in the
tuple itself; thus, window assignment is independent of wall-clock time).
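The point about wall-clock independence can be made concrete: for sliding event-time windows, the set of windows an element belongs to is a pure function of its timestamp plus the window size and slide. The sketch below is a plain-Java simulation, not Flink code; the class and method names are hypothetical, and it assumes windows aligned to multiples of the slide with no offset.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowAssignment {

    // Returns the [start, end) bounds of every sliding window that contains
    // the given event timestamp. Only the timestamp, size and slide are
    // inputs; the current wall-clock time plays no role.
    static List<long[]> assignWindows(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        // Latest window start at or before the timestamp, aligned to the slide
        // (floorMod keeps the alignment correct for negative timestamps too).
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk back while [start, start + size) still contains the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    // Formats the windows as "[start, end) [start, end) ...".
    static String describe(List<long[]> windows) {
        StringBuilder sb = new StringBuilder();
        for (long[] w : windows) {
            sb.append('[').append(w[0]).append(", ").append(w[1]).append(") ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        long eventTimestamp = 7_000L; // taken from the event itself
        // "Run 1" and "run 2": same event, same windows, whenever they execute.
        System.out.println(describe(assignWindows(eventTimestamp, 10_000L, 5_000L)));
        System.out.println(describe(assignWindows(eventTimestamp, 10_000L, 5_000L)));
    }
}
```

With a 10 s window sliding every 5 s, both lines print `[5000, 15000) [0, 10000)`: the event at 7000 ms lands in the same two windows on every run.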

However, be aware that the order of tuples *within a window* might change!

Thus, the timestamp of the "most recent event in the window" might change...
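The ordering caveat is why picking the maximum timestamp, as Bart does, is safe. A minimal plain-Java sketch (hypothetical names, not Flink's fold API) compares a fold that keeps the last timestamp it sees with one that keeps the largest, over the same window contents in two different arrival orders:

```java
import java.util.Arrays;
import java.util.List;

public class WindowFoldOrder {

    // Fold that keeps whichever timestamp arrived last: depends on arrival order.
    // Returns Long.MIN_VALUE for an empty window.
    static long lastSeen(List<Long> timestamps) {
        long acc = Long.MIN_VALUE;
        for (long ts : timestamps) {
            acc = ts;
        }
        return acc;
    }

    // Fold that keeps the largest timestamp: independent of arrival order.
    // Returns Long.MIN_VALUE for an empty window.
    static long maxSeen(List<Long> timestamps) {
        long acc = Long.MIN_VALUE;
        for (long ts : timestamps) {
            acc = Math.max(acc, ts);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Same window contents, two different arrival orders (e.g. two runs).
        List<Long> run1 = Arrays.asList(1000L, 4000L, 2500L);
        List<Long> run2 = Arrays.asList(4000L, 2500L, 1000L);
        System.out.println(lastSeen(run1) + " vs " + lastSeen(run2)); // 2500 vs 1000
        System.out.println(maxSeen(run1) + " vs " + maxSeen(run2));   // 4000 vs 4000
    }
}
```

Only the max-based result is stable across runs, so a file name derived from it stays the same when the data is replayed.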


-Matthias

On 03/29/2016 09:35 AM, Bart van Deenen wrote:
> Hi all
> 
> I'm doing a fold on a sliding window, using
> TimeCharacteristic.EventTime. For output I'm picking the timestamp of
> the most recent event in the window and using it to name the output
> file.
> 
> My question is: will a second run of Flink on the same set of data (from
> Kafka) put the same events in a window, or are the limits of a window
> somehow dependent on the wall-clock time of the run?
> The windows I'm using are two sliding timeWindows and one timeWindowAll.
> 
> Thanks for any answers
> 
> Bart van Deenen
>