Posted to user@flink.apache.org by Bart van Deenen <ba...@fastmail.fm> on 2016/03/29 09:35:51 UTC
window limits ?
Hi all
I'm doing a fold on a sliding window, using
TimeCharacteristic.EventTime. For output I'm picking the timestamp of
the most recent event in the window and using that to name the output
file.
My question is: will a second run of Flink on the same set of data (from
Kafka) put the same events in each window, or do the limits of a window
somehow depend on the real (wall-clock) time of the run?
The windows I'm using are two sliding timeWindows and one timeWindowAll.
Thanks for any answers
Bart van Deenen
Re: window limits ?
Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
Which version of Flink are you using, and do you have a custom timestamp
extractor/watermark extractor? The semantics of this changed between 0.10
and 1.0, and I just want to make sure that you get the correct behavior.
Cheers,
Aljoscha
Re: window limits ?
Posted by Bart van Deenen <ba...@fastmail.fm>.
Great!
I'm actually taking the max of the timestamps, so I should be fine.
Thanks
Bart
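To illustrate why taking the max is safe here, below is a minimal plain-Java sketch (no Flink dependency). The class and method names are hypothetical and only stand in for the fold function; the two lists model the same window contents arriving in a different order on two runs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a window fold; not Flink API.
class MaxTimestampSketch {

    // Order-dependent: returns the timestamp of whichever event arrived last.
    public static long lastSeen(List<Long> timestamps) {
        long result = Long.MIN_VALUE;
        for (long ts : timestamps) {
            result = ts;
        }
        return result;
    }

    // Order-independent: folds with max, so arrival order does not matter.
    public static long maxTimestamp(List<Long> timestamps) {
        long acc = Long.MIN_VALUE;
        for (long ts : timestamps) {
            acc = Math.max(acc, ts);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Same three events in one window, delivered in different orders.
        List<Long> runA = Arrays.asList(1000L, 3000L, 2000L);
        List<Long> runB = Arrays.asList(3000L, 2000L, 1000L);

        System.out.println(lastSeen(runA) + " vs " + lastSeen(runB));         // 2000 vs 1000
        System.out.println(maxTimestamp(runA) + " vs " + maxTimestamp(runB)); // 3000 vs 3000
    }
}
```

Any fold that is commutative and associative (max, min, sum, count) gives the same result regardless of arrival order, so output names derived from it should be stable across runs.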
Re: window limits ?
Posted by "Matthias J. Sax" <mj...@apache.org>.
If you use event time, a second run will put the exact same tuples into
the windows (event time implies that the timestamp is encoded in the
tuple itself, so it is independent of the wall-clock time).
However, be aware that the order of tuples *within a window* might change!
Thus, the timestamp of the "most recent event in the window" might change
if you take it from the last tuple to arrive...
-Matthias
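As a rough illustration of why event-time windows are reproducible, here is a minimal plain-Java sketch of sliding-window assignment. It is an approximation of the idea, not Flink's actual code: the class name, method name, and the zero window offset are assumptions. The point is that the window boundaries are computed from the event's own timestamp, the window size, and the slide, and nothing else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sliding event-time window assignment; not Flink API.
class SlidingWindowSketch {

    // Returns the start of every window [start, start + size) that contains
    // the given event timestamp. All values are in milliseconds.
    public static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp, aligned to the slide.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 10 s windows sliding every 5 s; event at t = 12.5 s.
        System.out.println(windowStarts(12_500L, 10_000L, 5_000L)); // [10000, 5000]
    }
}
```

Because no clock is read anywhere in the assignment, replaying the same events with the same timestamps yields the same windows. What can differ between runs is only the order in which tuples arrive inside a window.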