You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Emmanuel <el...@msn.com> on 2016/01/28 23:08:48 UTC

Window stream using timestamp key for time

Hello,
I have used Flink to stream data and do analytics on the stream, using time windows...
Now, this is assuming the data is effectively coming in real time. However I have a use case where the data is 'batched' upstream, and comes in bursts, but has a timestamp.It obviously messes up the windowed stream assumption. (note it is a problem with queuing in Kafka for example when there is any kind of downtime downstream of Kafka: if data accumulates and then is consumed, it is consumed at higher 'speed' than real clock time and statistics do not match reality.)
So my question is:
Is it possible to use a window stream based on a timestamp key for time, as opposed to clock time?
How would one do this with the current API?
ThanksEmmanuel 		 	   		  

RE: Window stream using timestamp key for time

Posted by Emmanuel <el...@msn.com>.
Nice,
you guys rock!

From: fhueske@gmail.com
Date: Thu, 28 Jan 2016 23:34:58 +0100
Subject: Re: Window stream using timestamp key for time
To: user@flink.apache.org

Hi Emmanuel,

the feature you are looking for is called event time processing in Flink.
These blog posts should help you to become familiar with the concepts:

1) Event-Time concepts: http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
2) Windows in Flink: http://flink.apache.org/news/2015/12/04/Introducing-windows.html
3) Event-Time example use-case: https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana
4) Code for example: https://github.com/dataArtisans/flink-streaming-demo

Best, Fabian


2016-01-28 23:08 GMT+01:00 Emmanuel <el...@msn.com>:



Hello,
I have used Flink to stream data and do analytics on the stream, using time windows...
Now, this is assuming the data is effectively coming in real time. However I have a use case where the data is 'batched' upstream, and comes in bursts, but has a timestamp.It obviously messes up the windowed stream assumption. (note it is a problem with queuing in Kafka for example when there is any kind of downtime downstream of Kafka: if data accumulates and then is consumed, it is consumed at higher 'speed' than real clock time and statistics do not match reality.)
So my question is:
Is it possible to use a window stream based on a timestamp key for time, as opposed to clock time?
How would one do this with the current API?
ThanksEmmanuel 		 	   		  

 		 	   		  

Re: Window stream using timestamp key for time

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Emmanuel,

the feature you are looking for is called event time processing in Flink.
These blog posts should help you to become familiar with the concepts:

1) Event-Time concepts:
http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
2) Windows in Flink:
http://flink.apache.org/news/2015/12/04/Introducing-windows.html
3) Event-Time example use-case:
https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana
4) Code for example: https://github.com/dataArtisans/flink-streaming-demo

Best, Fabian


2016-01-28 23:08 GMT+01:00 Emmanuel <el...@msn.com>:

> Hello,
>
> I have used Flink to stream data and do analytics on the stream, using
> time windows...
>
> Now, this is assuming the data is effectively coming in real time. However
> I have a use case where the data is 'batched' upstream, and comes in
> bursts, but has a timestamp.
> It obviously messes up the windowed stream assumption.
> (note it is a problem with queuing in Kafka for example when there is any
> kind of downtime downstream of Kafka: if data accumulates and then is
> consumed, it is consumed at higher 'speed' than real clock time and
> statistics do not match reality.)
>
> So my question is:
>
> Is it possible to use a window stream based on a timestamp key for time,
> as opposed to clock time?
>
> How would one do this with the current API?
>
> Thanks
> Emmanuel
>