You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Marco Villalobos <mv...@kineteque.com> on 2021/04/19 19:24:23 UTC

How can I demarcate which event elements are the boundaries of a window?

I have a tumbling window that aggregates into a process window function.
Downstream there is a keyed process function.

[window aggregate into process function] -> keyed process function

I am not quite sure how the keyed process knows which elements are at the
boundary of the window.  Is there a means to communicate that?

Are watermarks the means by which we signal that either processing time or
event time has finished an interval?

Is it a watermark, that can be used to signal to the downstream operators
the demarcating events?

Are there other ways to do that?

Re: How can I demarcate which event elements are the boundaries of a window?

Posted by Arvid Heise <ar...@apache.org>.
Hi Marco,

It basically works like this for windows:
- For any incoming record, calculate the respective window based on the
event timestamp (ts). Let's assume a tumbling window for now, then we
calculate by ts / window size (simplified).
- This means that at any given time, there could be an arbitrary number of
open windows.
- Now a watermark comes in. The watermark tells the window operator: no
elements with a lower ts can come at this point.
- The window operator closes all windows before that point in time. For
each window: the aggregations are performed and the output is generated.
The window is usually evicted at this point in time.
- A record coming after the respective window is closed is discarded.

I don't quite understand if you have 2 process functions or 1 process
function with a window, but note that any process function directly after a
window is merged into one task. So aggregation happens right away. This
ProcessWindowFunction gets an Iterable of all elements. The elements are
sorted by their arrival time. If you want to receive the bounds, you can
either select first and last on an ordered stream or min/max.

On Mon, Apr 19, 2021 at 9:24 PM Marco Villalobos <mv...@kineteque.com>
wrote:

> I have a tumbling window that aggregates into a process window function.
> Downstream there is a keyed process function.
>
> [window aggregate into process function] -> keyed process function
>
> I am not quite sure how the keyed process knows which elements are at the
> boundary of the window.  Is there a means to communicate that?
>
> Are watermarks the means by which we signal that either processing time or
> event time has finished an interval?
>
> Is it a watermark, that can be used to signal to the downstream operators
> the demarcating events?
>
> Are there other ways to do that?
>