You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Jinhua Luo <lu...@gmail.com> on 2017/12/12 07:10:23 UTC

could I chain two timed window?

Hi All,

Given one stream source which generates 20k events/sec, and I need to
aggregate the element count using sliding window of 1 hour size.

The problem is, the window may buffer too many elements (which may
cause a lot of block I/O because of checkpointing?), and in fact it
does not necessary to store them for one hour, because the elements
should get folded incrementally. But unlike Tumbling Window, the
sliding window would save elements for next window, right?

So I am considering kind of workaround, should I chain two window like below:

            .timeWindow(Time.minutes(1))
            ...
            .timeWindow(Time.hours(1), Time.minutes(1))

Here the first window generate 1 minute aggregation units and the
second window provides the sliding output.

Any suggestions? Thanks.

Re: could I chain two timed window?

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

sliding windows replicate their records for each window.
If you have use an incrementally aggregating function (ReduceFunction,
AggregateFunction) with a sliding, the space requirement should not be an
issue because each window stores a single value.
However, this also means that each window performs its aggregations
independently from the others. So, if you many concurrent sliding windows,
pre-aggregate the records in a tumbling window can reduce the computational
effort.

Best, Fabian



2017-12-12 8:10 GMT+01:00 Jinhua Luo <lu...@gmail.com>:

> Hi All,
>
> Given one stream source which generates 20k events/sec, and I need to
> aggregate the element count using sliding window of 1 hour size.
>
> The problem is, the window may buffer too many elements (which may
> cause a lot of block I/O because of checkpointing?), and in fact it
> does not necessary to store them for one hour, because the elements
> should get folded incrementally. But unlike Tumbling Window, the
> sliding window would save elements for next window, right?
>
> So I am considering kind of workaround, should I chain two window like
> below:
>
>             .timeWindow(Time.minutes(1))
>             ...
>             .timeWindow(Time.hours(1), Time.minutes(1))
>
> Here the first window generate 1 minute aggregation units and the
> second window provides the sliding output.
>
> Any suggestions? Thanks.
>