Posted to user@flink.apache.org by Elias Levy <fe...@gmail.com> on 2016/05/03 04:07:38 UTC

TimeWindow overload?

Looking over the code, I see that Flink creates a TimeWindow object each
time the WindowAssigner is invoked.  I have not yet tested this, but I am
wondering whether this can become problematic if you have a very long sliding
window with a small slide, such as a 24-hour window with a 1-minute slide.
It seems this would create 1,440 TimeWindow objects per event.  Even at low
event rates this would seem to result in an explosion of TimeWindow
objects: at 1,000 events per second, you'd be creating 1,440,000 TimeWindow
objects per second.  After 24 hours you'd have nearly 125 billion TimeWindow
objects that would just begin to be purged.
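
The arithmetic above can be checked with a small Java sketch. It is not
Flink's code: the class and method names are made up for illustration, and
it assumes windows are aligned to multiples of the slide with no offset.
It lists the start of every sliding window that covers one timestamp and
then scales the count to 1,000 events per second.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink.
final class SlidingWindowCount {

    /** Start timestamps of every sliding window that contains the given timestamp. */
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp (assumes zero offset).
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Step back one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long size = 24L * 60 * 60 * 1000; // 24 hours in milliseconds
        long slide = 60L * 1000;          // 1 minute in milliseconds

        long perEvent = windowStarts(1_462_248_458_000L, size, slide).size();
        long perSecond = perEvent * 1_000;      // at 1,000 events per second
        long perDay = perSecond * 24 * 60 * 60; // accumulated over 24 hours

        System.out.println(perEvent);  // 1440
        System.out.println(perSecond); // 1440000
        System.out.println(perDay);    // 124416000000
    }
}
```

Under these assumptions the 24-hour total is about 124.4 billion, which
matches the "nearly 125 billion" estimate.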

Does this analysis seem right?

I suppose that means you should not use long sliding windows with
small slides.

Re: TimeWindow overload?

Posted by Stephan Ewen <se...@apache.org>.
Just had a quick chat with Aljoscha...

The first version of the aligned window code will still duplicate the
elements; later versions should be able to get rid of that.

On Tue, May 3, 2016 at 11:10 AM, Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
> even with the optimized operator for aligned time windows I would advise
> against using long sliding windows with a small slide. The system will
> internally create a lot of "buckets", i.e. each sliding window is treated
> separately and the element is put into 1,440 buckets, in your case. With a
> moderate number of distinct keys this can very quickly lead to a lot of
> window buckets. You can think of it in terms of write
> amplification. With tumbling windows you have basically no
> amplification; with sliding windows you have window processing
> overhead for every slide.
>
> Cheers,
> Aljoscha

Re: TimeWindow overload?

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
even with the optimized operator for aligned time windows I would advise
against using long sliding windows with a small slide. The system will
internally create a lot of "buckets", i.e. each sliding window is treated
separately and the element is put into 1,440 buckets, in your case. With a
moderate number of distinct keys this can very quickly lead to a lot of
window buckets. You can think of it in terms of write
amplification. With tumbling windows you have basically no
amplification; with sliding windows you have window processing
overhead for every slide.
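
To make the bucket count concrete, here is a rough Java sketch. It is a toy
model and not how Flink stores window state: the class name, the string
bucket key, and the zero-offset alignment are all assumptions. It writes one
element per key into every window that covers it and counts the distinct
(key, window) buckets that result.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of per-key window buckets, not Flink's state backend.
final class WindowBuckets {

    /** Counts distinct (key, windowStart) buckets after one element arrives for each key. */
    static int liveBuckets(int keys, long timestamp, long size, long slide) {
        Set<String> buckets = new HashSet<>();
        // Latest window start at or before the timestamp (assumes zero offset).
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        for (int key = 0; key < keys; key++) {
            // The element is written once per window that covers it.
            for (long start = lastStart; start > timestamp - size; start -= slide) {
                buckets.add(key + "@" + start);
            }
        }
        return buckets.size();
    }
}
```

In this model, 100 keys with a 24-hour window and a 1-minute slide give
144,000 buckets from a single element per key, while a 24-hour tumbling
window (slide equal to size) gives 100.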

Cheers,
Aljoscha

On Tue, 3 May 2016 at 09:05 Stephan Ewen <se...@apache.org> wrote:

> Hi Elias!
>
> There is a feature pending that uses an optimized version for aligned time
> windows. In that case, elements would go into a single window pane, and the
> full window would be composed of all panes it spans (in the case of sliding
> windows). That should help a lot in those cases.
>
> The default window mechanism does it that way because it supports
> unaligned windows (where each key has a different window start and
> end point) and it supports completely custom window assigners.
>
> Greetings,
> Stephan

Re: TimeWindow overload?

Posted by Stephan Ewen <se...@apache.org>.
Hi Elias!

There is a feature pending that uses an optimized version for aligned time
windows. In that case, elements would go into a single window pane, and the
full window would be composed of all panes it spans (in the case of sliding
windows). That should help a lot in those cases.
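
The pane idea can be sketched in Java as follows. This only illustrates the
general technique and is not the pending Flink feature: the class is
hypothetical, panes are assumed to be one slide long and aligned to
multiples of the slide, and the aggregate is a plain sum, so it only applies
to functions whose partial results can be combined.

```java
import java.util.TreeMap;

// Hypothetical pane-based sum, not Flink's aligned-window operator.
final class PaneAggregator {
    private final long paneSize;       // one pane per slide
    private final long panesPerWindow; // size / slide
    private final TreeMap<Long, Long> paneSums = new TreeMap<>();

    PaneAggregator(long size, long slide) {
        if (size % slide != 0) {
            throw new IllegalArgumentException("size must be a multiple of slide");
        }
        this.paneSize = slide;
        this.panesPerWindow = size / slide;
    }

    /** Each element updates exactly one pane, however long the window is. */
    void add(long timestamp, long value) {
        long paneStart = timestamp - Math.floorMod(timestamp, paneSize);
        paneSums.merge(paneStart, value, Long::sum);
    }

    /** Result for the window starting at windowStart: combine the panes it spans. */
    long windowSum(long windowStart) {
        long windowEnd = windowStart + panesPerWindow * paneSize;
        long sum = 0;
        for (long v : paneSums.subMap(windowStart, true, windowEnd, false).values()) {
            sum += v;
        }
        return sum;
    }

    /** Number of panes currently held. */
    int paneCount() {
        return paneSums.size();
    }
}
```

With a 3-minute window and a 1-minute slide, four elements in four
consecutive minutes occupy four panes in total, and each window result is
the sum of the three panes it covers. The per-element write cost stays at
one pane; the cost moves to combining panes when a window fires.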

The default window mechanism does it that way because it supports
unaligned windows (where each key has a different window start and
end point) and it supports completely custom window assigners.

Greetings,
Stephan



On Tue, May 3, 2016 at 4:07 AM, Elias Levy <fe...@gmail.com>
wrote:

> Looking over the code, I see that Flink creates a TimeWindow object each
> time the WindowAssigner is invoked.  I have not yet tested this, but I am
> wondering whether this can become problematic if you have a very long sliding
> window with a small slide, such as a 24-hour window with a 1-minute slide.
> It seems this would create 1,440 TimeWindow objects per event.  Even at low
> event rates this would seem to result in an explosion of TimeWindow
> objects: at 1,000 events per second, you'd be creating 1,440,000 TimeWindow
> objects per second.  After 24 hours you'd have nearly 125 billion TimeWindow
> objects that would just begin to be purged.
>
> Does this analysis seem right?
>
> I suppose that means you should not use long sliding windows with
> small slides.