You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Matt <dr...@gmail.com> on 2016/12/15 06:22:41 UTC

Updating a Tumbling Window every second?

Hello,

I have a rather simple problem with a difficult explanation...

I have 3 streams, one of objects of class A (stream A), one of class B
(stream B) and one of class C (stream C). The elements of A are generated
at a rate of about 3 times every second. Elements of type B encapsulates
some key features of the stream A (like the number of elements of A in the
window) during the last 30 seconds (tumbling window 30s). Finally, the
elements of type C contains statistics (for simplicity let's say the
average of elements processed by each element in B) of the last 3 elements
in B and are produced on every new element of B (count window 3, 1).

Illustrative example, () and [] denotes windows:

... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
... (b4 [b3 b2) b1]
... [c2] [c1]

This works fine, except for a dashboard that depends on the elements of C
to be updated, and 30s is way too big of a delay. I thought I could change
the tumbling window for a sliding window of size 30s and a slide of 1s, but
this doesn't work.

If I use a sliding window to create elements of B as mentioned, each count
window would contain 3 elements of B, and I would get one element of C
every second as intended, but those elements in B encapsulates almost the
same elements of A. This results in stats that are wrong.

For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and b3
share most of the elements from stream A.

Question: is there any way to create a count window with the last 3
elements of B that would have gone into the same tumbling window, not with
the last 3 consecutive elements?

I hope the problem is clear, don't hesitate to ask for further
clarification!

Regards,
Matt

Re: Updating a Tumbling Window every second?

Posted by Matt <dr...@gmail.com>.
Fabian,

Thanks for your answer. Since elements in B are expensive to create, I
wanted to reuse them. I understand I can plug two consumers into stream A,
but in that case -if I'm not wrong- I would have to create repeated
elements of B: one to save them into stream B and one to create C objects
for stream C.

Anyway, I've already solved this problem a few days back.

Regards,
Matt

On Mon, Dec 19, 2016 at 5:57 AM, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Matt,
>
> the combination of a tumbling time window and a count window is one way to
> define a sliding window.
> In your example of a 30 secs tumbling window and a (3,1) count window
> results in a time sliding window of 90 secs width and 30 secs slide.
>
> You could define a time sliding window of 90 secs width and 1 secs slide
> on stream A to get a stream C with faster updates.
> If you still need stream B with the 30 secs tumbling window, you can have
> both windows defined on stream A.
>
> Hope this helps,
> Fabian
>
> 2016-12-16 12:58 GMT+01:00 Matt <dr...@gmail.com>:
>
>> I have reduced the problem to a simple image [1].
>>
>> Those shown on the image are the streams I have, and the problem now is
>> how to create a custom window assigner such that objects in B that *don't
>> share* elements in A, are put together in the same window.
>>
>> Why? Because in order to create elements in C (triangles), I have to
>> process n *independent* elements of B (n=2 in the example).
>>
>> Maybe there's a better or simpler way to do this. Any idea is appreciated!
>>
>> Regards,
>> Matt
>>
>> [1] http://i.imgur.com/dG5AkJy.png
>>
>> On Thu, Dec 15, 2016 at 3:22 AM, Matt <dr...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a rather simple problem with a difficult explanation...
>>>
>>> I have 3 streams, one of objects of class A (stream A), one of class B
>>> (stream B) and one of class C (stream C). The elements of A are generated
>>> at a rate of about 3 times every second. Elements of type B encapsulates
>>> some key features of the stream A (like the number of elements of A in the
>>> window) during the last 30 seconds (tumbling window 30s). Finally, the
>>> elements of type C contains statistics (for simplicity let's say the
>>> average of elements processed by each element in B) of the last 3 elements
>>> in B and are produced on every new element of B (count window 3, 1).
>>>
>>> Illustrative example, () and [] denotes windows:
>>>
>>> ... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
>>> ... (b4 [b3 b2) b1]
>>> ... [c2] [c1]
>>>
>>> This works fine, except for a dashboard that depends on the elements of
>>> C to be updated, and 30s is way too big of a delay. I thought I could
>>> change the tumbling window for a sliding window of size 30s and a slide of
>>> 1s, but this doesn't work.
>>>
>>> If I use a sliding window to create elements of B as mentioned, each
>>> count window would contain 3 elements of B, and I would get one element of
>>> C every second as intended, but those elements in B encapsulates almost the
>>> same elements of A. This results in stats that are wrong.
>>>
>>> For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and
>>> b3 share most of the elements from stream A.
>>>
>>> Question: is there any way to create a count window with the last 3
>>> elements of B that would have gone into the same tumbling window, not with
>>> the last 3 consecutive elements?
>>>
>>> I hope the problem is clear, don't hesitate to ask for further
>>> clarification!
>>>
>>> Regards,
>>> Matt
>>>
>>>
>>
>

Re: Updating a Tumbling Window every second?

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Matt,

the combination of a tumbling time window and a count window is one way to
define a sliding window.
In your example of a 30 secs tumbling window and a (3,1) count window
results in a time sliding window of 90 secs width and 30 secs slide.

You could define a time sliding window of 90 secs width and 1 secs slide on
stream A to get a stream C with faster updates.
If you still need stream B with the 30 secs tumbling window, you can have
both windows defined on stream A.

Hope this helps,
Fabian

2016-12-16 12:58 GMT+01:00 Matt <dr...@gmail.com>:

> I have reduced the problem to a simple image [1].
>
> Those shown on the image are the streams I have, and the problem now is
> how to create a custom window assigner such that objects in B that *don't
> share* elements in A, are put together in the same window.
>
> Why? Because in order to create elements in C (triangles), I have to
> process n *independent* elements of B (n=2 in the example).
>
> Maybe there's a better or simpler way to do this. Any idea is appreciated!
>
> Regards,
> Matt
>
> [1] http://i.imgur.com/dG5AkJy.png
>
> On Thu, Dec 15, 2016 at 3:22 AM, Matt <dr...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a rather simple problem with a difficult explanation...
>>
>> I have 3 streams, one of objects of class A (stream A), one of class B
>> (stream B) and one of class C (stream C). The elements of A are generated
>> at a rate of about 3 times every second. Elements of type B encapsulates
>> some key features of the stream A (like the number of elements of A in the
>> window) during the last 30 seconds (tumbling window 30s). Finally, the
>> elements of type C contains statistics (for simplicity let's say the
>> average of elements processed by each element in B) of the last 3 elements
>> in B and are produced on every new element of B (count window 3, 1).
>>
>> Illustrative example, () and [] denotes windows:
>>
>> ... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
>> ... (b4 [b3 b2) b1]
>> ... [c2] [c1]
>>
>> This works fine, except for a dashboard that depends on the elements of C
>> to be updated, and 30s is way too big of a delay. I thought I could change
>> the tumbling window for a sliding window of size 30s and a slide of 1s, but
>> this doesn't work.
>>
>> If I use a sliding window to create elements of B as mentioned, each
>> count window would contain 3 elements of B, and I would get one element of
>> C every second as intended, but those elements in B encapsulates almost the
>> same elements of A. This results in stats that are wrong.
>>
>> For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and b3
>> share most of the elements from stream A.
>>
>> Question: is there any way to create a count window with the last 3
>> elements of B that would have gone into the same tumbling window, not with
>> the last 3 consecutive elements?
>>
>> I hope the problem is clear, don't hesitate to ask for further
>> clarification!
>>
>> Regards,
>> Matt
>>
>>
>

Re: Updating a Tumbling Window every second?

Posted by Matt <dr...@gmail.com>.
I have reduced the problem to a simple image [1].

Those shown on the image are the streams I have, and the problem now is how
to create a custom window assigner such that objects in B that *don't share*
elements in A, are put together in the same window.

Why? Because in order to create elements in C (triangles), I have to
process n *independent* elements of B (n=2 in the example).

Maybe there's a better or simpler way to do this. Any idea is appreciated!

Regards,
Matt

[1] http://i.imgur.com/dG5AkJy.png

On Thu, Dec 15, 2016 at 3:22 AM, Matt <dr...@gmail.com> wrote:

> Hello,
>
> I have a rather simple problem with a difficult explanation...
>
> I have 3 streams, one of objects of class A (stream A), one of class B
> (stream B) and one of class C (stream C). The elements of A are generated
> at a rate of about 3 times every second. Elements of type B encapsulates
> some key features of the stream A (like the number of elements of A in the
> window) during the last 30 seconds (tumbling window 30s). Finally, the
> elements of type C contains statistics (for simplicity let's say the
> average of elements processed by each element in B) of the last 3 elements
> in B and are produced on every new element of B (count window 3, 1).
>
> Illustrative example, () and [] denotes windows:
>
> ... [a10 a9 a8] [a7 a6] [a5 a4] [a3 a2 a1]
> ... (b4 [b3 b2) b1]
> ... [c2] [c1]
>
> This works fine, except for a dashboard that depends on the elements of C
> to be updated, and 30s is way too big of a delay. I thought I could change
> the tumbling window for a sliding window of size 30s and a slide of 1s, but
> this doesn't work.
>
> If I use a sliding window to create elements of B as mentioned, each count
> window would contain 3 elements of B, and I would get one element of C
> every second as intended, but those elements in B encapsulates almost the
> same elements of A. This results in stats that are wrong.
>
> For instance, c1 may have the average of b1, b2 and b3. But b1, b2 and b3
> share most of the elements from stream A.
>
> Question: is there any way to create a count window with the last 3
> elements of B that would have gone into the same tumbling window, not with
> the last 3 consecutive elements?
>
> I hope the problem is clear, don't hesitate to ask for further
> clarification!
>
> Regards,
> Matt
>
>