You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Nick Travers <n....@gmail.com> on 2016/10/29 23:18:35 UTC

Combining and ranking fired panes

Hi - I'm wondering how I'd go about combining results from repeated
speculative firings of a window into a single, consolidated "pane".

In my current use-case, I have items with scores arriving continuously, and
I'm using hourly windows with speculative firings every minute, with the
panes being accumulated. Every time a pane fires, I'd like to be able to
(re-)rank the top ten items by score, descending.

For example, if I have three items A, B and C arriving over the course of
an hour with continuously changing scores, as follows:

------- window start
(A, 1)
(B, 2)
(C, 3)
------- first firing (EARLY)
(B, 4)
------- second firing (EARLY)
(C, 0)
------- window closes (ON_TIME)

then I'm hoping to see the following results when each pane is fired.

After first firing:
(C, 3)
(B, 2)
(A, 1)

After second firing:
(B, 4)
(C, 3)
(A, 1)

On close of the window:
(B, 4)
(A, 1)
(C, 0)

I'm currently using Top.of().withoutDefaults() to give me the ranking, but
this seems to only gives a single ON_TIME pane with all of the interim
panes combined first and _then_ ranked on the score, so I get something
like:
(B, 4)
(B, 4)
(B, 2)
(C, 3)
(C, 3)
(A, 1)
(A, 1)
(A, 1)
(C, 0)

Should I be using a different approach / pattern to continually rank each
accumulated pane that is fired?

Testing this with the DirectRunner, but I also see something similar when
running with BlockingDataflowRunner.

Thanks in advance!
- nick

Re: Combining and ranking fired panes

Posted by Aleksandr <al...@gmail.com>.
Hello Nick,
I suppose that sliding window will suit your project
https://cloud.google.com/dataflow/model/windowing

Best regards
Aleksandr

вс, 30 окт. 2016 г. в 2:18, Nick Travers <n....@gmail.com>:

> Hi - I'm wondering how I'd go about combining results from repeated
> speculative firings of a window into a single, consolidated "pane".
>
> In my current use-case, I have items with scores arriving continuously,
> and I'm using hourly windows with speculative firings every minute, with
> the panes being accumulated. Every time a pane fires, I'd like to be able
> to (re-)rank the top ten items by score, descending.
>
> For example, if I have three items A, B and C arriving over the course of
> an hour with continuously changing scores, as follows:
>
> ------- window start
> (A, 1)
> (B, 2)
> (C, 3)
> ------- first firing (EARLY)
> (B, 4)
> ------- second firing (EARLY)
> (C, 0)
> ------- window closes (ON_TIME)
>
> then I'm hoping to see the following results when each pane is fired.
>
> After first firing:
> (C, 3)
> (B, 2)
> (A, 1)
>
> After second firing:
> (B, 4)
> (C, 3)
> (A, 1)
>
> On close of the window:
> (B, 4)
> (A, 1)
> (C, 0)
>
> I'm currently using Top.of().withoutDefaults() to give me the ranking, but
> this seems to only gives a single ON_TIME pane with all of the interim
> panes combined first and _then_ ranked on the score, so I get something
> like:
> (B, 4)
> (B, 4)
> (B, 2)
> (C, 3)
> (C, 3)
> (A, 1)
> (A, 1)
> (A, 1)
> (C, 0)
>
> Should I be using a different approach / pattern to continually rank each
> accumulated pane that is fired?
>
> Testing this with the DirectRunner, but I also see something similar when
> running with BlockingDataflowRunner.
>
> Thanks in advance!
> - nick
>