You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Harshvardhan Agrawal <ha...@gmail.com> on 2018/06/21 19:01:54 UTC

Perform aggregations across multiple windows

Hello,

We are currently working on implementing a data pipeline using Beam on top
of Flink. We have an unbounded data source that sends us some financial
positions data. For each account, we perform certain aggregations (let’s
assume it’s summation for simplicity) across all products owned by the
account. I’m processing the data in windows of 30 seconds.
At any time when I am processing a window I want to be able to access the
aggregation from the previous window and add it to the current aggregation.
The stateful API in beam stores data by (key,window) and hence I can’t
really store a global state that I could access with account being the key.

The only way I could think was to write the data  of the previous window to
some DB and then read it in my following window which I don’t think is
efficient because I have to wait until the previous window is written to
the DB. Do you have suggestions on how to approach this problem?

Thank you.
-- 
Regards,
Harshvardhan

Re: Perform aggregations across multiple windows

Posted by Robert Bradshaw <ro...@google.com>.
You can re-window after the first aggregation (say, into the global
window) and state will be stored with respect to this window.
On Thu, Jun 21, 2018 at 12:02 PM Harshvardhan Agrawal
<ha...@gmail.com> wrote:
>
> Hello,
>
> We are currently working on implementing a data pipeline using Beam on top of Flink. We have an unbounded data source that sends us some financial positions data. For each account, we perform certain aggregations (let’s assume it’s summation for simplicity) across all products owned by the account. I’m processing the data in windows of 30 seconds.
> At any time when I am processing a window I want to be able to access the aggregation from the previous window and add it to the current aggregation. The stateful API in beam stores data by (key,window) and hence I can’t really store a global state that I could access with account being the key.
>
> The only way I could think was to write the data  of the previous window to some DB and then read it in my following window which I don’t think is efficient because I have to wait until the previous window is written to the DB. Do you have suggestions on how to approach this problem?
>
> Thank you.
> --
> Regards,
> Harshvardhan