You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by "Guagliardo, Patrizio via user" <us...@beam.apache.org> on 2023/08/01 07:17:58 UTC
Delete window information
Hi together,
I have a question regarding Apache Beam in Python:
When I create a window with timestamps and then make a groupby, then the information for the windows remains in the elements. Afterwards, I want to make another groupby (something like a cumsum) by certain keys, but that does not work as the windows are still there and so it takes the keys + windows to split my data in the combineFn. Question now: How can I “delete” the windows and timestamps so that I would combine records just by the defined keys in the last combineFn.
Hope this is clear.
Best,
Patrizio
________________________________
This e-mail and any attachments may be confidential or legally privileged. If you received this message in error or are not the intended recipient, you should destroy the e-mail message and any attachments or copies, and you are prohibited from retaining, distributing, disclosing or using any information contained herein. Please inform us of the erroneous delivery by return e-mail. Thank you for your cooperation. For more information on how we use your personal data please see our Privacy Notice<https://www.oliverwyman.com/policies/privacy-notice.html>.
Re: Delete window information
Posted by Wiśniowski Piotr <co...@gmail.com>.
Hi,
This is very typical usage. Beam abstraction requires that all events
must belong to a window ( and default window is a single global window),
so it is not possible to delete a window information. But what You
really want is to overwrite the current window information set for the
first aggregation with new window information set for second
aggregation. This is very typical for Beam pipelines to set new windows
per each aggregation operation.
In practice I imagine You would need to just
|"back to global window" >>WindowInto(GlobalWindows())
before second aggregation operation in case of bounded sources. If You
are working with unbounded source (streaming aggregation), using global
window directly will fail as it needs to wait for infinite to get all
data before emitting output. IN this case You would need to also set
some trigger and accumulation_mode to tell beam when to emit elements
for the aggregated keys.
Hope this helps. If not please provide a bit more details what exactly
You are trying to do so that I can try to help.
Also You could try to get more familiar with the abstraction with the
help of https://tour.beam.apache.org/.
Best
Wiśniowski Piotr
On 1.08.2023 09:17, Guagliardo, Patrizio via user wrote:
>
> Hi together,
>
> I have a question regarding Apache Beam in Python:
>
> When I create a window with timestamps and then make a groupby, then
> the information for the windows remains in the elements. Afterwards, I
> want to make another groupby (something like a cumsum) by certain
> keys, but that does not work as the windows are still there and so it
> takes the keys + windows to split my data in the combineFn. Question
> now: How can I “delete” the windows and timestamps so that I would
> combine records just by the defined keys in the last combineFn.
>
> Hope this is clear.
>
> Best,
>
> Patrizio
>
>
> ------------------------------------------------------------------------
> This e-mail and any attachments may be confidential or legally
> privileged. If you received this message in error or are not the
> intended recipient, you should destroy the e-mail message and any
> attachments or copies, and you are prohibited from retaining,
> distributing, disclosing or using any information contained herein.
> Please inform us of the erroneous delivery by return e-mail. Thank you
> for your cooperation. For more information on how we use your personal
> data please see our Privacy Notice
> <https://www.oliverwyman.com/policies/privacy-notice.html>.