You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by "Guagliardo, Patrizio via user" <us...@beam.apache.org> on 2023/08/01 07:17:58 UTC

Delete window information

Hi together,

I have a question regarding Apache Beam in Python:

When I create a window with timestamps and then make a groupby, then the information for the windows remains in the elements. Afterwards, I want to make another groupby (something like a cumsum) by certain keys, but that does not work as the windows are still there and so it takes the keys + windows to split my data in the combineFn. Question now: How can I “delete” the windows and timestamps so that I would combine records just by the defined keys in the last combineFn.

Hope this is clear.

Best,
Patrizio

________________________________
This e-mail and any attachments may be confidential or legally privileged. If you received this message in error or are not the intended recipient, you should destroy the e-mail message and any attachments or copies, and you are prohibited from retaining, distributing, disclosing or using any information contained herein. Please inform us of the erroneous delivery by return e-mail. Thank you for your cooperation. For more information on how we use your personal data please see our Privacy Notice<https://www.oliverwyman.com/policies/privacy-notice.html>.

Re: Delete window information

Posted by Wiśniowski Piotr <co...@gmail.com>.
Hi,

This is very typical usage. Beam abstraction requires that all events 
must belong to a window ( and default window is a single global window), 
so it is not possible to delete a window information. But what You 
really want is to overwrite the current window information set for the 
first aggregation with new window information set for second 
aggregation. This is very typical for Beam pipelines to set new windows 
per each aggregation operation.

In practice I imagine You would need to just

|"back to global window" >>WindowInto(GlobalWindows())

before second aggregation operation in case of bounded sources. If You 
are working with unbounded source (streaming aggregation), using  global 
window directly will fail as it needs to wait for infinite to get all 
data before emitting output. IN this case You would need to also set 
some trigger and accumulation_mode to tell beam when to emit elements 
for the aggregated keys.

Hope this helps. If not please provide a bit more details what exactly 
You are trying to do so that I can try to help.
Also You could try to get more familiar with the abstraction with the 
help of https://tour.beam.apache.org/.

Best
Wiśniowski Piotr


On 1.08.2023 09:17, Guagliardo, Patrizio via user wrote:
>
> Hi together,
>
> I have a question regarding Apache Beam in Python:
>
> When I create a window with timestamps and then make a groupby, then 
> the information for the windows remains in the elements. Afterwards, I 
> want to make another groupby (something like a cumsum) by certain 
> keys, but that does not work as the windows are still there and so it 
> takes the keys + windows to split my data in the combineFn. Question 
> now: How can I “delete” the windows and timestamps so that I would 
> combine records just by the defined keys in the last combineFn.
>
> Hope this is clear.
>
> Best,
>
> Patrizio
>
>
> ------------------------------------------------------------------------
> This e-mail and any attachments may be confidential or legally 
> privileged. If you received this message in error or are not the 
> intended recipient, you should destroy the e-mail message and any 
> attachments or copies, and you are prohibited from retaining, 
> distributing, disclosing or using any information contained herein. 
> Please inform us of the erroneous delivery by return e-mail. Thank you 
> for your cooperation. For more information on how we use your personal 
> data please see our Privacy Notice 
> <https://www.oliverwyman.com/policies/privacy-notice.html>.