You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Pradip Thachile <pr...@thachile.com> on 2020/06/11 07:12:31 UTC

Python/Beam Windowing + Triggering Recommendations

Hi,

I'm writing a pipeline in Python/Beam 2.19 and wanted to get the opinions of the folks on list here around the best way to implement the following logic, specifically around the window + trigger combinations to use.

HourlySource := a PCollection that receives an event an hour from another pipeline (hourly aggregated data). This PCollection doesn't generate data on the weekends; there is no upstream data available on weekends.

DesiredOutput := a PCollection that is the last 100 events as a list that is updated hourly.

What is the correct windowing and triggering combination to go from HourlySource to DesiredOutput? Let's assume I'm not worried about late data.

My initial thinking was a SlidingWindow(size=100*60*60, period=60*60) with a Repeatedly(AfterProcessingTime(60*60)) trigger in accumulating mode. If I understand correctly, this should create a 100-hour window that will emit results hourly. Is my understanding correct here? Do I need an orFinally (I assume not since I would like the trigger to fire forever to always give me the last 100 hours)?  The largest gap here is that this seems like it would be the past 100-hours not 100 events.

I also thought of using the same window, but the default AfterWatermark trigger with early firings of Repeatedly(AfterProcessingTime(60*60)), however this doesn't solve the 100-hours vs. 100-events issue, albeit potentially the same as above?

The other idea was to use the GlobalWindow() and then a composite trigger of Repeatedly(AfterCount(1)).orFinally(AfterCount(100)) in accumulating mode, but to be honest I don't think I have a strong handle on the interaction between the different trigger types, window types and accumulation modes to be able to figure out the best path forward. 

So rather than a bunch of trial and error (or perhaps more correctly in addition to that) I was hoping the folks here could give me some insights into how to approach this. Thanks!

-Pradip