Posted to user@beam.apache.org by Daniel Debrunner <dj...@debrunners.com> on 2019/03/04 18:22:08 UTC

Scope of windows?

The windowing section of the Beam programming model guide shows a
window being defined and then used by the GroupByKey transform after
a ParDo (section 7.1.1).

However, I couldn't find any documentation on how long the window
remains in scope for subsequent transforms.

I have an application with this pipeline:

PCollection<KV<A,B>> -> FixedWindow<KV<A,B>> -> GroupByKey ->
PCollection<X> -> FixedWindow<X> -> Combine<X,R>.globally ->
PCollection<R>

The idea is that the first window is aggregating by key but in the
second window I need to combine elements across all keys.
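
To make the intended result concrete, here is a minimal plain-Java
sketch of what I expect the two stages to produce. It is only a model
of the semantics, not Beam code: the Event class, the method names and
the 1000 ms window size are all hypothetical, the per-key aggregation
and the final combine are both assumed to be a simple sum, and both
stages are assumed to use the same fixed window.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowScopeSketch {
    /** Hypothetical stand-in for a timestamped KV<A,B> element. */
    static final class Event {
        final String key;
        final int value;
        final long timestampMillis;

        Event(String key, int value, long timestampMillis) {
            this.key = key;
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Start of the fixed window that contains the given timestamp. */
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /** Stage 1: sum the values per (window, key), modelling window + GroupByKey. */
    static Map<Long, Map<String, Integer>> sumPerKeyPerWindow(List<Event> events, long sizeMillis) {
        Map<Long, Map<String, Integer>> out = new TreeMap<>();
        for (Event e : events) {
            out.computeIfAbsent(windowStart(e.timestampMillis, sizeMillis), w -> new TreeMap<>())
               .merge(e.key, e.value, Integer::sum);
        }
        return out;
    }

    /** Stage 2: ignore the keys and combine all per-key sums in each window into one value. */
    static Map<Long, Integer> combineAcrossKeys(Map<Long, Map<String, Integer>> perKey) {
        Map<Long, Integer> out = new TreeMap<>();
        perKey.forEach((window, sums) ->
            out.put(window, sums.values().stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("a", 1, 0L), new Event("b", 2, 10L),
            new Event("a", 3, 20L), new Event("b", 4, 1_005L));
        Map<Long, Map<String, Integer>> perKey = sumPerKeyPerWindow(events, 1_000L);
        System.out.println(perKey);                    // {0={a=4, b=2}, 1000={b=4}}
        System.out.println(combineAcrossKeys(perKey)); // {0=6, 1000=4}
    }
}
```

In this model the second stage emits exactly one value per window,
regardless of how many keys the first stage saw in that window.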

With my initial app I was seeing some runtime errors in/after the
combine where a KV<null,R> was being seen, even though at that point
there should be no key for the PCollection<R>.

In a simpler test I can apply FixedWindow<X> -> Combine<X,R>.globally
-> PCollection<R> to a PCollection without an upstream window, and the
combine correctly happens once.
But when I add the keyed upstream window, the combine occurs once per
key, without any final combine across the keys.

So it seems that some memory of the key persists even after the new
window transform.

I'm probably misunderstanding some detail of windowing, but I couldn't
find any deeper documentation than the simple examples in the
programming model guide.

Can anyone point me in the correct direction?

Thanks,
Dan.