You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Aaron Dixon <at...@gmail.com> on 2020/01/13 21:43:22 UTC

No AfterWatermark firings in Dataflow

I have the following trigger:

.apply(Window
      .configure()
      .triggering(AfterWatermark
           .pastEndOfWindow()
           .withEarlyFirings(AfterPane
                .elementCountAtLeast(1)))
      .accumulatingFiredPanes()
      .withAllowedLateness(Duration.ZERO)

But in Dataflow I notice that I never get an ON_TIME firing for my window
-- I only see early firing for elements, and then nothing.

My assumption is that AfterWatermark should give me a last, on-time pane
under this configuration when the watermark surpasses the window's end.

Is my expectation correct?

Re: No AfterWatermark firings in Dataflow

Posted by Kenneth Knowles <ke...@apache.org>.
This sounds like a bug, as described.

Here's the logic, shared by all runners:
https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java#L958

Regarding "race condition equivalent" I mean that when you have an early
trigger set up there is a benign race condition that determines the
presence or absence of an on time pane. So you should never count on it
unless you insist on it.

Regarding Robert's point I agree that if we try to interpret triggers as an
expression of human intent then it seems that AfterWatermark variants
should all output an ON_TIME pane. It is really the whole windowing
strategy by which the user specifies their intent. IMO there are a couple
reasons. Foremost, it becomes hard/impossible to reason about systems where
there are layers built upon the design philosophy of guessing intent. So it
is better to focus on clear boundaries; in this case triggers govern
whether the runner is permitted to produce non-lifecycle outputs, while
separate flags control the lifecycle outputs.

You can imagine a small DSL where the combination of trigger, on time
behavior, and closing behavior are expressed together. That is more or less
what WindowingStrategy is trying to get at. I also think the space of
triggers is sufficiently arbitrary and unexplored that I don't want to
couple them to things that are more obviously fundamental, like on time
behavior, lateness, and closing behavior. Previously, triggers were a
user-defined state machine, which I think is overly general and too
inefficient in a portable setting. But our particular set of triggers has
cruft you don't need and also sometimes lacks things you do need. If/when
we invest in and move toward sink triggers (aka specify business logic and
let the runner figure out the rest) seems good to re-open the design space.

Kenn

On Mon, Jan 13, 2020 at 7:10 PM Aaron Dixon <at...@gmail.com> wrote:

> Using ClosingBehavior I was able to see all windows get final panes fired
> (w/ isLast=true).
>
> Still unsure why OnTimeBehavior=ALWAYS wasn't enough to ensure an on-time
> pane for all windows. Though not clear on the meaning of the 'race
> condition equivalency' you mentioned Kenneth. Am very interested in
> understanding how this is supposed to work in general - but am very happy
> that I was able to get final panes by setting
> ClosingBehavior=ALWAYS....thank you for that pro-tip!!
>
> On Mon, Jan 13, 2020 at 7:57 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> I think AfterWatermark in particular should *alway* produce an ON_TIME
>> pane, regardless of whether there were early panes. (It's less clear
>> with non-watermark triggers like after count or processing time.) This
>> makes it feel like the on time behavior is a property of the trigger,
>> not the windowing strategy.
>>
>> On Mon, Jan 13, 2020 at 5:06 PM Aaron Dixon <at...@gmail.com> wrote:
>> >
>> > Kenn, thank you! There is OnTimeBehavior (default FIRE_ALWAYS) and
>> ClosingBehavior (default FIRE_IF_NON_EMPTY). Given that OnTimeBehavior is
>> always-fire, shouldn't I see empty ON_TIME panes?
>> >
>> > Since my lateness config is 0, I'm going to try ClosingBehavior =
>> FIRE_ALWAYS and see if I can rely on .isLast() to pick out the last pane
>> downstream. But curious if given that the OnTimeBehavior default is ALWAYS,
>> shouldn't I be seeing on-time panes in my current config?
>> >
>> >
>> >
>> > On Mon, Jan 13, 2020 at 6:45 PM Kenneth Knowles <ke...@apache.org>
>> wrote:
>> >>
>> >> On my phone, so I can't grab the jira so easily, but quickly: EARLY
>> panes are "race condition equivalent" to ON_TIME panes. The early panes
>> consume all the pending elements then the on time pane is "empty". This is
>> WAI if it is what is causing it. You need to explicitly set
>> Window.configure().fireAlways()*. I know this is counterintuitive in
>> accumulating mode, where the empty pane is not the identity element.
>> >>
>> >> Kenn
>> >>
>> >> *I don't recall if this is the default or not, and also because on
>> phone it is slow to look up. From your experience I think not default.
>> >>
>> >> On Mon, Jan 13, 2020, 15:03 Aaron Dixon <at...@gmail.com> wrote:
>> >>>
>> >>> Any confirmation on this from anyone? Whether per Beam spec, runners
>> are obligated to send ON_TIME panes for AfterWatermark triggers? I'm stuck
>> because this seems fundamental, so it's hard to imagine this is a Dataflow
>> bug, but OTOH it's also hard to imagine that trigger specs like
>> AfterWatermark are "optional"... ?
>> >>>
>> >>> On Mon, Jan 13, 2020 at 4:18 PM Aaron Dixon <at...@gmail.com>
>> wrote:
>> >>>>
>> >>>> Yes. Using calendar day-based windows and watermark is completely
>> caught up to today ... calendar window ends several days ago. I got EARLY
>> panes for each element but never ON_TIME pane.
>> >>>>
>> >>>> On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik <lc...@google.com> wrote:
>> >>>>>
>> >>>>> Is the watermark advancing past the end of the window?
>> >>>>>
>> >>>>> On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <at...@gmail.com>
>> wrote:
>> >>>>>>
>> >>>>>> The window is not empty fwiw; it has elements; I get an early
>> firing pane for the window but well after the watermark passes there is no
>> ON_TIME pane. Would this be a bug in Dataflow? Seems fundamental, so I'm
>> concerned perhaps the Beam spec doesn't obligate ON_TIME firings?
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com>
>> wrote:
>> >>>>>>>
>> >>>>>>> I would have expected an empty on time pane since the default on
>> time behavior is FIRE_ALWAYS.
>> >>>>>>>
>> >>>>>>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com>
>> wrote:
>> >>>>>>>>
>> >>>>>>>> Can anyone confirm?
>> >>>>>>>>
>> >>>>>>>> This is intermittent. Some (it seems, sparse) windows don't get
>> an ON_TIME firing after watermark. Is this a bug or is there a reason to
>> not expect ON_TIME firings for every window?
>> >>>>>>>>
>> >>>>>>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com>
>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> If it indeed happened as you have described, I will be very
>> interested in the expected behaviour.
>> >>>>>>>>>
>> >>>>>>>>> Something I remembered before: the trigger condition meets just
>> gives the runner/engine "permission" to fire, but runner/engine may not
>> fire immediately. But I don't know if the engine/runner will guarantee to
>> fire.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> -Rui
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com>
>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> I have the following trigger:
>> >>>>>>>>>>
>> >>>>>>>>>> .apply(Window
>> >>>>>>>>>>       .configure()
>> >>>>>>>>>>       .triggering(AfterWatermark
>> >>>>>>>>>>            .pastEndOfWindow()
>> >>>>>>>>>>            .withEarlyFirings(AfterPane
>> >>>>>>>>>>                 .elementCountAtLeast(1)))
>> >>>>>>>>>>       .accumulatingFiredPanes()
>> >>>>>>>>>>       .withAllowedLateness(Duration.ZERO)
>> >>>>>>>>>>
>> >>>>>>>>>> But in Dataflow I notice that I never get an ON_TIME firing
>> for my window -- I only see early firing for elements, and then nothing.
>> >>>>>>>>>>
>> >>>>>>>>>> My assumption is that AfterWatermark should give me a last,
>> on-time pane under this configuration when the watermark surpasses the
>> window's end.
>> >>>>>>>>>>
>> >>>>>>>>>> Is my expectation correct?
>>
>

Re: No AfterWatermark firings in Dataflow

Posted by Aaron Dixon <at...@gmail.com>.
Using ClosingBehavior I was able to see all windows get final panes fired
(w/ isLast=true).

Still unsure why OnTimeBehavior=ALWAYS wasn't enough to ensure an on-time
pane for all windows. Though not clear on the meaning of the 'race
condition equivalency' you mentioned Kenneth. Am very interested in
understanding how this is supposed to work in general - but am very happy
that I was able to get final panes by setting
ClosingBehavior=ALWAYS....thank you for that pro-tip!!

On Mon, Jan 13, 2020 at 7:57 PM Robert Bradshaw <ro...@google.com> wrote:

> I think AfterWatermark in particular should *alway* produce an ON_TIME
> pane, regardless of whether there were early panes. (It's less clear
> with non-watermark triggers like after count or processing time.) This
> makes it feel like the on time behavior is a property of the trigger,
> not the windowing strategy.
>
> On Mon, Jan 13, 2020 at 5:06 PM Aaron Dixon <at...@gmail.com> wrote:
> >
> > Kenn, thank you! There is OnTimeBehavior (default FIRE_ALWAYS) and
> ClosingBehavior (default FIRE_IF_NON_EMPTY). Given that OnTimeBehavior is
> always-fire, shouldn't I see empty ON_TIME panes?
> >
> > Since my lateness config is 0, I'm going to try ClosingBehavior =
> FIRE_ALWAYS and see if I can rely on .isLast() to pick out the last pane
> downstream. But curious if given that the OnTimeBehavior default is ALWAYS,
> shouldn't I be seeing on-time panes in my current config?
> >
> >
> >
> > On Mon, Jan 13, 2020 at 6:45 PM Kenneth Knowles <ke...@apache.org> wrote:
> >>
> >> On my phone, so I can't grab the jira so easily, but quickly: EARLY
> panes are "race condition equivalent" to ON_TIME panes. The early panes
> consume all the pending elements then the on time pane is "empty". This is
> WAI if it is what is causing it. You need to explicitly set
> Window.configure().fireAlways()*. I know this is counterintuitive in
> accumulating mode, where the empty pane is not the identity element.
> >>
> >> Kenn
> >>
> >> *I don't recall if this is the default or not, and also because on
> phone it is slow to look up. From your experience I think not default.
> >>
> >> On Mon, Jan 13, 2020, 15:03 Aaron Dixon <at...@gmail.com> wrote:
> >>>
> >>> Any confirmation on this from anyone? Whether per Beam spec, runners
> are obligated to send ON_TIME panes for AfterWatermark triggers? I'm stuck
> because this seems fundamental, so it's hard to imagine this is a Dataflow
> bug, but OTOH it's also hard to imagine that trigger specs like
> AfterWatermark are "optional"... ?
> >>>
> >>> On Mon, Jan 13, 2020 at 4:18 PM Aaron Dixon <at...@gmail.com> wrote:
> >>>>
> >>>> Yes. Using calendar day-based windows and watermark is completely
> caught up to today ... calendar window ends several days ago. I got EARLY
> panes for each element but never ON_TIME pane.
> >>>>
> >>>> On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik <lc...@google.com> wrote:
> >>>>>
> >>>>> Is the watermark advancing past the end of the window?
> >>>>>
> >>>>> On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <at...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> The window is not empty fwiw; it has elements; I get an early
> firing pane for the window but well after the watermark passes there is no
> ON_TIME pane. Would this be a bug in Dataflow? Seems fundamental, so I'm
> concerned perhaps the Beam spec doesn't obligate ON_TIME firings?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com> wrote:
> >>>>>>>
> >>>>>>> I would have expected an empty on time pane since the default on
> time behavior is FIRE_ALWAYS.
> >>>>>>>
> >>>>>>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>> Can anyone confirm?
> >>>>>>>>
> >>>>>>>> This is intermittent. Some (it seems, sparse) windows don't get
> an ON_TIME firing after watermark. Is this a bug or is there a reason to
> not expect ON_TIME firings for every window?
> >>>>>>>>
> >>>>>>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com>
> wrote:
> >>>>>>>>>
> >>>>>>>>> If it indeed happened as you have described, I will be very
> interested in the expected behaviour.
> >>>>>>>>>
> >>>>>>>>> Something I remembered before: the trigger condition meets just
> gives the runner/engine "permission" to fire, but runner/engine may not
> fire immediately. But I don't know if the engine/runner will guarantee to
> fire.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Rui
> >>>>>>>>>
> >>>>>>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I have the following trigger:
> >>>>>>>>>>
> >>>>>>>>>> .apply(Window
> >>>>>>>>>>       .configure()
> >>>>>>>>>>       .triggering(AfterWatermark
> >>>>>>>>>>            .pastEndOfWindow()
> >>>>>>>>>>            .withEarlyFirings(AfterPane
> >>>>>>>>>>                 .elementCountAtLeast(1)))
> >>>>>>>>>>       .accumulatingFiredPanes()
> >>>>>>>>>>       .withAllowedLateness(Duration.ZERO)
> >>>>>>>>>>
> >>>>>>>>>> But in Dataflow I notice that I never get an ON_TIME firing for
> my window -- I only see early firing for elements, and then nothing.
> >>>>>>>>>>
> >>>>>>>>>> My assumption is that AfterWatermark should give me a last,
> on-time pane under this configuration when the watermark surpasses the
> window's end.
> >>>>>>>>>>
> >>>>>>>>>> Is my expectation correct?
>

Re: No AfterWatermark firings in Dataflow

Posted by Robert Bradshaw <ro...@google.com>.
I think AfterWatermark in particular should *alway* produce an ON_TIME
pane, regardless of whether there were early panes. (It's less clear
with non-watermark triggers like after count or processing time.) This
makes it feel like the on time behavior is a property of the trigger,
not the windowing strategy.

On Mon, Jan 13, 2020 at 5:06 PM Aaron Dixon <at...@gmail.com> wrote:
>
> Kenn, thank you! There is OnTimeBehavior (default FIRE_ALWAYS) and ClosingBehavior (default FIRE_IF_NON_EMPTY). Given that OnTimeBehavior is always-fire, shouldn't I see empty ON_TIME panes?
>
> Since my lateness config is 0, I'm going to try ClosingBehavior = FIRE_ALWAYS and see if I can rely on .isLast() to pick out the last pane downstream. But curious if given that the OnTimeBehavior default is ALWAYS, shouldn't I be seeing on-time panes in my current config?
>
>
>
> On Mon, Jan 13, 2020 at 6:45 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>> On my phone, so I can't grab the jira so easily, but quickly: EARLY panes are "race condition equivalent" to ON_TIME panes. The early panes consume all the pending elements then the on time pane is "empty". This is WAI if it is what is causing it. You need to explicitly set Window.configure().fireAlways()*. I know this is counterintuitive in accumulating mode, where the empty pane is not the identity element.
>>
>> Kenn
>>
>> *I don't recall if this is the default or not, and also because on phone it is slow to look up. From your experience I think not default.
>>
>> On Mon, Jan 13, 2020, 15:03 Aaron Dixon <at...@gmail.com> wrote:
>>>
>>> Any confirmation on this from anyone? Whether per Beam spec, runners are obligated to send ON_TIME panes for AfterWatermark triggers? I'm stuck because this seems fundamental, so it's hard to imagine this is a Dataflow bug, but OTOH it's also hard to imagine that trigger specs like AfterWatermark are "optional"... ?
>>>
>>> On Mon, Jan 13, 2020 at 4:18 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>
>>>> Yes. Using calendar day-based windows and watermark is completely caught up to today ... calendar window ends several days ago. I got EARLY panes for each element but never ON_TIME pane.
>>>>
>>>> On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik <lc...@google.com> wrote:
>>>>>
>>>>> Is the watermark advancing past the end of the window?
>>>>>
>>>>> On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>>>
>>>>>> The window is not empty fwiw; it has elements; I get an early firing pane for the window but well after the watermark passes there is no ON_TIME pane. Would this be a bug in Dataflow? Seems fundamental, so I'm concerned perhaps the Beam spec doesn't obligate ON_TIME firings?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com> wrote:
>>>>>>>
>>>>>>> I would have expected an empty on time pane since the default on time behavior is FIRE_ALWAYS.
>>>>>>>
>>>>>>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Can anyone confirm?
>>>>>>>>
>>>>>>>> This is intermittent. Some (it seems, sparse) windows don't get an ON_TIME firing after watermark. Is this a bug or is there a reason to not expect ON_TIME firings for every window?
>>>>>>>>
>>>>>>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>>>
>>>>>>>>> If it indeed happened as you have described, I will be very interested in the expected behaviour.
>>>>>>>>>
>>>>>>>>> Something I remembered before: the trigger condition meets just gives the runner/engine "permission" to fire, but runner/engine may not fire immediately. But I don't know if the engine/runner will guarantee to fire.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I have the following trigger:
>>>>>>>>>>
>>>>>>>>>> .apply(Window
>>>>>>>>>>       .configure()
>>>>>>>>>>       .triggering(AfterWatermark
>>>>>>>>>>            .pastEndOfWindow()
>>>>>>>>>>            .withEarlyFirings(AfterPane
>>>>>>>>>>                 .elementCountAtLeast(1)))
>>>>>>>>>>       .accumulatingFiredPanes()
>>>>>>>>>>       .withAllowedLateness(Duration.ZERO)
>>>>>>>>>>
>>>>>>>>>> But in Dataflow I notice that I never get an ON_TIME firing for my window -- I only see early firing for elements, and then nothing.
>>>>>>>>>>
>>>>>>>>>> My assumption is that AfterWatermark should give me a last, on-time pane under this configuration when the watermark surpasses the window's end.
>>>>>>>>>>
>>>>>>>>>> Is my expectation correct?

Re: No AfterWatermark firings in Dataflow

Posted by Aaron Dixon <at...@gmail.com>.
Kenn, thank you! There is OnTimeBehavior (default FIRE_ALWAYS) and
ClosingBehavior (default FIRE_IF_NON_EMPTY). Given that OnTimeBehavior is
always-fire, shouldn't I see empty ON_TIME panes?

Since my lateness config is 0, I'm going to try ClosingBehavior =
FIRE_ALWAYS and see if I can rely on .isLast() to pick out the last pane
downstream. But curious if given that the OnTimeBehavior default is ALWAYS,
shouldn't I be seeing on-time panes in my current config?



On Mon, Jan 13, 2020 at 6:45 PM Kenneth Knowles <ke...@apache.org> wrote:

> On my phone, so I can't grab the jira so easily, but quickly: EARLY panes
> are "race condition equivalent" to ON_TIME panes. The early panes consume
> all the pending elements then the on time pane is "empty". This is WAI if
> it is what is causing it. You need to explicitly set
> Window.configure().fireAlways()*. I know this is counterintuitive in
> accumulating mode, where the empty pane is not the identity element.
>
> Kenn
>
> *I don't recall if this is the default or not, and also because on phone
> it is slow to look up. From your experience I think not default.
>
> On Mon, Jan 13, 2020, 15:03 Aaron Dixon <at...@gmail.com> wrote:
>
>> Any confirmation on this from anyone? Whether per Beam spec, runners are
>> obligated to send ON_TIME panes for AfterWatermark triggers? I'm stuck
>> because this seems fundamental, so it's hard to imagine this is a Dataflow
>> bug, but OTOH it's also hard to imagine that trigger specs like
>> AfterWatermark are "optional"... ?
>>
>> On Mon, Jan 13, 2020 at 4:18 PM Aaron Dixon <at...@gmail.com> wrote:
>>
>>> Yes. Using calendar day-based windows and watermark is completely caught
>>> up to today ... calendar window ends several days ago. I got EARLY panes
>>> for each element but never ON_TIME pane.
>>>
>>> On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> Is the watermark advancing past the end of the window?
>>>>
>>>> On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>
>>>>> The window is not empty fwiw; it has elements; I get an early firing
>>>>> pane for the window but well after the watermark passes there is no ON_TIME
>>>>> pane. Would this be a bug in Dataflow? Seems fundamental, so I'm concerned
>>>>> perhaps the Beam spec doesn't obligate ON_TIME firings?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> I would have expected an empty on time pane since the default on time
>>>>>> behavior is FIRE_ALWAYS.
>>>>>>
>>>>>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Can anyone confirm?
>>>>>>>
>>>>>>> This is intermittent. Some (it seems, sparse) windows don't get an
>>>>>>> ON_TIME firing after watermark. Is this a bug or is there a reason to not
>>>>>>> expect ON_TIME firings for every window?
>>>>>>>
>>>>>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>
>>>>>>>> If it indeed happened as you have described, I will be very
>>>>>>>> interested in the expected behaviour.
>>>>>>>>
>>>>>>>> Something I remembered before: the trigger condition meets just
>>>>>>>> gives the runner/engine "permission" to fire, but runner/engine may not
>>>>>>>> fire immediately. But I don't know if the engine/runner will guarantee to
>>>>>>>> fire.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have the following trigger:
>>>>>>>>>
>>>>>>>>> .apply(Window
>>>>>>>>>       .configure()
>>>>>>>>>       .triggering(AfterWatermark
>>>>>>>>>            .pastEndOfWindow()
>>>>>>>>>            .withEarlyFirings(AfterPane
>>>>>>>>>                 .elementCountAtLeast(1)))
>>>>>>>>>       .accumulatingFiredPanes()
>>>>>>>>>       .withAllowedLateness(Duration.ZERO)
>>>>>>>>>
>>>>>>>>> But in Dataflow I notice that I never get an ON_TIME firing for my
>>>>>>>>> window -- I only see early firing for elements, and then nothing.
>>>>>>>>>
>>>>>>>>> My assumption is that AfterWatermark should give me a last,
>>>>>>>>> on-time pane under this configuration when the watermark surpasses the
>>>>>>>>> window's end.
>>>>>>>>>
>>>>>>>>> Is my expectation correct?
>>>>>>>>>
>>>>>>>>

Re: No AfterWatermark firings in Dataflow

Posted by Kenneth Knowles <ke...@apache.org>.
On my phone, so I can't grab the jira so easily, but quickly: EARLY panes
are "race condition equivalent" to ON_TIME panes. The early panes consume
all the pending elements then the on time pane is "empty". This is WAI if
it is what is causing it. You need to explicitly set
Window.configure().fireAlways()*. I know this is counterintuitive in
accumulating mode, where the empty pane is not the identity element.

Kenn

*I don't recall if this is the default or not, and also because on phone it
is slow to look up. From your experience I think not default.

On Mon, Jan 13, 2020, 15:03 Aaron Dixon <at...@gmail.com> wrote:

> Any confirmation on this from anyone? Whether per Beam spec, runners are
> obligated to send ON_TIME panes for AfterWatermark triggers? I'm stuck
> because this seems fundamental, so it's hard to imagine this is a Dataflow
> bug, but OTOH it's also hard to imagine that trigger specs like
> AfterWatermark are "optional"... ?
>
> On Mon, Jan 13, 2020 at 4:18 PM Aaron Dixon <at...@gmail.com> wrote:
>
>> Yes. Using calendar day-based windows and watermark is completely caught
>> up to today ... calendar window ends several days ago. I got EARLY panes
>> for each element but never ON_TIME pane.
>>
>> On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik <lc...@google.com> wrote:
>>
>>> Is the watermark advancing past the end of the window?
>>>
>>> On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <at...@gmail.com> wrote:
>>>
>>>> The window is not empty fwiw; it has elements; I get an early firing
>>>> pane for the window but well after the watermark passes there is no ON_TIME
>>>> pane. Would this be a bug in Dataflow? Seems fundamental, so I'm concerned
>>>> perhaps the Beam spec doesn't obligate ON_TIME firings?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> I would have expected an empty on time pane since the default on time
>>>>> behavior is FIRE_ALWAYS.
>>>>>
>>>>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>>
>>>>>> Can anyone confirm?
>>>>>>
>>>>>> This is intermittent. Some (it seems, sparse) windows don't get an
>>>>>> ON_TIME firing after watermark. Is this a bug or is there a reason to not
>>>>>> expect ON_TIME firings for every window?
>>>>>>
>>>>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:
>>>>>>
>>>>>>> If it indeed happened as you have described, I will be very
>>>>>>> interested in the expected behaviour.
>>>>>>>
>>>>>>> Something I remembered before: the trigger condition meets just
>>>>>>> gives the runner/engine "permission" to fire, but runner/engine may not
>>>>>>> fire immediately. But I don't know if the engine/runner will guarantee to
>>>>>>> fire.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I have the following trigger:
>>>>>>>>
>>>>>>>> .apply(Window
>>>>>>>>       .configure()
>>>>>>>>       .triggering(AfterWatermark
>>>>>>>>            .pastEndOfWindow()
>>>>>>>>            .withEarlyFirings(AfterPane
>>>>>>>>                 .elementCountAtLeast(1)))
>>>>>>>>       .accumulatingFiredPanes()
>>>>>>>>       .withAllowedLateness(Duration.ZERO)
>>>>>>>>
>>>>>>>> But in Dataflow I notice that I never get an ON_TIME firing for my
>>>>>>>> window -- I only see early firing for elements, and then nothing.
>>>>>>>>
>>>>>>>> My assumption is that AfterWatermark should give me a last, on-time
>>>>>>>> pane under this configuration when the watermark surpasses the window's end.
>>>>>>>>
>>>>>>>> Is my expectation correct?
>>>>>>>>
>>>>>>>

Re: No AfterWatermark firings in Dataflow

Posted by Aaron Dixon <at...@gmail.com>.
Any confirmation on this from anyone? Whether per Beam spec, runners are
obligated to send ON_TIME panes for AfterWatermark triggers? I'm stuck
because this seems fundamental, so it's hard to imagine this is a Dataflow
bug, but OTOH it's also hard to imagine that trigger specs like
AfterWatermark are "optional"... ?

On Mon, Jan 13, 2020 at 4:18 PM Aaron Dixon <at...@gmail.com> wrote:

> Yes. Using calendar day-based windows and watermark is completely caught
> up to today ... calendar window ends several days ago. I got EARLY panes
> for each element but never ON_TIME pane.
>
> On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik <lc...@google.com> wrote:
>
>> Is the watermark advancing past the end of the window?
>>
>> On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <at...@gmail.com> wrote:
>>
>>> The window is not empty fwiw; it has elements; I get an early firing
>>> pane for the window but well after the watermark passes there is no ON_TIME
>>> pane. Would this be a bug in Dataflow? Seems fundamental, so I'm concerned
>>> perhaps the Beam spec doesn't obligate ON_TIME firings?
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> I would have expected an empty on time pane since the default on time
>>>> behavior is FIRE_ALWAYS.
>>>>
>>>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>
>>>>> Can anyone confirm?
>>>>>
>>>>> This is intermittent. Some (it seems, sparse) windows don't get an
>>>>> ON_TIME firing after watermark. Is this a bug or is there a reason to not
>>>>> expect ON_TIME firings for every window?
>>>>>
>>>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:
>>>>>
>>>>>> If it indeed happened as you have described, I will be very
>>>>>> interested in the expected behaviour.
>>>>>>
>>>>>> Something I remembered before: the trigger condition meets just gives
>>>>>> the runner/engine "permission" to fire, but runner/engine may not fire
>>>>>> immediately. But I don't know if the engine/runner will guarantee to fire.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I have the following trigger:
>>>>>>>
>>>>>>> .apply(Window
>>>>>>>       .configure()
>>>>>>>       .triggering(AfterWatermark
>>>>>>>            .pastEndOfWindow()
>>>>>>>            .withEarlyFirings(AfterPane
>>>>>>>                 .elementCountAtLeast(1)))
>>>>>>>       .accumulatingFiredPanes()
>>>>>>>       .withAllowedLateness(Duration.ZERO)
>>>>>>>
>>>>>>> But in Dataflow I notice that I never get an ON_TIME firing for my
>>>>>>> window -- I only see early firing for elements, and then nothing.
>>>>>>>
>>>>>>> My assumption is that AfterWatermark should give me a last, on-time
>>>>>>> pane under this configuration when the watermark surpasses the window's end.
>>>>>>>
>>>>>>> Is my expectation correct?
>>>>>>>
>>>>>>

Re: No AfterWatermark firings in Dataflow

Posted by Aaron Dixon <at...@gmail.com>.
Yes. Using calendar day-based windows and watermark is completely caught up
to today ... calendar window ends several days ago. I got EARLY panes for
each element but never ON_TIME pane.

On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik <lc...@google.com> wrote:

> Is the watermark advancing past the end of the window?
>
> On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <at...@gmail.com> wrote:
>
>> The window is not empty fwiw; it has elements; I get an early firing pane
>> for the window but well after the watermark passes there is no ON_TIME
>> pane. Would this be a bug in Dataflow? Seems fundamental, so I'm concerned
>> perhaps the Beam spec doesn't obligate ON_TIME firings?
>>
>>
>>
>>
>>
>> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com> wrote:
>>
>>> I would have expected an empty on time pane since the default on time
>>> behavior is FIRE_ALWAYS.
>>>
>>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com> wrote:
>>>
>>>> Can anyone confirm?
>>>>
>>>> This is intermittent. Some (it seems, sparse) windows don't get an
>>>> ON_TIME firing after watermark. Is this a bug or is there a reason to not
>>>> expect ON_TIME firings for every window?
>>>>
>>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:
>>>>
>>>>> If it indeed happened as you have described, I will be very interested
>>>>> in the expected behaviour.
>>>>>
>>>>> Something I remembered before: the trigger condition meets just gives
>>>>> the runner/engine "permission" to fire, but runner/engine may not fire
>>>>> immediately. But I don't know if the engine/runner will guarantee to fire.
>>>>>
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>>
>>>>>> I have the following trigger:
>>>>>>
>>>>>> .apply(Window
>>>>>>       .configure()
>>>>>>       .triggering(AfterWatermark
>>>>>>            .pastEndOfWindow()
>>>>>>            .withEarlyFirings(AfterPane
>>>>>>                 .elementCountAtLeast(1)))
>>>>>>       .accumulatingFiredPanes()
>>>>>>       .withAllowedLateness(Duration.ZERO)
>>>>>>
>>>>>> But in Dataflow I notice that I never get an ON_TIME firing for my
>>>>>> window -- I only see early firing for elements, and then nothing.
>>>>>>
>>>>>> My assumption is that AfterWatermark should give me a last, on-time
>>>>>> pane under this configuration when the watermark surpasses the window's end.
>>>>>>
>>>>>> Is my expectation correct?
>>>>>>
>>>>>

Re: No AfterWatermark firings in Dataflow

Posted by Luke Cwik <lc...@google.com>.
Is the watermark advancing past the end of the window?

On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <at...@gmail.com> wrote:

> The window is not empty fwiw; it has elements; I get an early firing pane
> for the window but well after the watermark passes there is no ON_TIME
> pane. Would this be a bug in Dataflow? Seems fundamental, so I'm concerned
> perhaps the Beam spec doesn't obligate ON_TIME firings?
>
>
>
>
>
> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com> wrote:
>
>> I would have expected an empty on time pane since the default on time
>> behavior is FIRE_ALWAYS.
>>
>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com> wrote:
>>
>>> Can anyone confirm?
>>>
>>> This is intermittent. Some (it seems, sparse) windows don't get an
>>> ON_TIME firing after watermark. Is this a bug or is there a reason to not
>>> expect ON_TIME firings for every window?
>>>
>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:
>>>
>>>> If it indeed happened as you have described, I will be very interested
>>>> in the expected behaviour.
>>>>
>>>> Something I remembered before: the trigger condition meets just gives
>>>> the runner/engine "permission" to fire, but runner/engine may not fire
>>>> immediately. But I don't know if the engine/runner will guarantee to fire.
>>>>
>>>>
>>>>
>>>> -Rui
>>>>
>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com> wrote:
>>>>
>>>>> I have the following trigger:
>>>>>
>>>>> .apply(Window
>>>>>       .configure()
>>>>>       .triggering(AfterWatermark
>>>>>            .pastEndOfWindow()
>>>>>            .withEarlyFirings(AfterPane
>>>>>                 .elementCountAtLeast(1)))
>>>>>       .accumulatingFiredPanes()
>>>>>       .withAllowedLateness(Duration.ZERO)
>>>>>
>>>>> But in Dataflow I notice that I never get an ON_TIME firing for my
>>>>> window -- I only see early firing for elements, and then nothing.
>>>>>
>>>>> My assumption is that AfterWatermark should give me a last, on-time
>>>>> pane under this configuration when the watermark surpasses the window's end.
>>>>>
>>>>> Is my expectation correct?
>>>>>
>>>>

Re: No AfterWatermark firings in Dataflow

Posted by Aaron Dixon <at...@gmail.com>.
The window is not empty fwiw; it has elements; I get an early firing pane
for the window but well after the watermark passes there is no ON_TIME
pane. Would this be a bug in Dataflow? Seems fundamental, so I'm concerned
perhaps the Beam spec doesn't obligate ON_TIME firings?





On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <lc...@google.com> wrote:

> I would have expected an empty on time pane since the default on time
> behavior is FIRE_ALWAYS.
>
> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com> wrote:
>
>> Can anyone confirm?
>>
>> This is intermittent. Some (it seems, sparse) windows don't get an
>> ON_TIME firing after watermark. Is this a bug or is there a reason to not
>> expect ON_TIME firings for every window?
>>
>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:
>>
>>> If it indeed happened as you have described, I will be very interested
>>> in the expected behaviour.
>>>
>>> Something I remembered before: the trigger condition meets just gives
>>> the runner/engine "permission" to fire, but runner/engine may not fire
>>> immediately. But I don't know if the engine/runner will guarantee to fire.
>>>
>>>
>>>
>>> -Rui
>>>
>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com> wrote:
>>>
>>>> I have the following trigger:
>>>>
>>>> .apply(Window
>>>>       .configure()
>>>>       .triggering(AfterWatermark
>>>>            .pastEndOfWindow()
>>>>            .withEarlyFirings(AfterPane
>>>>                 .elementCountAtLeast(1)))
>>>>       .accumulatingFiredPanes()
>>>>       .withAllowedLateness(Duration.ZERO)
>>>>
>>>> But in Dataflow I notice that I never get an ON_TIME firing for my
>>>> window -- I only see early firing for elements, and then nothing.
>>>>
>>>> My assumption is that AfterWatermark should give me a last, on-time
>>>> pane under this configuration when the watermark surpasses the window's end.
>>>>
>>>> Is my expectation correct?
>>>>
>>>

Re: No AfterWatermark firings in Dataflow

Posted by Luke Cwik <lc...@google.com>.
I would have expected an empty on time pane since the default on time
behavior is FIRE_ALWAYS.

On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <at...@gmail.com> wrote:

> Can anyone confirm?
>
> This is intermittent. Some (it seems, sparse) windows don't get an ON_TIME
> firing after watermark. Is this a bug or is there a reason to not expect
> ON_TIME firings for every window?
>
> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:
>
>> If it indeed happened as you have described, I will be very interested in
>> the expected behaviour.
>>
>> Something I remembered before: the trigger condition meets just gives the
>> runner/engine "permission" to fire, but runner/engine may not fire
>> immediately. But I don't know if the engine/runner will guarantee to fire.
>>
>>
>>
>> -Rui
>>
>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com> wrote:
>>
>>> I have the following trigger:
>>>
>>> .apply(Window
>>>       .configure()
>>>       .triggering(AfterWatermark
>>>            .pastEndOfWindow()
>>>            .withEarlyFirings(AfterPane
>>>                 .elementCountAtLeast(1)))
>>>       .accumulatingFiredPanes()
>>>       .withAllowedLateness(Duration.ZERO)
>>>
>>> But in Dataflow I notice that I never get an ON_TIME firing for my
>>> window -- I only see early firing for elements, and then nothing.
>>>
>>> My assumption is that AfterWatermark should give me a last, on-time pane
>>> under this configuration when the watermark surpasses the window's end.
>>>
>>> Is my expectation correct?
>>>
>>

Re: No AfterWatermark firings in Dataflow

Posted by Aaron Dixon <at...@gmail.com>.
Can anyone confirm?

This is intermittent. Some (it seems, sparse) windows don't get an ON_TIME
firing after watermark. Is this a bug or is there a reason to not expect
ON_TIME firings for every window?

On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <ru...@google.com> wrote:

> If it indeed happened as you have described, I will be very interested in
> the expected behaviour.
>
> Something I remembered before: the trigger condition meets just gives the
> runner/engine "permission" to fire, but runner/engine may not fire
> immediately. But I don't know if the engine/runner will guarantee to fire.
>
>
>
> -Rui
>
> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com> wrote:
>
>> I have the following trigger:
>>
>> .apply(Window
>>       .configure()
>>       .triggering(AfterWatermark
>>            .pastEndOfWindow()
>>            .withEarlyFirings(AfterPane
>>                 .elementCountAtLeast(1)))
>>       .accumulatingFiredPanes()
>>       .withAllowedLateness(Duration.ZERO)
>>
>> But in Dataflow I notice that I never get an ON_TIME firing for my window
>> -- I only see early firing for elements, and then nothing.
>>
>> My assumption is that AfterWatermark should give me a last, on-time pane
>> under this configuration when the watermark surpasses the window's end.
>>
>> Is my expectation correct?
>>
>

Re: No AfterWatermark firings in Dataflow

Posted by Rui Wang <ru...@google.com>.
If it indeed happened as you have described, I will be very interested in
the expected behaviour.

Something I remembered before: the trigger condition meets just gives the
runner/engine "permission" to fire, but runner/engine may not fire
immediately. But I don't know if the engine/runner will guarantee to fire.



-Rui

On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <at...@gmail.com> wrote:

> I have the following trigger:
>
> .apply(Window
>       .configure()
>       .triggering(AfterWatermark
>            .pastEndOfWindow()
>            .withEarlyFirings(AfterPane
>                 .elementCountAtLeast(1)))
>       .accumulatingFiredPanes()
>       .withAllowedLateness(Duration.ZERO)
>
> But in Dataflow I notice that I never get an ON_TIME firing for my window
> -- I only see early firing for elements, and then nothing.
>
> My assumption is that AfterWatermark should give me a last, on-time pane
> under this configuration when the watermark surpasses the window's end.
>
> Is my expectation correct?
>