You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Kenneth Knowles (JIRA)" <ji...@apache.org> on 2017/06/01 17:53:04 UTC

[jira] [Comment Edited] (BEAM-101) Data-driven triggers

    [ https://issues.apache.org/jira/browse/BEAM-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033379#comment-16033379 ] 

Kenneth Knowles edited comment on BEAM-101 at 6/1/17 5:52 PM:
--------------------------------------------------------------

I believe you could address the use case in a couple of ways:

1. A {{DoFn}} that uses state and timers to implement this behavior. You can do essentially any custom triggering with this. The only issue is that your runner needs to support it.
2. The approach of a {{CombineFn}} does not work as described - you cannot apply it right at the GBK because the element type may not match. You cannot apply it right at the {{Window.into}} because the element may lead to many output elements and there's not really a good story around propagating metadata in that case. You could have a {{CombineFn<Instant, AccumT, Boolean>}} and it could work.

The other trouble is that including a {{CombineFn}} in a trigger is not as portable; it needs a different execution strategy that calls a UDF, possibly over the Fn API. Today, triggers are just syntax, so they can be executed easily and efficiently within a runner via any means.

The most coherent approach I know of for custom triggers (which is not really fleshed out) is to do the combine on a PCollection explicitly and then have a trigger that just references that PCollection. The runner then just needs to be able to decode a bool, not run a {{CombineFn}}.

In my mind, data-driven trigger means a trigger that is aware of the details of the data type. A timestamp-driven trigger would not really be data-driven in this way. But until we have some clear design for custom triggers, we could definitely consider adding new syntax to triggers for particular common uses. If the existing solutions don't work for you, please open a JIRA for your specific use case.


was (Author: kenn):
I believe you could address the use case in a couple of ways:

1. A {{DoFn}} that uses state and timers to implement this behavior. You can do essentially any custom triggering with this. The only issue is that your runner needs to support it.
2. The approach of a {{CombineFn}} does not work as described - you cannot apply it right at the GBK because the element type may not match. You cannot apply it right at the {{Window.into}} because the element may lead to many output elements and there's not really a good story around propagating metadata in that case. You could have a {{CombineFn<Instant, AccumT, Boolean>}} and it could work.

The other trouble is that including a {{CombineFn}} in a trigger is not as portable; it needs a different execution strategy that calls a UDF, possibly over the Fn API. Today, triggers are just syntax, so they can be executed. The most coherent approach I know of (which is not really fleshed out) is to do the combine on a PCollection explicitly and then have a trigger that just references that PCollection. The runner then just needs to be able to decode a bool, not run a {{CombineFn}}.

In my mind, data-driven trigger means a trigger that is aware of the details of the data type. A timestamp-driven trigger would not really be data-driven in this way. But until we have some clear design for custom triggers, we could definitely consider adding new syntax to triggers for particular common uses. If the existing solutions don't work for you, please open a JIRA for your specific use case.

> Data-driven triggers
> --------------------
>
>                 Key: BEAM-101
>                 URL: https://issues.apache.org/jira/browse/BEAM-101
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Robert Bradshaw
>
> For some applications, it's useful to declare a pane/window to be emitted (or finished) based on its contents. The simplest of these is the AfterCount trigger, but more sophisticated predicates could be constructed.
> The requirements for consistent trigger firing are essentially that the state of the trigger form a lattice and that the "should fire?" question is a monotonic predicate on the lattice. Basically it asks "are we high enough up the lattice?"
> Because the element types may change between the application of Windowing and the actuation of the trigger, one idea is to extract the relevant data from the element at Windowing and pass it along implicitly where it can be combined and inspected in a type safe way later (similar to how timestamps and windows are implicitly passed with elements).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)