Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/21 15:53:00 UTC

[jira] [Work logged] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

     [ https://issues.apache.org/jira/browse/BEAM-2535?focusedWorklogId=82787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82787 ]

ASF GitHub Bot logged work on BEAM-2535:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Mar/18 15:52
            Start Date: 21/Mar/18 15:52
    Worklog Time Spent: 10m 
      Work Description: tgroh commented on a change in pull request #4700: [BEAM-2535] Added control to set output timestamp independent of firing time for event time timers. (Direct Runner implementation)
URL: https://github.com/apache/beam/pull/4700#discussion_r176134308
 
 

 ##########
 File path: runners/direct-java/src/main/java/org/apache/beam/runners/direct/StatefulParDoEvaluatorFactory.java
 ##########
 @@ -232,10 +245,17 @@ public Runnable load(
       implements TransformEvaluator<KeyedWorkItem<K, KV<K, InputT>>> {
 
     private final DoFnLifecycleManagerRemovingTransformEvaluator<KV<K, InputT>> delegateEvaluator;
+    DirectStepContext stepContext;
+
+    EvaluationContext e;
 
     public StatefulParDoEvaluator(
-        DoFnLifecycleManagerRemovingTransformEvaluator<KV<K, InputT>> delegateEvaluator) {
+        DoFnLifecycleManagerRemovingTransformEvaluator<KV<K, InputT>> delegateEvaluator,
+        DirectStepContext stepContext,
 
 Review comment:
   It's pretty important to document that these are owned by the `StatefulParDoEvaluator` - otherwise the mutations aren't ok.
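
   As an illustration only, ownership could be documented along these lines. This is a minimal sketch, not the code from the pull request: `OwnedContextSketch` and its nested `StepContext` are hypothetical stand-ins for `StatefulParDoEvaluator` and `DirectStepContext`, and the counter exists only to give the evaluator something to mutate.

```java
import java.util.Objects;

final class OwnedContextSketch {
    /** Hypothetical stand-in for a mutable per-step context; not a Beam class. */
    static final class StepContext {
        private int processed;

        void recordElement() {
            processed++;
        }

        int processed() {
            return processed;
        }
    }

    /**
     * Owned by this evaluator. The evaluator mutates this context while
     * processing elements, so callers must not retain or share the instance
     * after passing it to the constructor.
     */
    private final StepContext stepContext;

    /**
     * @param stepContext context whose ownership transfers to the new evaluator
     */
    OwnedContextSketch(StepContext stepContext) {
        this.stepContext = Objects.requireNonNull(stepContext);
    }

    /** Mutates the owned context; safe only because nothing else holds it. */
    void processElement(String element) {
        stepContext.recordElement();
    }

    int processedCount() {
        return stepContext.processed();
    }
}
```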

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 82787)
    Time Spent: 3h 20m  (was: 3h 10m)

> Allow explicit output time independent of firing specification for all timers
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-2535
>                 URL: https://issues.apache.org/jira/browse/BEAM-2535
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model, sdk-java-core
>            Reporter: Kenneth Knowles
>            Assignee: Batkhuyag Batsaikhan
>            Priority: Major
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Today, we have insufficient control over the event time timestamp of elements output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
> 2. For a processing time timer, it is the current input watermark at the time of processing.
> But for both of these, we may want to reserve the right to output a particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making sure output is not droppable, but does not fully explain window expiration and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, timers should be viewed as another channel of input, with a watermark, and items on that channel _all need event time timestamps even if they are delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be separated (with nice defaults) from the specification of the event time of resulting outputs. These timestamps will determine a side channel with a new "timer watermark" that constrains the output watermark.
>  - We still need to fire event time timers according to the input watermark, so that event time timers fire.
>  - Late data dropping and window expiration will be in terms of the minimum of the input watermark and the timer watermark. In this way, whenever a timer is set, the window is not going to be garbage collected.
>  - We will need to make sure we have a way to "wake up" a window once it is expired; this may be as simple as exhausting the timer channel as soon as the input watermark indicates expiration of a window.
> This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It seems reasonable to use timers as an implementation detail (e.g. in runners-core utilities) without wanting any of this additional machinery, for example when there is no possibility of output from the timer callback.
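
The proposed "timer watermark" can be sketched as follows. This is a minimal illustration under stated assumptions, not Beam's implementation: the class `TimerWatermarkSketch` and all of its method names are hypothetical, allowed lateness is ignored, and the timer watermark is taken to be the earliest output timestamp among pending timers. Window expiration is then judged against the minimum of the input watermark and the timer watermark, so a window with a pending timer is not garbage collected.

```java
import java.time.Instant;
import java.util.TreeMap;

final class TimerWatermarkSketch {
    // Output timestamps of pending timers, with a count per timestamp.
    // Hypothetical bookkeeping; a real runner would track this per key and window.
    private final TreeMap<Instant, Integer> pendingOutputTimestamps = new TreeMap<>();

    /** Registers a timer whose outputs will carry the given event time timestamp. */
    void setTimer(Instant outputTimestamp) {
        pendingOutputTimestamps.merge(outputTimestamp, 1, Integer::sum);
    }

    /** Releases the hold of a timer that has fired (or was cancelled). */
    void timerFired(Instant outputTimestamp) {
        pendingOutputTimestamps.computeIfPresent(
            outputTimestamp, (ts, count) -> count == 1 ? null : count - 1);
    }

    /** Earliest pending output timestamp, or Instant.MAX when no timer is set. */
    Instant timerWatermark() {
        return pendingOutputTimestamps.isEmpty()
            ? Instant.MAX
            : pendingOutputTimestamps.firstKey();
    }

    /** Minimum of the input watermark and the timer watermark. */
    Instant gcWatermark(Instant inputWatermark) {
        Instant timerWm = timerWatermark();
        return inputWatermark.isBefore(timerWm) ? inputWatermark : timerWm;
    }

    /** A window expires only once the combined watermark has passed its end. */
    boolean isWindowExpired(Instant windowEnd, Instant inputWatermark) {
        return gcWatermark(inputWatermark).isAfter(windowEnd);
    }
}
```

In this sketch, setting a timer with an output timestamp inside a window keeps `isWindowExpired` false for that window even after the input watermark passes the window end; once the timer fires and its hold is released, the input watermark alone decides expiration.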



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)