Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/05/06 03:12:12 UTC

[jira] [Commented] (BEAM-34) Change default output timestamp for combined values to end of window

    [ https://issues.apache.org/jira/browse/BEAM-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273577#comment-15273577 ] 

ASF GitHub Bot commented on BEAM-34:
------------------------------------

GitHub user kennknowles opened a pull request:

    https://github.com/apache/incubator-beam/pull/296

    [BEAM-34][BEAM-145] Make WindowingStrategy combine WindowFn with OutputTimeFn

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [x] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [x] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
    
    ---
    
    Previously:
    
     - Any user-specified OutputTimeFn overrode the WindowFn#getOutputTime
     - WindowFn#getOutputTimeFn provided a default OutputTimeFn
     - The default varied from "earliest" to "end of window"
    
    Now:
    
     - The user-specified OutputTimeFn is used to combine the WindowFn's
       assigned output timestamps.
     - The WindowFn does not provide the default.
     - The default is always to output at end of window.
    
    For each of the tests that this affects, I had a choice: either update the timestamps in the test to be the end of window, or explicitly reset the windowing strategy to choose the minimum timestamp. The latter generally gets more useful coverage, since the former is fairly trivial, so I generally favored it. It is also easier to migrate to. Most of the tests are overspecified anyhow and should not be examining the timestamps.
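
As a rough illustration of the "Now" behavior described in the pull request, the sketch below models the two steps in plain Python: a per-element assignment step standing in for WindowFn#getOutputTime, followed by a combining step standing in for the OutputTimeFn, with end of window as the default. This is not the Beam API. The names `Window`, `identity_assign`, `end_of_window`, `earliest`, and `output_time` are hypothetical, and treating "end of window" as the last millisecond inside the window is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Window:
    """Hypothetical interval window, in milliseconds: [start, end)."""
    start: int
    end: int


# Stand-in for WindowFn#getOutputTime: one assigned output time per element.
AssignFn = Callable[[int, Window], int]
# Stand-in for OutputTimeFn: combines the assigned times into one timestamp.
CombineFn = Callable[[Iterable[int], Window], int]


def identity_assign(timestamp: int, window: Window) -> int:
    """Assign each element its own timestamp as its output time."""
    return timestamp


def end_of_window(assigned: Iterable[int], window: Window) -> int:
    """Default in this sketch: ignore the elements, output at end of window.

    Assumption: "end of window" means the last millisecond inside the window.
    """
    return window.end - 1


def earliest(assigned: Iterable[int], window: Window) -> int:
    """User-specified choice: the minimum of the assigned output times."""
    return min(assigned)


def output_time(timestamps: Iterable[int],
                window: Window,
                assign: AssignFn = identity_assign,
                combine: Optional[CombineFn] = None) -> int:
    """Combine the window-assigned output times with the chosen function."""
    chosen = combine if combine is not None else end_of_window
    assigned = [assign(ts, window) for ts in timestamps]
    return chosen(assigned, window)


window = Window(0, 60_000)
default_result = output_time([5_000, 20_000], window)
earliest_result = output_time([5_000, 20_000], window, combine=earliest)
# A hypothetical window function that never assigns an output time
# earlier than 10 seconds into the window.
shifted_result = output_time(
    [5_000, 20_000], window,
    assign=lambda ts, w: max(ts, w.start + 10_000),
    combine=earliest,
)
```

In this model the user's function no longer replaces the window's assignment: `shifted_result` is the earliest of the shifted times, not the earliest raw element timestamp.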

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kennknowles/incubator-beam OutputAtEndOfWindow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #296
    
----
commit 3aec78e34c1f1a1045d091048d3fa018a7cc0d3d
Author: Kenneth Knowles <kl...@google.com>
Date:   2016-05-06T02:33:16Z

    Make WindowingStrategy combine WindowFn with OutputTimeFn
    
    Previously:
    
     - Any user-specified OutputTimeFn overrode the WindowFn#getOutputTime
     - WindowFn#getOutputTimeFn provided a default OutputTimeFn
     - The default varied from "earliest" to "end of window"
    
    Now:
    
     - The user-specified OutputTimeFn is used to combine the WindowFn's
       assigned output timestamps.
     - The WindowFn does not provide the default.
     - The default is always to output at end of window.

----


> Change default output timestamp for combined values to end of window
> --------------------------------------------------------------------
>
>                 Key: BEAM-34
>                 URL: https://issues.apache.org/jira/browse/BEAM-34
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>              Labels: Windowing
>
> Today, the output timestamp for a combined value (including iterables output by GroupByKey) is the earliest non-late timestamp of any element combined together. The user can customize this by specifying an OutputTimeFn, for example choosing the latest non-late timestamp, or the end of the window.
> In many respects, the end of the window is the best behaved as a default, and is very easy to make performant. We have deferred making this change for backwards compatibility; this is a good time to do so.
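
One possible reading of "best behaved" in the issue description is that an end-of-window timestamp does not depend on which elements happen to be combined, while the earliest and latest choices do. The sketch below compares the three choices on two sets of elements from the same window. It is plain Python, not the Beam API. The function names and sample timestamps are hypothetical, and lateness is not modeled.

```python
def earliest(timestamps, window_end):
    """Earliest element timestamp; depends on the elements seen."""
    return min(timestamps)


def latest(timestamps, window_end):
    """Latest element timestamp; depends on the elements seen."""
    return max(timestamps)


def end_of_window(timestamps, window_end):
    """Last millisecond inside the window (an assumption of this sketch)."""
    return window_end - 1


window_end = 60_000  # exclusive end of a hypothetical one-minute window
some_elements = [20_000, 45_000]
more_elements = [5_000, 20_000, 45_000, 50_000]

# Map each policy name to its output for the two element sets.
results = {
    fn.__name__: (fn(some_elements, window_end), fn(more_elements, window_end))
    for fn in (earliest, latest, end_of_window)
}
```

Only the `end_of_window` entry holds the same value for both element sets; the other two change as elements are added.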



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)