Posted to commits@beam.apache.org by "Daniel Halperin (JIRA)" <ji...@apache.org> on 2017/05/09 17:35:04 UTC

[jira] [Comment Edited] (BEAM-1283) DoFn finishBundle should be required to specify the window for output

    [ https://issues.apache.org/jira/browse/BEAM-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003130#comment-16003130 ] 

Daniel Halperin edited comment on BEAM-1283 at 5/9/17 5:34 PM:
---------------------------------------------------------------

[~sb2nov] This is marked with Fix Version 2.0.0, but PR #2753 does not appear to have been cherry-picked (CPed) to the {{release-2.0.0}} branch (as far as I can tell from the JIRA history).

I think you need to send a cherry-pick to the release branch before we can close this.

[Otherwise, we need to revert it, as this is a backward-incompatible change.]


was (Author: dhalperi@google.com):
[~sb2nov] This is marked as a Fix for 2.0.0, but PR #2753 does not appear to have been CPed to the {{release-2.0.0}} branch (as far as I can tell from the JIRA history).

Do you want to close it and remove it from Fix Version 2.0.0, or will you send a CP to the release branch?

> DoFn finishBundle should be required to specify the window for output
> ---------------------------------------------------------------------
>
>                 Key: BEAM-1283
>                 URL: https://issues.apache.org/jira/browse/BEAM-1283
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model, sdk-java-core, sdk-py
>            Reporter: Kenneth Knowles
>            Assignee: Sourabh Bajaj
>              Labels: backward-incompatible
>             Fix For: 2.0.0
>
>
> The spec is here in Javadoc: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L128
> "If invoked from {{@StartBundle}} or {{@FinishBundle}}, this will attempt to use the {{WindowFn}} of the input {{PCollection}} to determine what windows the element should be in, throwing an exception if the {{WindowFn}} attempts to access any information about the input element. The output element will have a timestamp of negative infinity."
> This collection of caveats keeps the method from being technically wrong, but the result is quite a mess. Ideas that reasonable folks have suggested lately:
>  - The {{WindowFn}} cannot actually be applied because {{WindowFn}} is allowed to see the element type. The spec just avoids this by limiting which {{WindowFn}} can be used.
>  - There is no natural output timestamp, so it should always be provided. The spec avoids this by specifying an arbitrary and fairly useless timestamp.
>  - If it is a merging {{WindowFn}}, such as sessions, whose windows have already been merged, then you'll just get a bogus proto-window whether or not a timestamp is given explicitly.
> The use cases for these methods are best addressed by state plus a window expiry callback, so we should revisit this spec and probably just wipe it.
> There are some rare cases where you might need to output from {{FinishBundle}} in a way that is not _actually_ sensitive to bundling (perhaps modulo some downstream notion of equivalence), in which case you had better know what window you are outputting to. Often it should be the global window.
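For the rare case described above, the Beam 2.x Java SDK's `DoFn.FinishBundleContext` offers `output(value, timestamp, window)`, so the caller must state both the timestamp and the window. The following is a minimal sketch, not code from the linked PR: `PerBundleSumExample` and `PerBundleSumFn` are hypothetical names, and it assumes globally windowed input and the direct runner on the classpath. It emits one partial sum per bundle, explicitly placed in the global window at that window's maximum timestamp.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.apache.beam.sdk.values.PCollection;

public class PerBundleSumExample {

  /** Hypothetical DoFn: sums the elements of each bundle and emits the partial sum in @FinishBundle. */
  static class PerBundleSumFn extends DoFn<Long, Long> {
    // Per-bundle accumulator; reset at the start of every bundle.
    private transient long bundleSum;
    private transient boolean sawElement;

    @StartBundle
    public void startBundle() {
      bundleSum = 0L;
      sawElement = false;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
      bundleSum += c.element();
      sawElement = true;
    }

    @FinishBundle
    public void finishBundle(FinishBundleContext c) {
      if (sawElement) {
        // No input element is in scope here, so the window and timestamp are given explicitly.
        // Assumption: this DoFn is only applied to globally windowed input.
        c.output(bundleSum, GlobalWindow.INSTANCE.maxTimestamp(), GlobalWindow.INSTANCE);
      }
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    PCollection<Long> total =
        p.apply(Create.of(1L, 2L, 3L, 4L))
            .apply(ParDo.of(new PerBundleSumFn()))
            .apply(Sum.longsGlobally());
    // Bundle boundaries are chosen by the runner, but the global total is not.
    PAssert.thatSingleton(total).isEqualTo(10L);
    p.run().waitUntilFinish();
  }
}
```

The number of partial sums can differ from run to run because the runner picks the bundle boundaries. Only the downstream total is stable, which is the "not actually sensitive to bundling" property the description asks for.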

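The description also names state plus a window expiry callback as the better fit for most of these use cases. Below is a minimal sketch of that pattern, assuming a recent Beam 2.x Java SDK with the direct runner on the classpath (`WindowExpiryExample` and `SumUntilWindowExpiryFn` are hypothetical names). A stateful `DoFn` accumulates a per-key sum and sets an event-time timer at `window.maxTimestamp()`, so each output is tied to a known window and is produced once, regardless of how the input was bundled. The key is also kept in state so the `@OnTimer` method can emit it.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class WindowExpiryExample {

  /** Hypothetical stateful DoFn: buffers a per-key, per-window sum and emits it when the window expires. */
  static class SumUntilWindowExpiryFn extends DoFn<KV<String, Long>, KV<String, Long>> {
    @StateId("key")
    private final StateSpec<ValueState<String>> keySpec = StateSpecs.value(StringUtf8Coder.of());

    @StateId("sum")
    private final StateSpec<ValueState<Long>> sumSpec = StateSpecs.value(VarLongCoder.of());

    @TimerId("expiry")
    private final TimerSpec expirySpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

    @ProcessElement
    public void processElement(
        ProcessContext c,
        BoundedWindow window,
        @StateId("key") ValueState<String> key,
        @StateId("sum") ValueState<Long> sum,
        @TimerId("expiry") Timer expiry) {
      key.write(c.element().getKey());
      Long current = sum.read();
      sum.write((current == null ? 0L : current) + c.element().getValue());
      // Fire once the watermark passes the end of this element's window.
      expiry.set(window.maxTimestamp());
    }

    @OnTimer("expiry")
    public void onExpiry(
        OnTimerContext c,
        @StateId("key") ValueState<String> key,
        @StateId("sum") ValueState<Long> sum) {
      Long total = sum.read();
      if (total != null) {
        // Output goes to the timer's window, so there is no ambiguity about where it belongs.
        c.output(KV.of(key.read(), total));
      }
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    PCollection<KV<String, Long>> sums =
        p.apply(
                Create.of(KV.of("a", 1L), KV.of("a", 2L), KV.of("b", 5L))
                    .withCoder(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of())))
            .apply(ParDo.of(new SumUntilWindowExpiryFn()));
    PAssert.that(sums).containsInAnyOrder(KV.of("a", 3L), KV.of("b", 5L));
    p.run().waitUntilFinish();
  }
}
```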


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)