Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/08/18 17:07:27 UTC

[jira] [Commented] (BEAM-1983) SDF should properly support windowed side inputs

    [ https://issues.apache.org/jira/browse/BEAM-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179898#comment-17179898 ] 

Beam JIRA Bot commented on BEAM-1983:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.


> SDF should properly support windowed side inputs
> ------------------------------------------------
>
>                 Key: BEAM-1983
>                 URL: https://issues.apache.org/jira/browse/BEAM-1983
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-apex, runner-dataflow, runner-direct, runner-flink, sdk-java-core
>            Reporter: Eugene Kirpichov
>            Priority: P2
>              Labels: stale-P2
>
> Currently there is no test coverage for Splittable DoFn (SDF) combined with windowed side inputs, especially for the case where not all of the side input windows are ready.
> Moreover, the current implementation of SDF in the direct runner is definitely wrong: it uses a ParDoEvaluator to run the ProcessFn, and this ParDoEvaluator looks at the wrong windows to decide which windows are ready and which are not: https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoEvaluator.java#L134. The WindowedValue in question is a KeyedWorkItem, which is always in the global window, but the windows that matter are the windows of the elements inside this KWI's elementsIterable().
> The Flink implementation is wrong in the same way.
> This JIRA is to:
> 1) add test coverage for this case
> 2) implement proper support in all runners
> I believe the easiest way to do 2) is to:
> - make SplittableParDo, when the DoFn has side inputs, pre-explode windows before feeding elements into GroupByKeyIntoKeyedWorkItems, so that each resulting KWI has elements in only a single window
> - tweak runners, when the DoFn uses side inputs, to look at the proper window while evaluating ProcessFn and to assert that there is only one window
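
The pre-explosion step proposed above could be sketched roughly as follows. This is a minimal, self-contained Java illustration, not Beam code: the Windowed class, the string window labels, and the explode and groupByWindow helpers are all hypothetical stand-ins for WindowedValue, BoundedWindow, and the SplittableParDo / GroupByKeyIntoKeyedWorkItems machinery.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; none of these are Beam classes.
final class ExplodeWindowsSketch {

    /** A value plus the windows it belongs to (a window is a plain string label here). */
    static final class Windowed<T> {
        final T value;
        final List<String> windows;

        Windowed(T value, List<String> windows) {
            this.value = value;
            this.windows = windows;
        }
    }

    /** Emits one single-window copy of the input for each window it belongs to. */
    static <T> List<Windowed<T>> explode(Windowed<T> in) {
        List<Windowed<T>> out = new ArrayList<>();
        for (String window : in.windows) {
            out.add(new Windowed<>(in.value, List.of(window)));
        }
        return out;
    }

    /**
     * Explodes every input and then groups by window, so that each group
     * (playing the role of a work item) holds elements from exactly one window.
     */
    static <T> Map<String, List<T>> groupByWindow(List<Windowed<T>> inputs) {
        Map<String, List<T>> groups = new LinkedHashMap<>();
        for (Windowed<T> in : inputs) {
            for (Windowed<T> single : explode(in)) {
                String window = single.windows.get(0); // exactly one window after explode
                groups.computeIfAbsent(window, k -> new ArrayList<>()).add(single.value);
            }
        }
        return groups;
    }
}
```

With element "a" in windows w1 and w2 and element "b" in window w2, groupByWindow puts "a" alone under w1 and both "a" and "b" under w2. A runner evaluating one such group would then only need to check side input readiness for that single window.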



--
This message was sent by Atlassian Jira
(v8.3.4#803005)