You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Ankur Goenka (JIRA)" <ji...@apache.org> on 2018/08/01 19:13:00 UTC

[jira] [Commented] (BEAM-4826) Flink runner sends bad flatten to SDK

    [ https://issues.apache.org/jira/browse/BEAM-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565838#comment-16565838 ] 

Ankur Goenka commented on BEAM-4826:
------------------------------------

I see. 

The split of flatten is ok. I will see how we can structure the code to remove unused pPcollection in flatten or other transforms input.

Though unused outputs will still be present as parDo will generate them in the user code.

 

> Flink runner sends bad flatten to SDK
> -------------------------------------
>
>                 Key: BEAM-4826
>                 URL: https://issues.apache.org/jira/browse/BEAM-4826
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Henning Rohde
>            Assignee: Ankur Goenka
>            Priority: Major
>              Labels: portability
>
> For a Go flatten test w/ 3 input, the Flink runner splits this into 3 bundle descriptors. But it sends the original 3-input flatten but w/ 1 actual input present in each bundle descriptor. This is inconsistent and the SDK shouldn't expect dangling PCollections. In contrast, Dataflow removes the flatten when it does the same split.
> Snippet:
> register: <
>   process_bundle_descriptor: <
>     id: "3"
>     transforms: <
>       key: "e4"
>       value: <
>         unique_name: "github.com/apache/beam/sdks/go/pkg/beam.createFn'1"
>         spec: <
>           urn: "urn:beam:transform:pardo:v1"
>           payload: [...]
>         >
>         inputs: <
>           key: "i0"
>           value: "n3"
>         >
>         outputs: <
>           key: "i0"
>           value: "n4"
>         >
>       >
>     >
>     transforms: <
>       key: "e7"
>       value: <
>         unique_name: "Flatten"
>         spec: <
>           urn: "beam:transform:flatten:v1"
>         >
>         inputs: <
>           key: "i0"
>           value: "n2"
>         >
>         inputs: <
>           key: "i1"
>           value: "n4" . // <----------- only one present.
>         >
>         inputs: <
>           key: "i2"
>           value: "n6"
>         >
>         outputs: <
>           key: "i0"
>           value: "n7"
>         >
>       >
>     >
> [...]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)