Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2020/10/29 20:26:00 UTC

[jira] [Updated] (BEAM-11154) Missing coder in pipeline components with dataflow runner v2

     [ https://issues.apache.org/jira/browse/BEAM-11154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-11154:
-----------------------------------
    Status: Open  (was: Triage Needed)

> Missing coder in pipeline components with dataflow runner v2
> ------------------------------------------------------------
>
>                 Key: BEAM-11154
>                 URL: https://issues.apache.org/jira/browse/BEAM-11154
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Yichi Zhang
>            Assignee: Yichi Zhang
>            Priority: P2
>
> When running pipelines that use the Top combine function on Dataflow Runner v2, the backend complains about a missing coder id, for example a missing BoundedHeapCoder1.
> After some troubleshooting, this problem seems more general:
> The step context translation phase does not recognize an already registered Coder whose hashCode() function is incorrect, and tries to assign it a new, uniquified name as its pipeline_proto_coder_id.
> Code pointer:
> https://github.com/apache/beam/blob/5675108933de6eb601ca2e4f21870d2ababe0ec7/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L268
> In this case, since the comparator field in BoundedHeapCoder often does not implement hashCode() and equals(), the BoundedHeapCoder also has a different hashCode() each time a new instance is created. The duplicated coder does not exist in the already translated pipeline proto, which leads to the aforementioned missing coder id issue.
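
The failure mode described above can be reproduced outside of Beam. The following is a minimal sketch, not Beam's actual code: HeapCoderSketch, IdentityComparator, ValueComparator and isRegistered are hypothetical names, and a plain HashMap stands in for the coder-to-id registry as an assumption. It shows that a lookup misses when the wrapped comparator inherits identity-based equals()/hashCode() from Object, and hits when the comparator defines them by value.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CoderIdentitySketch {

    // Hypothetical stand-in for a coder that wraps a comparator (not the real BoundedHeapCoder).
    static final class HeapCoderSketch {
        private final int maxSize;
        private final Comparator<String> comparator;

        HeapCoderSketch(int maxSize, Comparator<String> comparator) {
            this.maxSize = maxSize;
            this.comparator = comparator;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof HeapCoderSketch)) {
                return false;
            }
            HeapCoderSketch other = (HeapCoderSketch) o;
            // Equality is delegated to the comparator, so it is only as good as the comparator's own equals().
            return maxSize == other.maxSize && comparator.equals(other.comparator);
        }

        @Override
        public int hashCode() {
            return Objects.hash(maxSize, comparator);
        }
    }

    // Inherits Object.equals()/hashCode(): two instances are never equal to each other.
    static final class IdentityComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            return a.compareTo(b);
        }
    }

    // Defines value-based equals()/hashCode(): all instances are interchangeable.
    static final class ValueComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            return a.compareTo(b);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueComparator;
        }

        @Override
        public int hashCode() {
            return ValueComparator.class.hashCode();
        }
    }

    // Registers a coder built from `first`, then looks up an equivalent coder built from `second`.
    static boolean isRegistered(Comparator<String> first, Comparator<String> second) {
        Map<HeapCoderSketch, String> coderIds = new HashMap<>();
        coderIds.put(new HeapCoderSketch(10, first), "BoundedHeapCoder");
        return coderIds.containsKey(new HeapCoderSketch(10, second));
    }

    public static void main(String[] args) {
        // Lookup misses: a registry would treat the second coder as new and assign it a fresh id.
        System.out.println(isRegistered(new IdentityComparator(), new IdentityComparator()));
        // Lookup hits: the second coder is recognized as already registered.
        System.out.println(isRegistered(new ValueComparator(), new ValueComparator()));
    }
}
```

Under these assumptions, the first call prints false and the second prints true, which suggests that giving the comparator value-based equals() and hashCode() is one way to keep re-created coder instances mapped to the same id.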



--
This message was sent by Atlassian Jira
(v8.3.4#803005)