You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2021/03/05 18:42:00 UTC

[jira] [Updated] (BEAM-11928) Go SDK should use the combine_globally urn for global combines.

     [ https://issues.apache.org/jira/browse/BEAM-11928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Hulette updated BEAM-11928:
---------------------------------
    Status: Open  (was: Triage Needed)

> Go SDK should use the combine_globally urn for global combines.
> ---------------------------------------------------------------
>
>                 Key: BEAM-11928
>                 URL: https://issues.apache.org/jira/browse/BEAM-11928
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Priority: P2
>
> Reported on [https://stackoverflow.com/questions/66446338/issue-with-combine-function-in-apache-beam-go-sdk/66486052#66486052]
> The root is that the Go SDK doesn't use the "beam:transform:combine_globally:v1" URN, and always uses "beam:transform:combine_per_key:v1" even for global combines, with a AddFixedKey DoFn.
> URN in the proto: [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L347] 
> Go SDK only having combine_per_key [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L42]
> We currently "detect" combines via a CombinePerKey scope [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/edge.go#L434] 
> added at beam.TryCombinePerKey 
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/combine.go#L58]
> We convert combines into the CombinePayload here
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L253]
> called above here: 
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L241] 
> We probably want to just add a graph.CombineGlobal op ( vs the existing combine node), or modify the "CombinePerKey" scope hack to have a CombineCombineGlobal variant, or somehting that is cleaner than currently exists.
> We'd also want to make sure the optimization takes place properly, which should be simple enough to detect timing wise at least once, if not as a regular benchmark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)