You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Robert Burke (Jira)" <ji...@apache.org> on 2021/03/05 18:11:00 UTC

[jira] [Created] (BEAM-11928) Go SDK should use the combine_globally urn for global combines.

Robert Burke created BEAM-11928:
-----------------------------------

             Summary: Go SDK should use the combine_globally urn for global combines.
                 Key: BEAM-11928
                 URL: https://issues.apache.org/jira/browse/BEAM-11928
             Project: Beam
          Issue Type: Improvement
          Components: sdk-go
            Reporter: Robert Burke


Reported on [https://stackoverflow.com/questions/66446338/issue-with-combine-function-in-apache-beam-go-sdk/66486052#66486052]

The root is that the Go SDK doesn't use the "beam:transform:combine_globally:v1" URN, and always uses "beam:transform:combine_per_key:v1" even for global combines, with a AddFixedKey DoFn.

URN in the proto: [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L347] 

Go SDK only having combine_per_key [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L42]

We currently "detect" combines via a CombinePerKey scope [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/edge.go#L434] 
added at beam.TryCombinePerKey 
[https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/combine.go#L58]

We convert combines into the CombinePayload here
[https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L253]
called above here: 
[https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L241] 

We probably want to just add a graph.CombineGlobal op ( vs the existing combine node), or modify the "CombinePerKey" scope hack to have a CombineCombineGlobal variant, or somehting that is cleaner than currently exists.



We'd also want to make sure the optimization takes place properly, which should be simple enough to detect timing wise at least once, if not as a regular benchmark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)