You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2021/03/05 18:42:00 UTC
[jira] [Updated] (BEAM-11928) Go SDK should use the
combine_globally urn for global combines.
[ https://issues.apache.org/jira/browse/BEAM-11928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Hulette updated BEAM-11928:
---------------------------------
Status: Open (was: Triage Needed)
> Go SDK should use the combine_globally urn for global combines.
> ---------------------------------------------------------------
>
> Key: BEAM-11928
> URL: https://issues.apache.org/jira/browse/BEAM-11928
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P2
>
> Reported on [https://stackoverflow.com/questions/66446338/issue-with-combine-function-in-apache-beam-go-sdk/66486052#66486052]
> The root is that the Go SDK doesn't use the "beam:transform:combine_globally:v1" URN, and always uses "beam:transform:combine_per_key:v1" even for global combines, with a AddFixedKey DoFn.
> URN in the proto: [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L347]
> Go SDK only having combine_per_key [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L42]
> We currently "detect" combines via a CombinePerKey scope [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/edge.go#L434]
> added at beam.TryCombinePerKey
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/combine.go#L58]
> We convert combines into the CombinePayload here
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L253]
> called above here:
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L241]
> We probably want to just add a graph.CombineGlobal op ( vs the existing combine node), or modify the "CombinePerKey" scope hack to have a CombineCombineGlobal variant, or somehting that is cleaner than currently exists.
> We'd also want to make sure the optimization takes place properly, which should be simple enough to detect timing wise at least once, if not as a regular benchmark.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)