Posted to issues@beam.apache.org by "D. Yang (Jira)" <ji...@apache.org> on 2019/12/09 11:23:00 UTC

[jira] [Created] (BEAM-8920) Go SDK: faster transforms/filter.Distinct with CombinePerKey

D. Yang created BEAM-8920:
-----------------------------

             Summary: Go SDK: faster transforms/filter.Distinct with CombinePerKey
                 Key: BEAM-8920
                 URL: https://issues.apache.org/jira/browse/BEAM-8920
             Project: Beam
          Issue Type: Improvement
          Components: sdk-go
            Reporter: D. Yang


The current implementation:

1. add fixed value 1: P<T> --> P<<T, 1>>
2. group by key: P<<T, 1>> --> GBK<T, 1>
3. drop the value: P<distinct T>

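The proposed implementation:

1. add fixed value 1 (unchanged): P<T> --> P<<T, 1>>
2. combine by key instead of group by key: P<<T, 1>> --> P<<distinct T, 1>>
3. drop the value (unchanged): P<distinct T>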

CombinePerKey lets the runner combine values per key in a ParDo before the GBK, so duplicate keys can be collapsed before the shuffle. This reduces the shuffle size.
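
The effect on shuffle size can be illustrated with a small stand-alone sketch. It does not use the Beam SDK: bundles are modelled as plain slices, and the function names (shuffledWithGBK, shuffledWithPreCombine, distinct) are hypothetical helpers invented for this illustration. It assumes the runner applies the pre-GBK combine once per bundle.

```go
package main

import (
	"fmt"
	"sort"
)

// shuffledWithGBK models the current approach: every <T, 1> pair goes
// through the shuffle, so the record count equals the input size.
func shuffledWithGBK(bundles [][]string) int {
	n := 0
	for _, b := range bundles {
		n += len(b)
	}
	return n
}

// shuffledWithPreCombine models a combine applied per bundle before the
// shuffle: duplicates within a bundle collapse to one record per key.
func shuffledWithPreCombine(bundles [][]string) int {
	n := 0
	for _, b := range bundles {
		seen := make(map[string]struct{})
		for _, e := range b {
			seen[e] = struct{}{}
		}
		n += len(seen)
	}
	return n
}

// distinct returns the final output, which is the same on both paths:
// each key once, after the values are dropped. Sorted for stable output.
func distinct(bundles [][]string) []string {
	seen := make(map[string]struct{})
	for _, b := range bundles {
		for _, e := range b {
			seen[e] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for k := range seen {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Two hypothetical bundles with duplicates inside and across bundles.
	bundles := [][]string{
		{"a", "a", "b", "a"},
		{"b", "b", "c"},
	}
	fmt.Println(shuffledWithGBK(bundles))        // 7
	fmt.Println(shuffledWithPreCombine(bundles)) // 4
	fmt.Println(distinct(bundles))               // [a b c]
}
```

In this model the saving grows with the number of duplicates per bundle. With no duplicates inside any bundle, both paths shuffle the same number of records.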



--
This message was sent by Atlassian Jira
(v8.3.4#803005)