Posted to issues@beam.apache.org by "Robert Burke (Jira)" <ji...@apache.org> on 2019/12/16 19:46:00 UTC

[jira] [Closed] (BEAM-8920) Go SDK: faster transforms/filter.Distinct with CombinePerKey

     [ https://issues.apache.org/jira/browse/BEAM-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke closed BEAM-8920.
------------------------------
    Fix Version/s: Not applicable
       Resolution: Fixed

[~stephydx] resolved this in the linked PR. Thanks!

> Go SDK: faster transforms/filter.Distinct with CombinePerKey
> ------------------------------------------------------------
>
>                 Key: BEAM-8920
>                 URL: https://issues.apache.org/jira/browse/BEAM-8920
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: D. Yang
>            Priority: Minor
>             Fix For: Not applicable
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current implementation:
> 1. add fixed value 1: P<T> --> P<<T, 1>>
> 2. group by key: P<<T, 1>> --> GBK<T, 1>
> 3. drop the value: P<distinct T>
> The proposed implementation:
> 1. same as step 1 above
> 2. combine by key: P<<T, 1>> --> P<<distinct T, 1>>
> 3. same as step 3 above
> CombinePerKey performs a pre-GBK ParDo that can merge values for the same key before the shuffle, which reduces the shuffle size.
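
To make the shuffle-size argument concrete, here is a minimal sketch in plain Go. It does not use the Beam API: the `bundle` type and the three functions are hypothetical names introduced only for illustration, and it assumes that the pre-GBK step sees one bundle at a time and merges duplicate keys within it. How much is actually merged before the shuffle depends on the runner and on how elements are bundled.

```go
package main

import (
	"fmt"
	"sort"
)

// bundle is a hypothetical stand-in for the elements one worker handles
// before the shuffle. It is not a Beam type.
type bundle []string

// shuffledWithGBK models the current approach: every element becomes a
// <T, 1> pair and every pair goes through the shuffle.
func shuffledWithGBK(bundles []bundle) int {
	n := 0
	for _, b := range bundles {
		n += len(b)
	}
	return n
}

// shuffledWithCombine models the proposed approach under the assumption
// stated above: duplicate keys are merged within each bundle first, so at
// most one pair per key per bundle reaches the shuffle.
func shuffledWithCombine(bundles []bundle) int {
	n := 0
	for _, b := range bundles {
		seen := make(map[string]struct{})
		for _, k := range b {
			seen[k] = struct{}{}
		}
		n += len(seen)
	}
	return n
}

// distinct models the final result of either approach: keys are merged
// across all bundles and the values are dropped. The output is sorted only
// to make it deterministic for this example.
func distinct(bundles []bundle) []string {
	seen := make(map[string]struct{})
	for _, b := range bundles {
		for _, k := range b {
			seen[k] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for k := range seen {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	bundles := []bundle{
		{"a", "a", "b", "a"},
		{"b", "b", "c"},
	}
	fmt.Println("pairs shuffled with GBK:    ", shuffledWithGBK(bundles))     // 7
	fmt.Println("pairs shuffled with combine:", shuffledWithCombine(bundles)) // 4
	fmt.Println("distinct elements:          ", distinct(bundles))            // [a b c]
}
```

Both approaches produce the same distinct set; in this model only the number of pairs crossing the shuffle differs, and the gap grows with the number of duplicates per bundle.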



--
This message was sent by Atlassian Jira
(v8.3.4#803005)