Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2021/05/15 18:00:02 UTC

[jira] [Updated] (BEAM-11303) DataFrame GroupBy().size() aggregation produces incorrect results

     [ https://issues.apache.org/jira/browse/BEAM-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-11303:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Resolved)

Hello! Due to a bug in our Jira configuration, this issue had status:Resolved but resolution:Unresolved.

I am bulk editing these issues to have resolution:Fixed.

If a different resolution is appropriate, please change it. To do this, click the "Resolve" button (you can do this even for closed issues) and set the Resolution field to the right value.

> DataFrame GroupBy().size() aggregation produces incorrect results
> -----------------------------------------------------------------
>
>                 Key: BEAM-11303
>                 URL: https://issues.apache.org/jira/browse/BEAM-11303
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.25.0
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P2
>             Fix For: 2.26.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
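> size is treated as a liftable aggregation, which assumes it is commutative and associative, but it is not actually associative: applying size again to the partial sizes counts the partial results instead of adding them. It can be lifted, but the post-aggregation step needs to be a sum.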
> This means the size aggregation will produce incorrect results.
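
The problem described in the issue can be illustrated with a minimal sketch in plain Python. This is not Beam's implementation: the helpers group_sizes and combine and the two-partition data are hypothetical, chosen only to show why the post-aggregation step for a lifted size has to be a sum rather than another size.

```python
from collections import Counter, defaultdict


def group_sizes(keys):
    """Count rows per group key on one partition (the 'pre-aggregation' step)."""
    return Counter(keys)


def combine(partials, post_agg):
    """Merge per-partition results by applying post_agg to each key's partial values."""
    by_key = defaultdict(list)
    for partial in partials:
        for key, value in partial.items():
            by_key[key].append(value)
    return {key: post_agg(values) for key, values in by_key.items()}


# Hypothetical group keys, split across two partitions.
partitions = [["a", "a", "b"], ["a", "b", "b", "b"]]
partials = [group_sizes(p) for p in partitions]

# Reference result: sizes computed over all rows at once.
expected = dict(group_sizes([k for p in partitions for k in p]))

# Post-aggregating with size again counts partial results per key, not rows.
wrong = combine(partials, len)

# Post-aggregating with sum adds the partial sizes and matches the reference.
right = combine(partials, sum)

print("expected:", expected)
print("size of sizes:", wrong)
print("sum of sizes:", right)
```

In this sketch the size-of-sizes result depends on how many partitions each key appears in, so it changes with the partitioning, while the sum-of-sizes result does not.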



--
This message was sent by Atlassian Jira
(v8.3.4#803005)