Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2020/11/18 23:40:00 UTC

[jira] [Updated] (BEAM-11305) df.groupby(df.group) produces duplicate column for some aggregation functions

     [ https://issues.apache.org/jira/browse/BEAM-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Hulette updated BEAM-11305:
---------------------------------
    Status: Open  (was: Triage Needed)

> df.groupby(df.group) produces duplicate column for some aggregation functions
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-11305
>                 URL: https://issues.apache.org/jira/browse/BEAM-11305
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P2
>
> It should be possible to use {{df.groupby(df.group)}} or {{df.groupby('group')}} and get the same result. Unfortunately, for some aggregation functions (max, min, all, any), the former produces an output with an extraneous 'group' column. This doesn't happen for other functions, such as size.
> In groupby, we should check whether the series is one of this dataframe's columns when setting the index: https://github.com/apache/beam/blob/cdb882d9ae554556156bff4843f18567b214df13/sdks/python/apache_beam/dataframe/frames.py#L156
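
A minimal sketch of the check suggested above, written against plain pandas rather than Beam's deferred DataFrame implementation. The helper names `is_own_column` and `groupby_consistent` are hypothetical and are not part of Beam or pandas. The sketch assumes that "one of this dataframe's columns" can be decided by comparing the series name and values with the column of the same name.

```python
import pandas as pd


def is_own_column(df, by):
    """Hypothetical helper: True if `by` is a Series matching one of df's columns."""
    return (
        isinstance(by, pd.Series)
        and by.name in df.columns
        and by.equals(df[by.name])
    )


def groupby_consistent(df, by):
    """Hypothetical wrapper: group by column name when `by` is df's own column."""
    # Grouping by the column name means the key is used only as the index
    # and is not also carried along as a data column.
    key = by.name if is_own_column(df, by) else by
    return df.groupby(key)


df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 5, 3]})

by_series = groupby_consistent(df, df.group).max()
by_name = df.groupby("group").max()
```

With this check in place, both spellings go through the same code path, so the aggregated output should contain 'group' only as the index.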


