
Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2021/02/25 17:19:00 UTC

[jira] [Commented] (BEAM-11305) df.groupby(df.group) produces duplicate column for some aggregation functions

    [ https://issues.apache.org/jira/browse/BEAM-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291057#comment-17291057 ] 

Beam JIRA Bot commented on BEAM-11305:
--------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days, so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.


> df.groupby(df.group) produces duplicate column for some aggregation functions
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-11305
>                 URL: https://issues.apache.org/jira/browse/BEAM-11305
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.25.0
>            Reporter: Brian Hulette
>            Priority: P2
>              Labels: stale-P2
>
> It should be possible to use {{df.groupby(df.group)}} or {{df.groupby('group')}} and get the same result. Unfortunately, for some aggregation functions (max, min, all, any), the former produces an output with an extraneous 'group' column. Note that this doesn't happen for other functions, such as size.
> In groupby, we should check whether the series is one of this dataframe's columns when setting the index: https://github.com/apache/beam/blob/cdb882d9ae554556156bff4843f18567b214df13/sdks/python/apache_beam/dataframe/frames.py#L156
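The proposed check can be sketched in plain pandas. This is not Beam's code in frames.py, and the helper name normalize_group_key is hypothetical. The sketch assumes the fix amounts to this: when the grouping key is a Series whose name matches one of the dataframe's columns and whose values and index are identical to that column, replace it with the column label. Both spellings then take the same label-based groupby path, and 'group' becomes the index rather than an extra output column.

```python
import pandas as pd


def normalize_group_key(df, by):
    """Return a column label when `by` is a Series matching one of df's columns.

    Hypothetical helper for illustration; Beam's actual fix may differ.
    A Series with the same name, values, and index as a column of `df`
    is replaced by that column's label. Anything else is returned as-is.
    """
    if (
        isinstance(by, pd.Series)
        and by.name in df.columns
        # equals() compares values and index; a DataFrame (duplicate column
        # names) never equals a Series, so that case falls through unchanged.
        and by.equals(df[by.name])
    ):
        return by.name
    return by


df = pd.DataFrame({"group": ["a", "b", "a"], "value": [1, 2, 3]})

# Both spellings now group by the label, so 'group' ends up only in the index.
by_label = df.groupby("group").max()
by_series = df.groupby(normalize_group_key(df, df.group)).max()
print(by_series)
```

Comparing values with equals() rather than object identity means that an unrelated Series with the same name and values as a column is also treated as that column. A stricter variant could require the Series to be derived from this dataframe.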



--
This message was sent by Atlassian Jira
(v8.3.4#803005)