Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 18:18:45 UTC

[GitHub] [beam] damccorm opened a new issue, #20630: df.groupby(df.group) produces duplicate column for some aggregation functions

damccorm opened a new issue, #20630:
URL: https://github.com/apache/beam/issues/20630

   It should be possible to use `df.groupby(df.group)` or `df.groupby('group')` and get the same result. Unfortunately, for some aggregation functions (`max`, `min`, `all`, `any`), the former produces an output with an extraneous 'group' column. This doesn't happen for every function; `size`, for example, is unaffected.
   
   In groupby, we should check if the series is one of this dataframe's columns when setting the index: https://github.com/apache/beam/blob/cdb882d9ae554556156bff4843f18567b214df13/sdks/python/apache_beam/dataframe/frames.py#L156
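
A minimal sketch of the proposed check, written against plain pandas rather than Beam's deferred DataFrame. The helper `normalize_groupby_key` is hypothetical and is not part of Beam or pandas. It compares values, which a deferred frame cannot do, so an actual fix in `frames.py` would presumably have to compare the underlying expressions instead.

```python
import pandas as pd


def normalize_groupby_key(df, by):
    """Hypothetical helper: map a Series that is one of df's columns to its name.

    If `by` is a Series whose name is a column of `df` and whose contents
    match that column, return the column name so that grouping by the
    Series behaves like grouping by the name. Otherwise return `by` as-is.
    """
    if (
        isinstance(by, pd.Series)
        and by.name in df.columns
        and by.equals(df[by.name])
    ):
        return by.name
    return by


df = pd.DataFrame({"group": ["a", "b", "a"], "value": [1, 2, 3]})

# Both spellings resolve to the same key, so the aggregation output
# contains only the non-grouping columns.
by_series = df.groupby(normalize_groupby_key(df, df.group)).max()
by_name = df.groupby(normalize_groupby_key(df, "group")).max()
```

With the key normalized up front, `max`, `min`, `all`, and `any` would all see the same grouping column and exclude it from the output, whichever spelling the caller used.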
   
   Imported from Jira [BEAM-11305](https://issues.apache.org/jira/browse/BEAM-11305). Original Jira may contain additional context.
   Reported by: bhulette.

