Posted to issues@arrow.apache.org by "Sam Albers (Jira)" <ji...@apache.org> on 2022/02/14 18:50:00 UTC

[jira] [Created] (ARROW-15679) count should return an ungrouped dataframe

Sam Albers created ARROW-15679:
----------------------------------

             Summary: count should return an ungrouped dataframe
                 Key: ARROW-15679
                 URL: https://issues.apache.org/jira/browse/ARROW-15679
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 7.0.0
            Reporter: Sam Albers


Unless its input was already grouped, `dplyr::count` returns an ungrouped data.frame. The arrow implementation instead leaves some of the counted columns in place as grouping variables:

{code:java}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
tf1 <- tempfile()
dir.create(tf1)
starwars |>
  write_dataset(tf1)

# no group ----------------------------------------------------------------
## dplyr behaviour
count_dplyr_no_group <- starwars %>%
  count(gender, homeworld, species)
group_vars(count_dplyr_no_group)
#> character(0)
## arrow behaviour
count_arrow_no_group <- open_dataset(tf1) %>%
  count(gender, homeworld, species) %>%
  collect()
group_vars(count_arrow_no_group)
#> [1] "gender"    "homeworld"
{code}
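
For reference, here is a minimal dplyr-only sketch of the grouped case mentioned above. The data frame `df` and its columns `g` and `h` are made up for illustration and are not part of the reproduction; the point is only that dplyr keeps groups that existed before `count()` and adds none of its own:

```r
library(dplyr, warn.conflicts = FALSE)

# Illustrative data only (hypothetical, not taken from starwars)
df <- data.frame(
  g = c("a", "a", "b"),
  h = c("x", "y", "y")
)

# Ungrouped input: count() returns an ungrouped result
ungrouped <- df %>% count(g, h)
group_vars(ungrouped)
#> character(0)

# Grouped input: only the pre-existing group (g) is kept
grouped <- df %>% group_by(g) %>% count(h)
group_vars(grouped)
#> [1] "g"
```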
If I am correct that this is an undesired behaviour, I think it can be fixed [here|https://github.com/apache/arrow/blob/5ad5ddcafee8fada9cebb341df638b750c98efb7/r/R/dplyr-count.R#L20-L35] using this patch:

{code:java}
count.arrow_dplyr_query <- function(x, ..., wt = NULL, sort = FALSE, name = NULL) {
  if (!missing(...)) {
    out <- dplyr::group_by(x, ..., .add = TRUE)
  } else {
    out <- x
  }
  out <- dplyr::tally(out, wt = {{ wt }}, sort = sort, name = name)

  gv <- dplyr::group_vars(x)
  if (rlang::is_empty(gv)) {
    out <- dplyr::ungroup(out)
  } else {
    # Restore original group vars
    out$group_by_vars <- gv
  }
  out
}
{code}

I can submit a PR with some tests if that would be helpful.
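
As a rough idea of what those tests might look like, here is a sketch using testthat. The fixture `tbl` and the test names are hypothetical and do not come from the arrow test suite; it assumes `count()` dispatches on an in-memory Arrow Table the same way it does on a dataset query. The second test is expected to fail on 7.0.0 until the patch above is applied:

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(testthat)

# Hypothetical fixture: a small in-memory Arrow Table built from starwars
tbl <- Table$create(starwars[, c("gender", "homeworld", "species")])

test_that("count() keeps pre-existing groups on an arrow query", {
  out <- tbl %>%
    group_by(gender) %>%
    count(homeworld) %>%
    collect()
  expect_identical(group_vars(out), "gender")
})

test_that("count() on an ungrouped arrow query returns ungrouped data", {
  out <- tbl %>%
    count(gender, homeworld, species) %>%
    collect()
  # Assumed to fail on arrow 7.0.0 without the patch
  expect_identical(group_vars(out), character(0))
})
```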


