Posted to issues@arrow.apache.org by "eitsupi (via GitHub)" <gi...@apache.org> on 2023/05/05 09:12:09 UTC
[GitHub] [arrow] eitsupi opened a new issue, #35445: [R] Behavior something like `group_by(foo) |> across(everything())` is different from dplyr
eitsupi opened a new issue, #35445:
URL: https://github.com/apache/arrow/issues/35445
### Describe the bug, including details regarding any error messages, version, and platform.
In dplyr, I believe that using `across(everything())` on a grouped data frame will not select the column used for grouping.
``` r
mtcars |>
dplyr::group_by(cyl) |>
dplyr::summarise(dplyr::across(everything(), sum))
#> # A tibble: 3 × 11
#> cyl mpg disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 293. 1156. 909 44.8 25.1 211. 10 8 45 17
#> 2 6 138. 1283. 856 25.1 21.8 126. 4 3 27 24
#> 3 8 211. 4943. 2929 45.2 56.0 235. 0 2 46 49
```
<sup>Created on 2023-05-05 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
However, arrow does not seem to exclude the columns used for grouping. The following example results in an error.
(I installed arrow 12.0.0.20230503 from R-universe)
``` r
mtcars |>
arrow::as_arrow_table() |>
dplyr::group_by(cyl) |>
dplyr::summarise(dplyr::across(everything(), sum)) |>
dplyr::collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! Invalid: Multiple matches for FieldRef.Name(cyl) in mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#> cyl: double
#> Backtrace:
#> ▆
#> 1. ├─dplyr::collect(...)
#> 2. └─arrow:::collect.arrow_dplyr_query(...)
#> 3. └─arrow:::compute.arrow_dplyr_query(x)
#> 4. └─base::tryCatch(...)
#> 5. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 6. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 7. └─value[[3L]](cond)
#> 8. └─arrow:::augment_io_error_msg(e, call, schema = schema())
#> 9. └─rlang::abort(msg, call = call)
```
<sup>Created on 2023-05-05 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
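A minimal sketch of the dplyr semantics described above, using only dplyr (no arrow). In a grouped `summarise()`, `everything()` covers only the non-grouping columns, which can be written out as `setdiff(names(mtcars), group_vars(grouped))`. Naming those columns explicitly with `all_of()` gives the same result in dplyr. The object names (`grouped`, `value_cols`, and so on) are illustrative. Whether the explicit `all_of()` form also avoids the error in the affected arrow build is an untested assumption.

``` r
library(dplyr)

grouped <- mtcars |> group_by(cyl)

# In dplyr, everything() inside across() skips the grouping column `cyl`
by_everything <- grouped |>
  summarise(across(everything(), sum))

# Equivalent explicit selection: every column except the grouping variables
# (hypothetical workaround; not verified against arrow 12.0.0.20230503)
value_cols <- setdiff(names(mtcars), group_vars(grouped))
by_all_of <- grouped |>
  summarise(across(all_of(value_cols), sum))

# Both summaries should match: 3 rows, `cyl` plus 10 summed columns
stopifnot(identical(by_everything, by_all_of))
```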
### Component(s)
R
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] thisisnic commented on issue #35445: [R] Behavior something like `group_by(foo) |> across(everything())` is different from dplyr
Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on issue #35445:
URL: https://github.com/apache/arrow/issues/35445#issuecomment-1536487182
Thanks for reporting this @eitsupi; can confirm this is reproducible and is a bug we should fix.
[GitHub] [arrow] thisisnic closed issue #35445: [R] Behavior something like `group_by(foo) |> across(everything())` is different from dplyr
Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic closed issue #35445: [R] Behavior something like `group_by(foo) |> across(everything())` is different from dplyr
URL: https://github.com/apache/arrow/issues/35445