You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by pa...@apache.org on 2023/06/27 12:19:05 UTC
[arrow] branch main updated: GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305)
This is an automated email from the ASF dual-hosted git repository.
paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 7de273b4ee GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305)
7de273b4ee is described below
commit 7de273b4ee05278c02aab84d85d27fd17e532a00
Author: Dewey Dunnington <de...@voltrondata.com>
AuthorDate: Tue Jun 27 09:18:55 2023 -0300
GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305)
### Rationale for this change
As reported by @ eitsupi, dplyr adds missing grouping variables to the beginning of the variable list; however, we add them to the *end* of the variable list. Our general policy is to match dplyr's behaviour everywhere.
Before this PR:
``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(dplyr, warn.conflicts = FALSE)
tibble::tibble(int = 1:4, chr = letters[1:4]) |>
group_by(chr) |>
select(int) |>
collect()
#> Adding missing grouping variables: `chr`
#> # A tibble: 4 × 2
#> # Groups: chr [4]
#> chr int
#> <chr> <int>
#> 1 a 1
#> 2 b 2
#> 3 c 3
#> 4 d 4
arrow_table(int = 1:4, chr = letters[1:4]) |>
group_by(chr) |>
select(int) |>
collect()
#> # A tibble: 4 × 2
#> # Groups: chr [4]
#> int chr
#> <int> <chr>
#> 1 1 a
#> 2 2 b
#> 3 3 c
#> 4 4 d
```
After this PR:
``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(dplyr, warn.conflicts = FALSE)
tibble::tibble(int = 1:4, chr = letters[1:4]) |>
group_by(chr) |>
select(int) |>
collect()
#> Adding missing grouping variables: `chr`
#> # A tibble: 4 × 2
#> # Groups: chr [4]
#> chr int
#> <chr> <int>
#> 1 a 1
#> 2 b 2
#> 3 c 3
#> 4 d 4
arrow_table(int = 1:4, chr = letters[1:4]) |>
group_by(chr) |>
select(int) |>
collect()
#> # A tibble: 4 × 2
#> # Groups: chr [4]
#> chr int
#> <chr> <int>
#> 1 a 1
#> 2 b 2
#> 3 c 3
#> 4 d 4
```
### Are these changes tested?
Yes, a test was added.
### Are there any user-facing changes?
Yes, column ordering will be different. This could be a breaking change because existing code that refers to columns by index may change; however, referring to a column by name is much more common.
* Closes: #35534
Authored-by: Dewey Dunnington <de...@voltrondata.com>
Signed-off-by: Dewey Dunnington <de...@voltrondata.com>
---
r/R/dplyr.R | 4 ++--
r/tests/testthat/test-dplyr-select.R | 11 +++++++++++
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/r/R/dplyr.R b/r/R/dplyr.R
index 62b345e1ce..f11b88d301 100644
--- a/r/R/dplyr.R
+++ b/r/R/dplyr.R
@@ -316,8 +316,8 @@ ensure_group_vars <- function(x) {
if (length(gv)) {
# Add them back
x$selected_columns <- c(
- x$selected_columns,
- make_field_refs(gv)
+ make_field_refs(gv),
+ x$selected_columns
)
}
}
diff --git a/r/tests/testthat/test-dplyr-select.R b/r/tests/testthat/test-dplyr-select.R
index dff73c063b..ba2db175ef 100644
--- a/r/tests/testthat/test-dplyr-select.R
+++ b/r/tests/testthat/test-dplyr-select.R
@@ -28,6 +28,7 @@ test_that("Empty select returns no columns", {
tbl
)
})
+
test_that("Empty select still includes the group_by columns", {
expect_message(
compare_dplyr_binding(
@@ -38,6 +39,16 @@ test_that("Empty select still includes the group_by columns", {
)
})
+test_that("Missing grouping columns are added to the beginning of the list", {
+ expect_message(
+ compare_dplyr_binding(
+ .input %>% group_by(chr) %>% select(int) %>% collect(),
+ tbl
+ ),
+ "Adding missing grouping variables"
+ )
+})
+
test_that("select/rename/rename_with", {
compare_dplyr_binding(
.input %>%