You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by pa...@apache.org on 2023/06/27 12:19:05 UTC

[arrow] branch main updated: GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305)

This is an automated email from the ASF dual-hosted git repository.

paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 7de273b4ee GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305)
7de273b4ee is described below

commit 7de273b4ee05278c02aab84d85d27fd17e532a00
Author: Dewey Dunnington <de...@voltrondata.com>
AuthorDate: Tue Jun 27 09:18:55 2023 -0300

    GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305)
    
    ### Rationale for this change
    
    As reported by @ eitsupi, dplyr adds missing grouping variables to the beginning of the variable list; however, we add them to the *end* of the variable list. Our general policy is to match dplyr's behaviour everywhere.
    
    Before this PR:
    
    ``` r
    library(arrow, warn.conflicts = FALSE)
    #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
    library(dplyr, warn.conflicts = FALSE)
    
    tibble::tibble(int = 1:4, chr = letters[1:4]) |>
      group_by(chr) |>
      select(int) |>
      collect()
    #> Adding missing grouping variables: `chr`
    #> # A tibble: 4 × 2
    #> # Groups:   chr [4]
    #>   chr     int
    #>   <chr> <int>
    #> 1 a         1
    #> 2 b         2
    #> 3 c         3
    #> 4 d         4
    
    arrow_table(int = 1:4, chr = letters[1:4]) |>
      group_by(chr) |>
      select(int) |>
      collect()
    #> # A tibble: 4 × 2
    #> # Groups:   chr [4]
    #>     int chr
    #>   <int> <chr>
    #> 1     1 a
    #> 2     2 b
    #> 3     3 c
    #> 4     4 d
    ```
    
    After this PR:
    
    ``` r
    library(arrow, warn.conflicts = FALSE)
    #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
    library(dplyr, warn.conflicts = FALSE)
    
    tibble::tibble(int = 1:4, chr = letters[1:4]) |>
      group_by(chr) |>
      select(int) |>
      collect()
    #> Adding missing grouping variables: `chr`
    #> # A tibble: 4 × 2
    #> # Groups:   chr [4]
    #>   chr     int
    #>   <chr> <int>
    #> 1 a         1
    #> 2 b         2
    #> 3 c         3
    #> 4 d         4
    
    arrow_table(int = 1:4, chr = letters[1:4]) |>
      group_by(chr) |>
      select(int) |>
      collect()
    #> # A tibble: 4 × 2
    #> # Groups:   chr [4]
    #>   chr     int
    #>   <chr> <int>
    #> 1 a         1
    #> 2 b         2
    #> 3 c         3
    #> 4 d         4
    ```
    
    ### Are these changes tested?
    
    Yes, a test was added.
    
    ### Are there any user-facing changes?
    
    Yes, column ordering will be different. This could be a breaking change because existing code that refers to columns by index may change; however, referring to a column by name is much more common.
    * Closes: #35534
    
    Authored-by: Dewey Dunnington <de...@voltrondata.com>
    Signed-off-by: Dewey Dunnington <de...@voltrondata.com>
---
 r/R/dplyr.R                          |  4 ++--
 r/tests/testthat/test-dplyr-select.R | 11 +++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/r/R/dplyr.R b/r/R/dplyr.R
index 62b345e1ce..f11b88d301 100644
--- a/r/R/dplyr.R
+++ b/r/R/dplyr.R
@@ -316,8 +316,8 @@ ensure_group_vars <- function(x) {
     if (length(gv)) {
       # Add them back
       x$selected_columns <- c(
-        x$selected_columns,
-        make_field_refs(gv)
+        make_field_refs(gv),
+        x$selected_columns
       )
     }
   }
diff --git a/r/tests/testthat/test-dplyr-select.R b/r/tests/testthat/test-dplyr-select.R
index dff73c063b..ba2db175ef 100644
--- a/r/tests/testthat/test-dplyr-select.R
+++ b/r/tests/testthat/test-dplyr-select.R
@@ -28,6 +28,7 @@ test_that("Empty select returns no columns", {
     tbl
   )
 })
+
 test_that("Empty select still includes the group_by columns", {
   expect_message(
     compare_dplyr_binding(
@@ -38,6 +39,16 @@ test_that("Empty select still includes the group_by columns", {
   )
 })
 
+test_that("Missing grouping columns are added to the beginning of the list", {
+  expect_message(
+    compare_dplyr_binding(
+      .input %>% group_by(chr) %>% select(int) %>% collect(),
+      tbl
+    ),
+    "Adding missing grouping variables"
+  )
+})
+
 test_that("select/rename/rename_with", {
   compare_dplyr_binding(
     .input %>%