Posted to github@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2023/06/26 14:43:55 UTC

[GitHub] [arrow] paleolimbot opened a new pull request, #36305: GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list

paleolimbot opened a new pull request, #36305:
URL: https://github.com/apache/arrow/pull/36305

   ### Rationale for this change
   
   As reported by @eitsupi, dplyr adds missing grouping variables to the beginning of the variable list; however, we add them to the *end* of the variable list. Our general policy is to match dplyr's behaviour everywhere.
   
   Before this PR:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(dplyr, warn.conflicts = FALSE)
   
   tibble::tibble(int = 1:4, chr = letters[1:4]) |> 
     group_by(chr) |> 
     select(int) |> 
     collect()
   #> Adding missing grouping variables: `chr`
   #> # A tibble: 4 × 2
   #> # Groups:   chr [4]
   #>   chr     int
   #>   <chr> <int>
   #> 1 a         1
   #> 2 b         2
   #> 3 c         3
   #> 4 d         4
   
   arrow_table(int = 1:4, chr = letters[1:4]) |> 
     group_by(chr) |> 
     select(int) |> 
     collect()
   #> # A tibble: 4 × 2
   #> # Groups:   chr [4]
   #>     int chr  
   #>   <int> <chr>
   #> 1     1 a    
   #> 2     2 b    
   #> 3     3 c    
   #> 4     4 d
   ```
   
   After this PR:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(dplyr, warn.conflicts = FALSE)
   
   tibble::tibble(int = 1:4, chr = letters[1:4]) |> 
     group_by(chr) |> 
     select(int) |> 
     collect()
   #> Adding missing grouping variables: `chr`
   #> # A tibble: 4 × 2
   #> # Groups:   chr [4]
   #>   chr     int
   #>   <chr> <int>
   #> 1 a         1
   #> 2 b         2
   #> 3 c         3
   #> 4 d         4
   
   arrow_table(int = 1:4, chr = letters[1:4]) |> 
     group_by(chr) |> 
     select(int) |> 
     collect()
   #> # A tibble: 4 × 2
   #> # Groups:   chr [4]
   #>   chr     int
   #>   <chr> <int>
   #> 1 a         1
   #> 2 b         2
   #> 3 c         3
   #> 4 d         4
   ```
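   
   The ordering rule shown above can be sketched in base R. This is only an illustration of the intended behaviour, not the code changed in this PR; `prepend_missing_groups()` is a hypothetical helper name and does not exist in arrow or dplyr.
   
   ```r
   # Hypothetical helper (not part of arrow or dplyr): given the selected
   # column names and the grouping column names, put any grouping columns
   # that were not selected at the front, as dplyr does.
   prepend_missing_groups <- function(selected, groups) {
     missing_groups <- setdiff(groups, selected)
     c(missing_groups, selected)
   }
   
   # Grouping column `chr` was not selected, so it is added first
   prepend_missing_groups(selected = "int", groups = "chr")
   
   # Grouping column already selected: the selection order is left alone
   prepend_missing_groups(selected = c("int", "chr"), groups = "chr")
   ```
   
   Before this PR, the arrow result corresponded to `c(selected, missing_groups)` instead.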
   
   ### Are these changes tested?
   
   Yes, a test was added.
   
   ### Are there any user-facing changes?
   
   Yes, column ordering will be different. This could be a breaking change because existing code that refers to columns by index may now pick up a different column; however, referring to a column by name is much more common.
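   
   As a rough illustration of that risk, the sketch below uses plain data frames as stand-ins for the collected results before and after this change (the names `before` and `after` are assumptions made for the example, and no arrow code is run). Positional access returns a different column, while access by name is unaffected.
   
   ```r
   # Stand-ins for the collected result before and after this PR
   before <- data.frame(int = 1:4, chr = letters[1:4], stringsAsFactors = FALSE)
   after <- before[, c("chr", "int")]
   
   # Access by position: the first column is no longer the same one
   first_before <- before[[1]]  # the `int` column
   first_after <- after[[1]]    # the `chr` column
   
   # Access by name: identical values either way
   same_by_name <- identical(before[["int"]], after[["int"]])
   ```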


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36305: GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36305:
URL: https://github.com/apache/arrow/pull/36305#issuecomment-1607639755

   :warning: GitHub issue #35534 **has been automatically assigned in GitHub** to PR creator.




[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36305: GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36305:
URL: https://github.com/apache/arrow/pull/36305#issuecomment-1613964536

   Conbench analyzed the 6 benchmark runs on commit `7de273b4`.
   
   There was 1 benchmark result indicating a performance regression:
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-06-28 18:14:26Z](http://conbench.ursa.dev/compare/runs/b5d584ed630f47aaad7c14e02b28c5cc...823e024dd0ae471dbdf98609dd25e495/)
     - [params=<STATIC_VECTOR(std::shared_ptr<int>)>, source=cpp-micro, suite=arrow-small-vector-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649c5206c3b7f228000a8bbd8f35624...0649c78c3270758a80001f2cc5e1d9e4)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14673123880) has more details.




[GitHub] [arrow] paleolimbot merged pull request #36305: GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot merged PR #36305:
URL: https://github.com/apache/arrow/pull/36305

