Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2021/07/24 14:18:00 UTC
[jira] [Resolved] (ARROW-13434) [R] group_by() with an unnamed expression
[ https://issues.apache.org/jira/browse/ARROW-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Keane resolved ARROW-13434.
------------------------------------
Fix Version/s: 6.0.0
Resolution: Fixed
Issue resolved by pull request 10785
[https://github.com/apache/arrow/pull/10785]
> [R] group_by() with an unnamed expression
> -----------------------------------------
>
> Key: ARROW-13434
> URL: https://issues.apache.org/jira/browse/ARROW-13434
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Jonathan Keane
> Assignee: Jonathan Keane
> Priority: Major
> Labels: pull-request-available
> Fix For: 6.0.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> With dplyr, calling group_by() with an unnamed expression adds a column to the data frame that holds the result of the expression. The new column is named after the expression itself (here `int < 4`).
> {code}
> > example_data %>%
> + group_by(int < 4) %>% collect()
> # A tibble: 10 x 8
> # Groups:   int < 4 [3]
>      int   dbl  dbl2 lgl   false chr   fct   `int < 4`
>    <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>
>  1     1   1.1     5 TRUE  FALSE a     a     TRUE
>  2     2   2.1     5 NA    FALSE b     b     TRUE
>  3     3   3.1     5 TRUE  FALSE c     c     TRUE
>  4    NA   4.1     5 FALSE FALSE d     d     NA
>  5     5   5.1     5 TRUE  FALSE e     NA    FALSE
>  6     6   6.1     5 NA    FALSE NA    NA    FALSE
>  7     7   7.1     5 NA    FALSE g     g     FALSE
>  8     8   8.1     5 FALSE FALSE h     h     FALSE
>  9     9    NA     5 FALSE FALSE i     i     FALSE
> 10    10  10.1     5 NA    FALSE j     j     FALSE
> {code}
> Arrow, however, doesn't do this, because we (currently) only add a column when the expression is named. An unnamed expression is instead looked up as an existing field, which fails:
> {code}
> > Table$create(example_data) %>%
> + group_by(int < 4) %>% collect()
> Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
> dbl: double
> dbl2: double
> lgl: bool
> false: bool
> chr: string
> fct: dictionary<values=string, indices=int8, ordered=0>
> {code}
> This isn't a big deal right now, since grouped aggregations aren't (quite) supported yet, but once they are, people will write code like this.