Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2021/07/24 14:18:00 UTC

[jira] [Resolved] (ARROW-13434) [R] group_by() with an unnamed expression

     [ https://issues.apache.org/jira/browse/ARROW-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Keane resolved ARROW-13434.
------------------------------------
    Fix Version/s: 6.0.0
       Resolution: Fixed

Issue resolved by pull request 10785
[https://github.com/apache/arrow/pull/10785]

> [R] group_by() with an unnamed expression
> -----------------------------------------
>
>                 Key: ARROW-13434
>                 URL: https://issues.apache.org/jira/browse/ARROW-13434
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Jonathan Keane
>            Assignee: Jonathan Keane
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> With dplyr, when we group_by() with an unnamed expression, a column holding the result of the expression is added to the data frame.
> {code}
> > example_data %>% 
> +   group_by(int < 4) %>% collect()
> # A tibble: 10 x 8
> # Groups:   int < 4 [3]
>      int   dbl  dbl2 lgl   false chr   fct   `int < 4`
>    <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>    
>  1     1   1.1     5 TRUE  FALSE a     a     TRUE     
>  2     2   2.1     5 NA    FALSE b     b     TRUE     
>  3     3   3.1     5 TRUE  FALSE c     c     TRUE     
>  4    NA   4.1     5 FALSE FALSE d     d     NA       
>  5     5   5.1     5 TRUE  FALSE e     NA    FALSE    
>  6     6   6.1     5 NA    FALSE NA    NA    FALSE    
>  7     7   7.1     5 NA    FALSE g     g     FALSE    
>  8     8   8.1     5 FALSE FALSE h     h     FALSE    
>  9     9  NA       5 FALSE FALSE i     i     FALSE    
> 10    10  10.1     5 NA    FALSE j     j     FALSE    
> {code}
> Arrow doesn't do this, however, because we (currently) only add columns when the expression is named.
> {code}
> > Table$create(example_data) %>% 
> +   group_by(int < 4) %>% collect()
>  Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
> dbl: double
> dbl2: double
> lgl: bool
> false: bool
> chr: string
> fct: dictionary<values=string, indices=int8, ordered=0> 
> {code}
> This isn't a big deal right now since grouped aggregations aren't (quite) here yet, but once they are supported, people will write group_by() calls like this.
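
Since the description says Arrow adds a column when the grouping expression is named, a possible workaround on versions without the fix is to give the expression a name, or to compute it with mutate() first. The following is a minimal sketch under that assumption, not the change made in the pull request; the data frame `df` and the column name `int_lt_4` are hypothetical stand-ins chosen for illustration.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for example_data; only the int column matters here.
df <- data.frame(int = c(1:3, NA, 5:10))

# Naming the expression gives Arrow a real column to group on.
named <- Table$create(df) %>%
  group_by(int_lt_4 = int < 4) %>%
  collect()

# Two-step form: compute the column first, then group by its name.
two_step <- Table$create(df) %>%
  mutate(int_lt_4 = int < 4) %>%
  group_by(int_lt_4) %>%
  collect()

# Both results should carry int_lt_4 as the grouping variable.
group_vars(named)
group_vars(two_step)
```

The only difference from the unnamed dplyr call shown in the description is the column name: dplyr derives it from the expression text (`int < 4`), whereas here it is chosen explicitly.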



--
This message was sent by Atlassian Jira
(v8.3.4#803005)