Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2021/07/22 14:04:00 UTC

[jira] [Updated] (ARROW-13434) [R] group_by() with an unnamed expression

     [ https://issues.apache.org/jira/browse/ARROW-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Keane updated ARROW-13434:
-----------------------------------
    Summary: [R] group_by() with an unnamed expression  (was: [R] group_by() with an expression)

> [R] group_by() with an unnamed expression
> -----------------------------------------
>
>                 Key: ARROW-13434
>                 URL: https://issues.apache.org/jira/browse/ARROW-13434
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Jonathan Keane
>            Priority: Major
>
> With dplyr, when we call group_by() with an unnamed expression, a column holding the result of that expression is added to the data frame, named after the expression itself.
> {code}
> > example_data %>% 
> +   group_by(int < 4) %>% collect()
> # A tibble: 10 x 8
> # Groups:   int < 4 [3]
>      int   dbl  dbl2 lgl   false chr   fct   `int < 4`
>    <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>    
>  1     1   1.1     5 TRUE  FALSE a     a     TRUE     
>  2     2   2.1     5 NA    FALSE b     b     TRUE     
>  3     3   3.1     5 TRUE  FALSE c     c     TRUE     
>  4    NA   4.1     5 FALSE FALSE d     d     NA       
>  5     5   5.1     5 TRUE  FALSE e     NA    FALSE    
>  6     6   6.1     5 NA    FALSE NA    NA    FALSE    
>  7     7   7.1     5 NA    FALSE g     g     FALSE    
>  8     8   8.1     5 FALSE FALSE h     h     FALSE    
>  9     9  NA       5 FALSE FALSE i     i     FALSE    
> 10    10  10.1     5 NA    FALSE j     j     FALSE    
> {code}
> Arrow doesn't do this, however:
> {code}
> > Table$create(example_data) %>% 
> +   group_by(int < 4) %>% collect()
>  Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
> dbl: double
> dbl2: double
> lgl: bool
> false: bool
> chr: string
> fct: dictionary<values=string, indices=int8, ordered=0> 
> {code}
> This isn't a big deal right now since grouped aggregations aren't (quite) supported yet, but once they are, people will write code like this. This might actually be something we need or want to implement in C++ rather than in the R client.
> The workaround is relatively simple: add the expression as a named column with mutate(), then group_by() that column.
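
The workaround described above might look roughly like the following. This is only a sketch: it assumes the arrow and dplyr packages are installed, the column name int_lt_4 is a hypothetical choice, and example_data here is a cut-down stand-in for the data shown in the issue, not the original object.

```r
library(arrow)
library(dplyr)

# Cut-down stand-in for example_data (assumption: only `int` matters here).
example_data <- data.frame(
  int = c(1L, 2L, 3L, NA, 5L, 6L, 7L, 8L, 9L, 10L),
  dbl = c(1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1, NA, 10.1)
)

result <- Table$create(example_data) %>%
  # Materialise the expression as a named column first ...
  mutate(int_lt_4 = int < 4) %>%
  # ... then group by that column instead of the bare expression.
  group_by(int_lt_4) %>%
  collect()
```

Unlike the dplyr behaviour shown earlier, the grouping column carries the name chosen in mutate() rather than the expression text `int < 4`.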



--
This message was sent by Atlassian Jira
(v8.3.4#803005)