Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2021/07/22 13:46:00 UTC
[jira] [Created] (ARROW-13434) [R] group_by() with an expression
Jonathan Keane created ARROW-13434:
--------------------------------------
Summary: [R] group_by() with an expression
Key: ARROW-13434
URL: https://issues.apache.org/jira/browse/ARROW-13434
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Jonathan Keane
With dplyr, when we call group_by() with an expression, a column holding the result of that expression is added to the data frame.
{code}
> example_data %>%
+ group_by(int < 4) %>% collect()
# A tibble: 10 x 8
# Groups:   int < 4 [3]
     int   dbl  dbl2 lgl   false chr   fct   `int < 4`
   <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>
 1     1   1.1     5 TRUE  FALSE a     a     TRUE
 2     2   2.1     5 NA    FALSE b     b     TRUE
 3     3   3.1     5 TRUE  FALSE c     c     TRUE
 4    NA   4.1     5 FALSE FALSE d     d     NA
 5     5   5.1     5 TRUE  FALSE e     NA    FALSE
 6     6   6.1     5 NA    FALSE NA    NA    FALSE
 7     7   7.1     5 NA    FALSE g     g     FALSE
 8     8   8.1     5 FALSE FALSE h     h     FALSE
 9     9  NA       5 FALSE FALSE i     i     FALSE
10    10  10.1     5 NA    FALSE j     j     FALSE
{code}
Arrow doesn't do this, however:
{code}
> Table$create(example_data) %>%
+ group_by(int < 4) %>% collect()
Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
dbl: double
dbl2: double
lgl: bool
false: bool
chr: string
fct: dictionary<values=string, indices=int8, ordered=0>
{code}
This isn't a big deal right now since grouped aggregations aren't (quite) here yet, but once they are supported, people will start using examples like this. This might actually be something we need or want to do in C++ rather than in the R client.
The workaround is relatively simple: compute the expression in a mutate() first, then group_by() the resulting column.
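A minimal sketch of that workaround, assuming the arrow and dplyr packages are installed. The small `example_data` built here is only a hypothetical stand-in for the dataset used above, and the column name `int_lt_4` is an arbitrary choice for illustration:

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for example_data (only two of its columns).
example_data <- tibble::tibble(
  int = c(1L, 2L, 3L, NA, 5L),
  dbl = c(1.1, 2.1, 3.1, 4.1, 5.1)
)

result <- Table$create(example_data) %>%
  # Materialize the expression as a named column first ...
  mutate(int_lt_4 = int < 4) %>%
  # ... then group by that column instead of by the raw expression.
  group_by(int_lt_4) %>%
  collect()
```

Unlike dplyr's automatic `int < 4` column, the grouping column here gets whatever name is passed to mutate().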