Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/09/10 13:27:00 UTC
[jira] [Commented] (ARROW-13926) [R] Support mutate/summarize with implicit join
[ https://issues.apache.org/jira/browse/ARROW-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413185#comment-17413185 ]
Ian Cook commented on ARROW-13926:
----------------------------------
FWIW, this is perhaps better construed as an implicit *window function* (i.e. an OVER expression in SQL). When you do this type of operation with dbplyr, the SQL it generates uses an OVER expression:
{code}
library(dplyr)  # provides %>%, group_by(), transmute(), show_query()
mtcars_db <- dbplyr::memdb_frame(mtcars)  # copy mtcars into an in-memory SQLite table
mtcars_db %>%
  group_by(cyl) %>%
  transmute(x = hp - mean(hp)) %>%
  show_query()
#> <SQL>
#> SELECT `hp` - AVG(`hp`) OVER (PARTITION BY `cyl`) AS `x`
#> FROM `dbplyr_002`
{code}
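The window-function reading can be checked without a database. As a minimal sketch, assuming base R only (no dplyr, dbplyr or Arrow involved), ave() computes a per-group statistic and returns it aligned with the original rows, which mirrors the row-preserving semantics of AVG(hp) OVER (PARTITION BY cyl). The names grp_mean and x_window are illustrative only.

{code}
# mtcars comes from the datasets package, which R attaches by default.
# ave() applies FUN within each cyl group and returns a vector with one
# entry per input row, like a SQL window function (no rows are collapsed).
grp_mean <- ave(mtcars$hp, mtcars$cyl, FUN = mean)

# Deviation of each car's hp from its cylinder group's mean hp.
x_window <- mtcars$hp - grp_mean
head(x_window)
{code}

Because each value is a deviation from its own group mean, the values within every cyl group sum to (numerically) zero.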
> [R] Support mutate/summarize with implicit join
> -----------------------------------------------
>
> Key: ARROW-13926
> URL: https://issues.apache.org/jira/browse/ARROW-13926
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Major
> Fix For: 6.0.0
>
>
> {code}
> mtcars %>%
>   group_by(cyl) %>%
>   mutate(x = hp - mean(hp))
> {code}
> essentially means something like
> {code}
> mtcars %>%
>   left_join(
>     mtcars %>%
>       group_by(cyl) %>%
>       summarize(tmp = mean(hp)),
>     by = "cyl"
>   ) %>%
>   mutate(x = hp - tmp) %>%
>   select(-tmp)
> {code}
> Apparently you can do the same inside summarize() too (though I don't know if that's behavior we want to encourage). Once we can do joins, we can support these queries.
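The left-join rewrite quoted above can likewise be sketched in base R. This assumes aggregate() and merge() as stand-ins for summarize() and left_join(); it only illustrates the semantics and is not how Arrow implements the query. The variable names tmp, joined, x_window and same_values are illustrative.

{code}
# Step 1 (summarize): one row per cyl group holding that group's mean hp.
tmp <- aggregate(hp ~ cyl, data = mtcars, FUN = mean)
names(tmp)[names(tmp) == "hp"] <- "tmp"

# Step 2 (left_join): attach the group mean to every original row.
# all.x = TRUE keeps every row of mtcars, i.e. a left join on cyl.
joined <- merge(mtcars, tmp, by = "cyl", all.x = TRUE, sort = FALSE)

# Step 3 (mutate + select): subtract, then drop the helper column.
joined$x <- joined$hp - joined$tmp
joined$tmp <- NULL

# merge() does not preserve row order, so compare against the
# window-function form computed on the joined rows themselves.
x_window <- joined$hp - ave(joined$hp, joined$cyl, FUN = mean)
same_values <- isTRUE(all.equal(joined$x, x_window))
same_values
{code}

Both routes yield one x value per input row; the join form simply materializes the per-group aggregate as a separate table first, which is why supporting joins is enough to support these queries.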
--
This message was sent by Atlassian Jira
(v8.3.4#803005)