Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/09/10 13:27:00 UTC

[jira] [Commented] (ARROW-13926) [R] Support mutate/summarize with implicit join

    [ https://issues.apache.org/jira/browse/ARROW-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413185#comment-17413185 ] 

Ian Cook commented on ARROW-13926:
----------------------------------

FWIW, this is perhaps better construed as an implicit *window function* (i.e. an OVER expression in SQL). When you do this type of operation with dbplyr, the SQL it generates uses an OVER expression:
{code:java}
library(dplyr)

mtcars_db <- dbplyr::memdb_frame(mtcars)
mtcars_db %>%
  group_by(cyl) %>%
  transmute(x = hp - mean(hp)) %>%
  show_query()

#> <SQL>
#> SELECT `hp` - AVG(`hp`) OVER (PARTITION BY `cyl`) AS `x`
#> FROM `dbplyr_002`{code}
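
To make the equivalence concrete, here is a minimal sketch that compares the window-style form with the explicit-join form from the issue description. It assumes plain in-memory dplyr on a data frame rather than an Arrow query, and the names `windowed` and `joined` are only illustrative:

```r
library(dplyr)

# Window-style form: mean(hp) is evaluated once per cyl group and
# recycled to every row of that group.
windowed <- mtcars %>%
  group_by(cyl) %>%
  mutate(x = hp - mean(hp)) %>%
  ungroup()

# Explicit-join form: aggregate per group, join the aggregate back
# onto the original rows, subtract, then drop the helper column.
joined <- mtcars %>%
  left_join(
    mtcars %>%
      group_by(cyl) %>%
      summarize(tmp = mean(hp)),
    by = "cyl"
  ) %>%
  mutate(x = hp - tmp) %>%
  select(-tmp)

# Both forms should yield the same per-row values of x.
stopifnot(isTRUE(all.equal(windowed$x, joined$x)))
```

Both pipelines keep the original row order, so the two `x` columns can be compared element by element.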

> [R] Support mutate/summarize with implicit join
> -----------------------------------------------
>
>                 Key: ARROW-13926
>                 URL: https://issues.apache.org/jira/browse/ARROW-13926
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>             Fix For: 6.0.0
>
>
> {code}
> mtcars %>%
>   group_by(cyl) %>%
>   mutate(x = hp - mean(hp))
> {code}
> essentially means something like
> {code}
> mtcars %>%
>   left_join(mtcars %>%
>     group_by(cyl) %>%
>     summarize(tmp = mean(hp))
>   ) %>%
>   mutate(x = hp - tmp) %>%
>   select(-tmp)
> {code}
> Apparently you can do the same inside summarize() too (though IDK if that's behavior we want to encourage). Once we can do joins, we can support these queries.


