Posted to jira@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/09/07 13:31:00 UTC
[jira] [Commented] (ARROW-9927) [R] Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package
[ https://issues.apache.org/jira/browse/ARROW-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191709#comment-17191709 ]
Wes McKinney commented on ARROW-9927:
-------------------------------------
In short, easier said than done. However, it would be good to have a tracking JIRA for dplyr feature coverage. We have issues covering much of the essential C++ query engine work, but there is no timeline for when individuals will be able to complete that work.
> [R] Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package
> -----------------------------------------------------------------------------------------------
>
> Key: ARROW-9927
> URL: https://issues.apache.org/jira/browse/ARROW-9927
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 1.0.1
> Reporter: Pal
> Priority: Major
>
> Hi,
>
> The open_dataset() function in the R arrow package already supports the dplyr filter, select and rename functions. However, it would be a huge improvement if it could also support other functions such as group_by, summarise and mutate before calling collect(). Is there any plan or project under way to do so? Would it be possible to include those features (also compatible with dplyr versions < 1)?
> Many thanks for this excellent work.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)