Posted to jira@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/09/07 13:31:00 UTC

[jira] [Commented] (ARROW-9927) [R] Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package

    [ https://issues.apache.org/jira/browse/ARROW-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191709#comment-17191709 ] 

Wes McKinney commented on ARROW-9927:
-------------------------------------

In short, this is easier said than done. However, it would be good to have a tracking JIRA for dplyr feature coverage. We have issues covering much of the essential C++ query engine work, but no timeline for when contributors will be able to complete it.

> [R] Add dplyr group_by, summarise and mutate support in function open_dataset R arrow package  
> -----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-9927
>                 URL: https://issues.apache.org/jira/browse/ARROW-9927
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 1.0.1
>            Reporter: Pal
>            Priority: Major
>
> Hi, 
>  
> The open_dataset() function in the R arrow package already supports the dplyr filter, select and rename functions. However, it would be a huge improvement if it could also support other functions such as group_by, summarise and mutate before calling collect(). Is there any plan or project under way to do so? Would it be possible to include those features (also compatible with dplyr versions < 1)?
> Many thanks for this excellent job.
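
For illustration, the usage described in the issue can be sketched as follows: filter() and select() are applied to the dataset, collect() pulls the result into memory, and mutate(), group_by() and summarise() then run in plain dplyr. This is a minimal sketch, not the arrow developers' recommended approach. The temporary path, the use of mtcars as sample data, and the column choices are assumptions made for the example; it also assumes an arrow version that provides write_dataset().

```r
library(arrow)
library(dplyr)

# Hypothetical sample data: write mtcars to a temporary Parquet dataset
# so that the example is self-contained.
path <- file.path(tempdir(), "mtcars_ds")
write_dataset(mtcars, path, format = "parquet")

ds <- open_dataset(path)

result <- ds %>%
  filter(cyl > 4) %>%                 # handled by arrow on the dataset
  select(cyl, mpg, wt) %>%            # handled by arrow on the dataset
  collect() %>%                       # materialise as an in-memory data frame
  mutate(mpg_per_wt = mpg / wt) %>%   # from here on, ordinary dplyr
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), mean_ratio = mean(mpg_per_wt))

print(result)
```

The trade-off is that everything after collect() operates on data already loaded into R, so the filter() and select() steps should reduce the data as much as possible before that point.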



--
This message was sent by Atlassian Jira
(v8.3.4#803005)