Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/05/15 14:48:00 UTC

[jira] [Commented] (ARROW-8813) [R] Implementing tidyr interface

    [ https://issues.apache.org/jira/browse/ARROW-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108367#comment-17108367 ] 

Neal Richardson commented on ARROW-8813:
----------------------------------------

If you wanted to explore this, one challenge I see is that pivot_longer and pivot_wider aren't generics, so you can't just make arrow methods for them.
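
To illustrate why this matters, here is a minimal base-R sketch of S3 dispatch. The function names (pivot_generic, pivot_plain) and the class name "fake_arrow_table" are hypothetical stand-ins, not part of tidyr or arrow; the sketch only shows that a method is consulted when the function calls UseMethod(), and ignored otherwise.

```r
# Hypothetical stand-in for a generic: it dispatches on the class of `data`.
pivot_generic <- function(data, ...) UseMethod("pivot_generic")
pivot_generic.default <- function(data, ...) "default method"
pivot_generic.fake_arrow_table <- function(data, ...) "arrow method"

# Hypothetical stand-in for a non-generic function: no UseMethod() call,
# so a function named like a method is never consulted.
pivot_plain <- function(data, ...) "only implementation"
pivot_plain.fake_arrow_table <- function(data, ...) "arrow method (never reached)"

# An object carrying the made-up class, standing in for an Arrow Table.
tbl <- structure(list(), class = "fake_arrow_table")

result_generic <- pivot_generic(tbl)  # dispatches to the class-specific method
result_plain <- pivot_plain(tbl)      # always runs the single implementation
```

With a non-generic function, supporting Arrow objects would presumably require either a change upstream to make the function generic, or a separate wrapper function on the arrow side.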

> [R] Implementing tidyr interface
> --------------------------------
>
>                 Key: ARROW-8813
>                 URL: https://issues.apache.org/jira/browse/ARROW-8813
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dominic Dennenmoser
>            Priority: Major
>              Labels: extension, feature, improvement
>
> I think it would be reasonable to implement an interface to the {{tidyr}} package. Such an interface would allow Arrow Tables to be processed lazily before they are pulled into memory. Currently, however, you need to collect the table before applying tidyr methods. The following code chunk shows an example routine:
> {code:r}
> library(magrittr)
> arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE)
> nested_df <-
>    arrow_table %>%
>    dplyr::select(ID, 4:7, Value) %>%
>    dplyr::filter(Value >= 5) %>%
>    dplyr::group_by(ID) %>%
>    dplyr::collect() %>%
>    tidyr::nest(){code}
> The main focus might be the following three methods:
>  * {{tidyr::[un]nest()}},
>  * {{tidyr::pivot_[longer|wider]()}}, and
>  * {{tidyr::separate()}}.
> I suppose the last two can be implemented fairly quickly, but {{tidyr::nest()}} and {{tidyr::unnest()}} cannot be implemented until conversion to List<Struct> is available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)