Posted to jira@arrow.apache.org by "Nigel McKernan (Jira)" <ji...@apache.org> on 2022/07/20 19:43:00 UTC

[jira] [Comment Edited] (ARROW-8813) [R] Implementing tidyr interface

    [ https://issues.apache.org/jira/browse/ARROW-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569154#comment-17569154 ] 

Nigel McKernan edited comment on ARROW-8813 at 7/20/22 7:42 PM:
----------------------------------------------------------------

The issue [~domiden] references was addressed in {{tidyr}} 1.1.0, released in May 2020, more than two years ago, as you can see [here|https://github.com/tidyverse/tidyr/releases#:~:text=pivot_longer()%20and%20pivot_wider()%20are%20now%20generic%20so%20implementations%0Acan%20be%20provided%20for%20objects%20other%20than%20data%20frames].

 

Would it be possible now to incorporate some {{tidyr}} methods that have been converted to generics into {{arrow}}?

EDIT: In addition, the {{nest()}} generic now [avoids computing on .data|https://github.com/tidyverse/tidyr/releases#:~:text=The%20nest()%20generic%20now%20avoids%20computing%20on%20.data%2C%20making%20it%20more%0Acompatible%20with%20lazy%20tibbles], which makes it more compatible with lazy tibbles and should make remote operations easier.
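
To illustrate what "providing an implementation" could look like, here is a minimal sketch and not a proposal for the actual {{arrow}} code. It assumes {{arrow}}, {{dplyr}} and {{tidyr}} (1.1.0 or later) are installed. The method {{pivot_longer.Table}} and the small example table are hypothetical, and the method simply collects the Table into memory before delegating, so nothing is evaluated lazily here.

```r
library(arrow)
library(dplyr)
library(tidyr)

# Hypothetical S3 method (not part of arrow): pull the Arrow Table into a
# tibble, then hand over to tidyr's data frame method. A real implementation
# would presumably do the reshaping inside Arrow instead of collecting first.
pivot_longer.Table <- function(data, ...) {
  tidyr::pivot_longer(dplyr::collect(data), ...)
}

# Small made-up table, for illustration only.
tbl <- arrow::Table$create(
  id = c(1L, 2L),
  a = c(10, 20),
  b = c(30, 40)
)

# Because pivot_longer() is a generic, this call dispatches on the Table class.
long <- tidyr::pivot_longer(
  tbl,
  cols = c(a, b),
  names_to = "key",
  values_to = "value"
)
```

With a method like this in place, a pipeline could call {{tidyr::pivot_longer()}} directly on a Table without an explicit {{dplyr::collect()}} step, although the data would still be materialised behind the scenes.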


was (Author: JIRAUSER293150):
The issue [~domiden] references was addressed in {{tidyr}} 1.1.0, released in May 2020, more than two years ago, as you can see [here|https://github.com/tidyverse/tidyr/releases#:~:text=pivot_longer()%20and%20pivot_wider()%20are%20now%20generic%20so%20implementations%0Acan%20be%20provided%20for%20objects%20other%20than%20data%20frames].

 

Would it be possible now to incorporate some {{tidyr}} methods that have been converted to generics into {{arrow}}?

> [R] Implementing tidyr interface
> --------------------------------
>
>                 Key: ARROW-8813
>                 URL: https://issues.apache.org/jira/browse/ARROW-8813
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dominic Dennenmoser
>            Priority: Major
>              Labels: extension, feature, improvement
>
> I think it would be reasonable to implement an interface to the {{tidyr}} package. Such an implementation would allow Arrow Tables to be processed lazily before the result is pulled back into memory. Currently, however, you need to collect the table before applying any {{tidyr}} methods. The following code chunk shows an example routine:
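> {code:r}
> library(magrittr)  # provides the %>% pipe
> # Read the file as an Arrow Table rather than a data frame
> arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE)
> nested_df <-
>    arrow_table %>%
>    dplyr::select(ID, 4:7, Value) %>%
>    dplyr::filter(Value >= 5) %>%
>    dplyr::group_by(ID) %>%
>    dplyr::collect() %>%  # pull the result into memory as a tibble
>    tidyr::nest()         # tidyr can only be applied after collecting
> {code}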
> The main focus might be the following three methods:
>  * {{tidyr::[un]nest()}},
>  * {{tidyr::pivot_[longer|wider]()}}, and
>  * {{tidyr::separate()}}.
> I suppose the last two can be implemented fairly quickly, but {{tidyr::nest()}} and {{tidyr::unnest()}} cannot be implemented until conversion to List<Struct> is available.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)