Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2022/04/19 21:42:00 UTC

[jira] [Resolved] (ARROW-15517) [R] Use WriteNode in write_dataset()

     [ https://issues.apache.org/jira/browse/ARROW-15517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Keane resolved ARROW-15517.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request 12316
[https://github.com/apache/arrow/pull/12316]

> [R] Use WriteNode in write_dataset()
> ------------------------------------
>
>                 Key: ARROW-15517
>                 URL: https://issues.apache.org/jira/browse/ARROW-15517
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, write_dataset() uses the Scanner interface, which can't handle everything that an ExecPlan can. So if your arrow_dplyr_query contains things like aggregations or (more importantly) joins, you have to materialize the Table in memory before you can write it to disk. The WriteNode added in ARROW-13542 is a special sink node that can be put at the end of an ExecPlan. With it, data should be able to stream to disk in more cases, and write_dataset() will benefit from future improvements to ExecPlan memory usage and spillover.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)