Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/01/20 19:42:00 UTC

[jira] [Commented] (ARROW-15271) [R] Refactor do_exec_plan to return a RecordBatchReader

    [ https://issues.apache.org/jira/browse/ARROW-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479636#comment-17479636 ] 

Dewey Dunnington commented on ARROW-15271:
------------------------------------------

Just collecting a few related code comments here:

- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/dataset-scan.R#L89
- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/query-engine.R#L23-L26
- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/dataset-scan.R#L184

Related is the ability to write files directly from a query plan using the {{WriteNode}} that was added in ARROW-13542. For example, there is an open ticket for using the {{WriteNode}} to write datasets (ARROW-14266). Writing files is useful, but it is probably orthogonal to the ability to iterate over a {{RecordBatchReader}}, which is what the revamped {{map_batches()}} and the accompanying vignette demonstrate.

> [R] Refactor do_exec_plan to return a RecordBatchReader
> -------------------------------------------------------
>
>                 Key: ARROW-15271
>                 URL: https://issues.apache.org/jira/browse/ARROW-15271
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 6.0.1
>            Reporter: Will Jones
>            Priority: Major
>
> Right now [{{do_exec_plan}}|https://github.com/apache/arrow/blob/master/r/R/query-engine.R#L18] returns an Arrow table because {{head}}, {{tail}}, and {{arrange}} do. If ARROW-14289 is completed and similar work is done for {{arrange}}, we may be able to change {{do_exec_plan}} to return a {{RecordBatchReader}} instead.
> The {{map_batches()}} implementation (ARROW-14029) could benefit from this refactor, and it might also make ARROW-15040 more useful.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)