Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/07/19 14:32:00 UTC

[jira] [Updated] (ARROW-16703) [R] Refactor map_batches() so it can stream results

     [ https://issues.apache.org/jira/browse/ARROW-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-16703:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R] Refactor map_batches() so it can stream results
> ---------------------------------------------------
>
>                 Key: ARROW-16703
>                 URL: https://issues.apache.org/jira/browse/ARROW-16703
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As part of ARROW-15271, {{map_batches()}} was modified to return a {{RecordBatchReader}}, but the implementation collects all results into a list of record batches and only then converts that list to a reader. In theory, if we push the implementation down to C++, we should be able to produce a properly streaming {{RecordBatchReader}}.
> We won't know the output schema ahead of time. We could optionally accept it as an argument, which would allow the function to be lazy, or we could eagerly evaluate just the first batch to determine the schema.
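
The two schema options described above can be sketched in a few lines. This is only an illustration of the control flow, written in plain Python with no Arrow dependency: it is not the arrow R package's API or the proposed C++ implementation. The names map_batches_streaming and infer_schema are hypothetical, a "batch" is modelled as a dict of column name to values, and a "schema" is modelled as a tuple of column names.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Stand-ins (assumptions for illustration only, not Arrow types):
Batch = Dict[str, List]    # a "record batch": column name -> column values
Schema = Tuple[str, ...]   # a "schema": ordered column names


def infer_schema(batch: Batch) -> Schema:
    """Hypothetical helper: derive the stand-in schema from one batch."""
    return tuple(batch.keys())


def map_batches_streaming(
    batches: Iterable[Batch],
    fn: Callable[[Batch], Batch],
    schema: Optional[Schema] = None,
) -> Tuple[Schema, Iterator[Batch]]:
    """Apply fn to each batch without collecting all results first.

    If schema is given, nothing is evaluated until the caller iterates.
    If schema is None, only the first batch is evaluated to infer it.
    """
    mapped = (fn(b) for b in batches)  # generator: fn has not run yet
    if schema is None:
        try:
            first = next(mapped)  # eagerly evaluate just the first batch
        except StopIteration:
            raise ValueError(
                "cannot infer a schema from empty input; pass schema explicitly"
            ) from None
        schema = infer_schema(first)
        # Put the already-computed first batch back in front of the stream.
        mapped = chain([first], mapped)
    return schema, mapped
```

With an explicit schema the function is fully lazy; without one it pays for exactly one call to fn up front and streams the rest. The empty-input case shows one cost of inference: with no first batch there is nothing to infer from, so this sketch raises an error rather than guessing.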



--
This message was sent by Atlassian Jira
(v8.20.10#820010)