Posted to jira@arrow.apache.org by "Remi Dettai (Jira)" <ji...@apache.org> on 2020/10/22 12:00:00 UTC

[jira] [Comment Edited] (ARROW-10368) [Rust][DataFusion] Make InMemoryScan work on iterators of RecordBatch

    [ https://issues.apache.org/jira/browse/ARROW-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218952#comment-17218952 ] 

Remi Dettai edited comment on ARROW-10368 at 10/22/20, 11:59 AM:
-----------------------------------------------------------------

[~andygrove] If this change seems reasonable to you, I can give it a try! I wonder whether we could go one step further and add a new logical plan that makes it possible to plug in custom sources. That would also give those sources access to the projection info.
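
The following rough illustration shows why a dedicated plan for custom sources could benefit from the projection info. If the planner passes the selected column indices down to the source, the source can skip producing columns nobody asked for. The `CustomSource` trait, the `Counter` source and the `Batch` type are hypothetical, assumed only for this sketch; they are not part of DataFusion or Arrow.

```rust
// Placeholder for a record batch: a list of named columns. This is an
// assumption for the sketch; Arrow's real RecordBatch is not used here.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<(String, Vec<i64>)>,
}

// Hypothetical trait for a user-defined source that a new logical plan could
// wrap. The planner hands the projection (column indices) down to `scan`.
trait CustomSource {
    fn column_names(&self) -> Vec<String>;
    fn scan(&self, projection: Option<Vec<usize>>) -> Box<dyn Iterator<Item = Batch>>;
}

// Hypothetical toy source: column c holds id^(c + 1). Batches are built
// lazily, and only the projected columns are ever computed.
struct Counter {
    batches: usize,
}

impl CustomSource for Counter {
    fn column_names(&self) -> Vec<String> {
        vec!["id".to_string(), "square".to_string(), "cube".to_string()]
    }

    fn scan(&self, projection: Option<Vec<usize>>) -> Box<dyn Iterator<Item = Batch>> {
        let names = self.column_names();
        // No projection means "read all columns".
        let wanted: Vec<usize> = projection.unwrap_or_else(|| (0..names.len()).collect());
        Box::new((0..self.batches).map(move |b| {
            // Two rows per batch: ids 2b and 2b + 1.
            let ids: Vec<i64> = (0..2).map(|r| (b * 2 + r) as i64).collect();
            let columns: Vec<(String, Vec<i64>)> = wanted
                .iter()
                .map(|&c| {
                    let values: Vec<i64> = ids.iter().map(|&v| v.pow(c as u32 + 1)).collect();
                    (names[c].clone(), values)
                })
                .collect();
            Batch { columns }
        }))
    }
}

fn main() {
    let source = Counter { batches: 2 };
    // Projection [0, 2]: only "id" and "cube" are produced; "square" never is.
    for batch in source.scan(Some(vec![0, 2])) {
        println!("{:?}", batch.columns);
    }
}
```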


> [Rust][DataFusion] Make InMemoryScan work on iterators of RecordBatch
> ---------------------------------------------------------------------
>
>                 Key: ARROW-10368
>                 URL: https://issues.apache.org/jira/browse/ARROW-10368
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust, Rust - DataFusion
>            Reporter: Remi Dettai
>            Priority: Major
>
> Currently, InMemoryScan takes a Vec<Vec<RecordBatch>> as data.
> - the outer Vec separates the partitions
> - the inner Vec contains all the RecordBatch for one partition
> The inner Vec is then converted into an iterator when the LogicalPlan is turned into a PhysicalPlan.
> I suggest that InMemoryScan should take Vec<Iter<RecordBatch>> (one iterator per partition). This would make it possible to plug custom Scan implementations into DataFusion without having to read all of their data into memory first. It would still work seamlessly with Vec<Vec<RecordBatch>>, which would just need to be converted first with data.into_iter().map(|x| x.into_iter()).
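
The proposed change can be sketched in self-contained Rust. This is a minimal illustration under stated assumptions, not DataFusion code: `RecordBatch` here is a placeholder struct rather than Arrow's type, and `IterScan`, `PartitionStream`, `from_vecs` and `row_counts` are hypothetical names. It shows that existing Vec<Vec<RecordBatch>> data converts by calling into_iter() on both levels, while a custom source can hand over a lazily evaluated iterator instead.

```rust
// Placeholder standing in for Arrow's RecordBatch (assumption: the real type
// is not used so that the sketch stays self-contained).
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch {
    rows: usize,
}

// One boxed iterator per partition: batches are produced on demand.
type PartitionStream = Box<dyn Iterator<Item = RecordBatch>>;

// Hypothetical scan node holding iterators instead of materialized Vecs.
struct IterScan {
    partitions: Vec<PartitionStream>,
}

impl IterScan {
    fn new(partitions: Vec<PartitionStream>) -> Self {
        IterScan { partitions }
    }

    // Existing Vec<Vec<RecordBatch>> data still fits: each inner Vec becomes
    // an owning iterator (into_iter, not iter, so items are RecordBatch
    // rather than &RecordBatch).
    fn from_vecs(data: Vec<Vec<RecordBatch>>) -> Self {
        let partitions = data
            .into_iter()
            .map(|batches| Box::new(batches.into_iter()) as PartitionStream)
            .collect();
        IterScan { partitions }
    }

    // Drain every partition, returning the total row count per partition.
    fn row_counts(self) -> Vec<usize> {
        self.partitions
            .into_iter()
            .map(|part| part.map(|b| b.rows).sum::<usize>())
            .collect()
    }
}

fn main() {
    // Fully in-memory data, shaped like what InMemoryScan takes today.
    let in_memory = vec![
        vec![RecordBatch { rows: 2 }, RecordBatch { rows: 3 }],
        vec![RecordBatch { rows: 4 }],
    ];
    println!("{:?}", IterScan::from_vecs(in_memory).row_counts());

    // A "custom source": batches are generated lazily, never stored in a Vec.
    let lazy: PartitionStream = Box::new((1..=3).map(|i| RecordBatch { rows: i * 10 }));
    println!("{:?}", IterScan::new(vec![lazy]).row_counts());
}
```

Boxing each partition as a trait object lets partitions backed by different iterator types live in the same scan; a generic type parameter would force every partition to share one concrete iterator type.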



--
This message was sent by Atlassian Jira
(v8.3.4#803005)