Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/13 15:04:51 UTC

[GitHub] [arrow] jorgecarleitao commented on pull request #7729: ARROW-9420 [Rust][DataFusion] Added repartition physical plan

jorgecarleitao commented on pull request #7729:
URL: https://github.com/apache/arrow/pull/7729#issuecomment-657614516


   One thing that is not clear to me yet is the idiom for handling RecordBatches and partitions. My understanding is that partitions can be executed in parallel (e.g. one thread per partition), whereas the RecordBatches within a partition are generally processed sequentially, i.e. we normally loop through each RecordBatch of a partition on that partition's thread.
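   A minimal sketch of that understanding, assuming a hypothetical `RecordBatch` stand-in (a plain `Vec<i64>`, not the real `arrow::record_batch::RecordBatch`) and one OS thread per partition. This is not DataFusion's execution code; it only illustrates "parallel across partitions, sequential across batches":

```rust
use std::thread;

/// Hypothetical stand-in for an Arrow RecordBatch: a single column of values.
/// (Not the real `arrow::record_batch::RecordBatch` type.)
#[derive(Clone, Debug)]
struct RecordBatch {
    values: Vec<i64>,
}

/// A partition is modeled here as an ordered list of batches.
type Partition = Vec<RecordBatch>;

/// Sums every partition: one thread per partition, and a plain sequential
/// loop over the batches inside each partition.
fn sum_partitions(partitions: Vec<Partition>) -> Vec<i64> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|partition| {
            thread::spawn(move || {
                let mut total: i64 = 0;
                // The batches of one partition are processed one after
                // another on this partition's thread.
                for batch in &partition {
                    total += batch.values.iter().sum::<i64>();
                }
                total
            })
        })
        .collect();

    // Join in partition order so the results line up with the input partitions.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let partitions = vec![
        vec![RecordBatch { values: vec![1, 2] }, RecordBatch { values: vec![3] }],
        vec![RecordBatch { values: vec![10, 20] }],
    ];
    println!("{:?}", sum_partitions(partitions)); // prints [6, 30]
}
```

   Spawning a thread per partition is just the simplest way to show the parallelism; a real engine might use a thread pool or async tasks instead.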
   
   Is the purpose of RecordBatch to split a partition into smaller chunks of data, so that memory usage stays bounded?
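   If that is the purpose, the idea can be sketched as a lazy iterator that yields a partition's rows in batches of at most `batch_size` rows, so a consumer only holds one batch at a time. The `RecordBatch` type and the `batches` / `batch_size` names below are hypothetical, chosen for illustration only:

```rust
/// Hypothetical stand-in for an Arrow RecordBatch (not the real arrow type).
#[derive(Debug, PartialEq)]
struct RecordBatch {
    values: Vec<i64>,
}

/// Lazily yields a partition's rows as batches of at most `batch_size` rows.
/// Because the iterator is lazy, a consumer needs only one batch in memory
/// at a time, regardless of how many rows the partition holds in total.
fn batches<I>(rows: I, batch_size: usize) -> impl Iterator<Item = RecordBatch>
where
    I: Iterator<Item = i64>,
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut rows = rows.peekable();
    std::iter::from_fn(move || {
        // Stop once the underlying rows are exhausted.
        rows.peek()?;
        let values: Vec<i64> = rows.by_ref().take(batch_size).collect();
        Some(RecordBatch { values })
    })
}

fn main() {
    // 0..10 stands in for a partition's rows; they are produced lazily.
    for batch in batches(0..10, 4) {
        println!("{:?}", batch.values);
    }
    // prints [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
}
```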
   
   In this PR, I have not merged all the RecordBatches within a given partition into a single batch; instead, I kept them separate. I am not sure whether this is the correct approach here.
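   To make "kept them separate" concrete, here is a sketch of one possible scheme: round-robin redistribution at batch granularity, where each batch is moved whole into an output partition and nothing is concatenated. The scheme, the `repartition_round_robin` name, and the `RecordBatch` stand-in are all assumptions for illustration; this comment does not state which partitioning scheme the PR uses:

```rust
/// Hypothetical stand-in for an Arrow RecordBatch (not the real arrow type).
#[derive(Clone, Debug, PartialEq)]
struct RecordBatch {
    values: Vec<i64>,
}

/// Redistributes the batches of all input partitions into `n` output
/// partitions, round-robin at batch granularity. Batches are moved as-is:
/// none are merged together or split apart.
fn repartition_round_robin(input: Vec<Vec<RecordBatch>>, n: usize) -> Vec<Vec<RecordBatch>> {
    assert!(n > 0, "need at least one output partition");
    let mut output: Vec<Vec<RecordBatch>> = (0..n).map(|_| Vec::new()).collect();
    for (i, batch) in input.into_iter().flatten().enumerate() {
        output[i % n].push(batch);
    }
    output
}

fn main() {
    let input = vec![
        vec![RecordBatch { values: vec![1, 2] }, RecordBatch { values: vec![3] }],
        vec![RecordBatch { values: vec![4] }],
    ];
    // The total number of batches is preserved; only their grouping changes.
    for (p, batches) in repartition_round_robin(input, 2).iter().enumerate() {
        println!("partition {}: {:?}", p, batches);
    }
}
```

   The alternative would be to concatenate each output partition's batches into a single batch, which trades fewer, larger batches for extra copying and a larger peak memory footprint.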

