Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/22 06:26:07 UTC

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8503: ARROW-10366: [Rust] [DataFusion] Remove collect from within threads on merge

jorgecarleitao commented on a change in pull request #8503:
URL: https://github.com/apache/arrow/pull/8503#discussion_r509907817



##########
File path: rust/datafusion/src/physical_plan/merge.rs
##########
@@ -103,37 +105,53 @@ impl ExecutionPlan for MergeExec {
                 self.input.execute(0).await
             }
             _ => {
-                let tasks = (0..input_partitions).map(|part_i| {
+                // todo: buffer size should be configurable or dependent of metrics
+                let (sender, receiver) = mpsc::channel::<ArrowResult<RecordBatch>>(1);
+
+                // spawn independent tasks whose resulting streams (of batches)
+                // are sent to the channel for consumption.
+                (0..input_partitions).for_each(|part_i| {
                     let input = self.input.clone();
+                    let mut sender = sender.clone();
                     tokio::spawn(async move {

Review comment:
       We need to join the handles returned by this `tokio::spawn`; otherwise the main thread may finish before the spawned tasks do.
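
A minimal sketch of the idea, not the PR's code: it uses `std::thread` and `std::sync::mpsc::sync_channel` in place of `tokio::spawn` and the async channel so that it runs without external crates. The helper name `run_partitions`, the `(usize, usize)` item type standing in for a record batch, and the parameters are assumptions for illustration only. The point is to keep each handle and join it, rather than discarding it inside `for_each`.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper (not from the PR): spawns one worker per partition,
/// each sending `batches_per_partition` items to a shared bounded channel,
/// and joins every worker handle before returning.
fn run_partitions(partitions: usize, batches_per_partition: usize) -> Vec<(usize, usize)> {
    // Bounded channel with capacity 1, mirroring the buffer size in the diff.
    let (sender, receiver) = mpsc::sync_channel::<(usize, usize)>(1);

    // Use `map` + `collect` so every JoinHandle is kept instead of dropped.
    let handles: Vec<thread::JoinHandle<()>> = (0..partitions)
        .map(|part_i| {
            let sender = sender.clone();
            thread::spawn(move || {
                for batch_i in 0..batches_per_partition {
                    // `send` blocks while the channel is full and fails
                    // only if the receiver has been dropped.
                    if sender.send((part_i, batch_i)).is_err() {
                        break;
                    }
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver sees end-of-stream once
    // every worker has finished and dropped its clone.
    drop(sender);

    // Drain first: workers block on a full channel, so joining before
    // draining could deadlock.
    let mut received: Vec<(usize, usize)> = receiver.iter().collect();

    // Join every handle so no worker outlives this function and a worker
    // panic surfaces here instead of being silently lost.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // Arrival order across workers is not deterministic; sort for stable output.
    received.sort();
    received
}
```

Calling `run_partitions(3, 2)` would return six `(partition, batch)` pairs. In the async version the same shape would presumably apply: collect the values returned by `tokio::spawn` and await them, though how that fits with returning the receiver as a stream is left open here.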



