Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/17 11:58:47 UTC

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #1459: Avoid send empty batches for Hash partitioning.

Dandandan commented on a change in pull request #1459:
URL: https://github.com/apache/arrow-datafusion/pull/1459#discussion_r771090481



##########
File path: datafusion/src/physical_plan/repartition.rs
##########
@@ -326,6 +326,11 @@ impl RepartitionExec {
                 Partitioning::Hash(exprs, _) => {
                     let timer = r_metrics.repart_time.timer();
                     let input_batch = result?;
+                    //avoid send empty batch to next plan
+                    if input_batch.num_rows() == 0 {

Review comment:
       Sorry if my issue description wasn't clear enough.
   
   I think a much more common case is that an output batch is empty *after partitioning*, somewhat later in this method.
   
   So after hashing / dividing the rows into partitions, we could avoid creating / sending empty batches.

##########
File path: datafusion/src/physical_plan/repartition.rs
##########
@@ -326,6 +326,11 @@ impl RepartitionExec {
                 Partitioning::Hash(exprs, _) => {
                     let timer = r_metrics.repart_time.timer();
                     let input_batch = result?;
+                    //avoid send empty batch to next plan
+                    if input_batch.num_rows() == 0 {

Review comment:
       
   
   So after the line here
   
   `for (num_output_partition, partition_indices) in ...`
   
   We can add a check and `continue` when partition_indices is empty.
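
A minimal sketch of the suggested check, not the actual DataFusion code. It assumes plain `Vec<i64>` values in place of Arrow `RecordBatch`es and precomputed row hashes; the function names `group_rows_by_partition` and `non_empty_batches` are hypothetical and exist only for illustration.

```rust
/// Hypothetical stand-in for the hashing step: group row indices by the
/// output partition they hash to.
fn group_rows_by_partition(hashes: &[u64], num_partitions: usize) -> Vec<Vec<usize>> {
    let mut indices = vec![Vec::new(); num_partitions];
    for (row, hash) in hashes.iter().enumerate() {
        indices[(*hash % num_partitions as u64) as usize].push(row);
    }
    indices
}

/// Build one output batch per partition, skipping partitions that received
/// no rows so that no empty batch is created or sent.
fn non_empty_batches(
    values: &[i64],
    hashes: &[u64],
    num_partitions: usize,
) -> Vec<(usize, Vec<i64>)> {
    debug_assert_eq!(values.len(), hashes.len());
    let mut out = Vec::new();
    for (num_output_partition, partition_indices) in
        group_rows_by_partition(hashes, num_partitions).into_iter().enumerate()
    {
        // The check proposed in the review: nothing to send for this partition.
        if partition_indices.is_empty() {
            continue;
        }
        // Assumption: gathering values by index stands in for building the
        // output RecordBatch from the selected rows.
        let batch: Vec<i64> = partition_indices.iter().map(|&i| values[i]).collect();
        out.push((num_output_partition, batch));
    }
    out
}
```

In this sketch, three rows hashed into four partitions yield output only for the partitions that actually received rows; the remaining partitions are skipped before any batch is built.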



