Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/27 22:20:24 UTC

[GitHub] [arrow] alamb commented on a change in pull request #9271: ARROW-11300: [Rust][DataFusion] Further performance improvements on hash aggregation with small groups

alamb commented on a change in pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#discussion_r565673085



##########
File path: rust/datafusion/src/physical_plan/hash_aggregate.rs
##########
@@ -322,48 +325,76 @@ fn group_aggregate_batch(
             });
     }
 
+    // Collect all indices + offsets based on keys in this vec
+    let mut batch_indices: UInt32Builder = UInt32Builder::new(0);

Review comment:
       I wonder if you could get some additional performance by pre-sizing the builder, since the size of `batch_keys` is already known at this point:
   
   ```suggestion
       let mut batch_indices: UInt32Builder = UInt32Builder::new(batch_keys.len());
   ```

##########
File path: rust/datafusion/src/physical_plan/hash_aggregate.rs
##########
@@ -322,48 +325,76 @@ fn group_aggregate_batch(
             });
     }
 
+    // Collect all indices + offsets based on keys in this vec
+    let mut batch_indices: UInt32Builder = UInt32Builder::new(0);

Review comment:
       Or maybe the capacity would also have to be scaled by the number of accumulators.

##########
File path: rust/datafusion/src/physical_plan/hash_aggregate.rs
##########
@@ -322,48 +325,76 @@ fn group_aggregate_batch(
             });
     }
 
+    // Collect all indices + offsets based on keys in this vec
+    let mut batch_indices: UInt32Builder = UInt32Builder::new(0);
+    let mut offsets = vec![0];

Review comment:
   ```suggestion
       let mut offsets = Vec::with_capacity(batch_keys.len());
       offsets.push(0);
   ```
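
For illustration only, here is a minimal standard-library sketch of the pre-sizing idea in the suggestions above. It is not the code from this PR: `flatten_groups` and its `groups` argument are hypothetical stand-ins for the per-key row lists, and a plain `Vec<u32>` stands in for the `UInt32Builder`. It also assumes one offset is pushed per key after the leading `0`, in which case a capacity of `len() + 1` would cover every push.

```rust
/// Hypothetical helper (not from the PR): flattens per-group row indices
/// into one index buffer plus an offsets vector marking group boundaries.
fn flatten_groups(groups: &[Vec<u32>]) -> (Vec<u32>, Vec<usize>) {
    // The total number of row indices is known up front, so reserve it once
    // instead of letting the buffer grow through repeated reallocations.
    let total_rows: usize = groups.iter().map(|rows| rows.len()).sum();
    let mut batch_indices: Vec<u32> = Vec::with_capacity(total_rows);

    // Assumption: one offset per group plus the leading 0.
    let mut offsets: Vec<usize> = Vec::with_capacity(groups.len() + 1);
    offsets.push(0);

    let mut offset_so_far = 0;
    for rows in groups {
        batch_indices.extend_from_slice(rows);
        offset_so_far += rows.len();
        offsets.push(offset_so_far);
    }
    (batch_indices, offsets)
}

fn main() {
    let groups = vec![vec![0, 3], vec![1], vec![2, 4, 5]];
    let (indices, offsets) = flatten_groups(&groups);
    println!("indices = {:?}, offsets = {:?}", indices, offsets);
}
```

Whether this helps in practice would need to be confirmed with the benchmarks used for this PR, since the allocation cost may be small next to the hashing work.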



