Posted to github@arrow.apache.org by "yjshen (via GitHub)" <gi...@apache.org> on 2023/04/16 04:40:03 UTC

[GitHub] [arrow-datafusion] yjshen commented on a diff in pull request #6003: Row accumulator support update Scalar values

yjshen commented on code in PR #6003:
URL: https://github.com/apache/arrow-datafusion/pull/6003#discussion_r1167689030


##########
datafusion/core/src/physical_plan/aggregates/row_hash.rs:
##########
@@ -516,35 +573,50 @@ impl GroupedHashAggregateStream {
         for group_values in &group_by_values {
             let groups_with_rows =
                 self.update_group_state(group_values, &mut allocated)?;
-
-            // Collect all indices + offsets based on keys in this vec
-            let mut batch_indices: UInt32Builder = UInt32Builder::with_capacity(0);
-            let mut offsets = vec![0];
-            let mut offset_so_far = 0;
-            for &group_idx in groups_with_rows.iter() {
-                let indices = &self.aggr_state.group_states[group_idx].indices;
-                batch_indices.append_slice(indices);
-                offset_so_far += indices.len();
-                offsets.push(offset_so_far);
+            // Decide the accumulator update mode. Use scalar values to update the accumulators when all of the following conditions are met:
+            // 1) The aggregation mode is Partial or Single
+            // 2) There are no normal aggregation expressions
+            // 3) The number of affected groups (entries in `aggr_state` that have rows to update) is high, which is usually the high-cardinality case
+            if matches!(self.mode, AggregateMode::Partial | AggregateMode::Single)
+                && normal_aggr_input_values.is_empty()
+                && normal_filter_values.is_empty()
+                && groups_with_rows.len() >= batch.num_rows() / 10

Review Comment:
   This magic number `10` is used to identify the high-cardinality case. Shall we make it configurable, or document how `10` was chosen?
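   A minimal sketch of what a configurable threshold could look like, written as standalone Rust under stated assumptions: `ScalarUpdateConfig` and its `high_cardinality_divisor` field are hypothetical names (not an existing DataFusion option), `AggregateMode` is a local stand-in for DataFusion's enum, and a single `has_normal_aggr_exprs` flag stands in for the PR's `normal_aggr_input_values` / `normal_filter_values` emptiness checks. The default divisor keeps the PR's current value of `10`.

   ```rust
   /// Local stand-in for DataFusion's `AggregateMode` (assumption, not the real type).
   #[allow(dead_code)]
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum AggregateMode {
       Partial,
       Final,
       FinalPartitioned,
       Single,
   }

   /// Hypothetical config knob: take the scalar update path when the batch
   /// touches at least `num_rows / high_cardinality_divisor` groups.
   #[derive(Debug, Clone, Copy)]
   struct ScalarUpdateConfig {
       high_cardinality_divisor: usize,
   }

   impl Default for ScalarUpdateConfig {
       fn default() -> Self {
           // Same value as the magic number currently hard-coded in the PR.
           Self { high_cardinality_divisor: 10 }
       }
   }

   /// Returns true when the row accumulators should be updated with scalar values.
   fn use_scalar_update(
       mode: AggregateMode,
       has_normal_aggr_exprs: bool,
       affected_groups: usize,
       batch_rows: usize,
       config: ScalarUpdateConfig,
   ) -> bool {
       // Clamp to 1 so a misconfigured divisor of 0 cannot panic on division.
       let divisor = config.high_cardinality_divisor.max(1);
       matches!(mode, AggregateMode::Partial | AggregateMode::Single)
           && !has_normal_aggr_exprs
           && affected_groups >= batch_rows / divisor
   }

   fn main() {
       let cfg = ScalarUpdateConfig::default();
       // 8192-row batch touching 1000 groups: threshold is 8192 / 10 = 819.
       println!("{}", use_scalar_update(AggregateMode::Partial, false, 1000, 8192, cfg));
       // A stricter divisor of 4 raises the threshold to 2048 groups.
       let strict = ScalarUpdateConfig { high_cardinality_divisor: 4 };
       println!("{}", use_scalar_update(AggregateMode::Partial, false, 1000, 8192, strict));
   }
   ```

   Because the check uses integer division (as the PR does with `batch.num_rows() / 10`), any batch with fewer rows than the divisor has a threshold of 0 and always passes the third condition; documenting that edge case alongside the chosen default may also be worthwhile.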



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.