You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/06 06:37:58 UTC

[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #68: Experimenting with arrow2

jorgecarleitao commented on a change in pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#discussion_r646083187



##########
File path: datafusion/src/physical_plan/hash_aggregate.rs
##########
@@ -339,6 +325,36 @@ pin_project! {
     }
 }
 
+fn hash_(group_values: &[ArrayRef]) -> Result<MutableBuffer<u64>> {
+    // compute the hashes
+    // todo: we should be able to use `MutableBuffer<u64>` to compute the hash and ^ them without
+    // allocating all the hashes before ^ them
+    let hashes = group_values
+        .iter()
+        .map(|x| {
+            let a = match x.data_type() {
+                DataType::Dictionary(_, d) => {
+                    // todo: think about how to perform this more efficiently
+                    // * first hash, then unpack
+                    // * do not unpack at all, and instead figure out a way to leverage dictionary-encoded.
+                    let unpacked = arrow2::compute::cast::cast(x.as_ref(), d)?;
+                    arrow2::compute::hash::hash(unpacked.as_ref())
+                }
+                _ => arrow2::compute::hash::hash(x.as_ref()),
+            };
+            Ok(a?)
+        })
+        .collect::<Result<Vec<_>>>()?;
+    let hash = MutableBuffer::<u64>::from(hashes[0].values());
+
+    Ok(hashes.iter().skip(1).fold(hash, |mut acc, x| {
+        acc.iter_mut()
+            .zip(x.values().iter())
+            .for_each(|(hash, other)| *hash ^= other);

Review comment:
       Thanks @Dandandan , this is now migrated; it uses the same that `hash_join` does.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org