Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/20 02:35:53 UTC

[GitHub] [arrow-datafusion] james727 commented on a change in pull request #1579: Implement ARRAY_AGG(DISTINCT ...)

james727 commented on a change in pull request #1579:
URL: https://github.com/apache/arrow-datafusion/pull/1579#discussion_r788302507



##########
File path: datafusion/src/physical_plan/expressions/distinct_expressions.rs
##########
@@ -705,4 +844,151 @@ mod tests {
 
         Ok(())
     }
+
+    // Ordering is unpredictable when using ARRAY_AGG(DISTINCT). Thus we cannot test by simply
+    // checking for equality of output, and it is difficult to sort since `Ord` is not implemented
+    // for ScalarValue. Thus we check for equality via the following:
+    //   1. `expected` and `actual` have the same number of elements.
+    //   2. `expected` contains no duplicates.
+    //   3. `expected` and `actual` contain the same unique elements.
+    fn check_distinct_array_agg(
+        input: ArrayRef,
+        expected: ScalarValue,
+        datatype: DataType,
+    ) -> Result<()> {
+        let schema = Schema::new(vec![Field::new("a", datatype.clone(), false)]);
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![input])?;
+
+        let agg = Arc::new(DistinctArrayAgg::new(
+            col("a", &schema)?,
+            "bla".to_string(),
+            datatype,
+        ));
+        let actual = aggregate(&batch, agg)?;
+
+        match (expected, actual) {
+            (ScalarValue::List(Some(e), _), ScalarValue::List(Some(a), _)) => {
+                // Check that the inputs are the same length.
+                assert_eq!(e.len(), a.len());
+
+                let h1: HashSet<ScalarValue> = HashSet::from_iter(e.clone().into_iter());
+                let h2: HashSet<ScalarValue> = HashSet::from_iter(a.into_iter());
+
+                // Check that e's elements are unique.
+                assert_eq!(h1.len(), e.len());
+
+                // Check that a contains the same unique elements as e.
+                assert_eq!(h1, h2);

Review comment:
       Thank you! This is much nicer - I noticed the `PartialOrd` implementation but was unsure of how to actually use it.
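
For readers following along, here is a minimal sketch of how a type that implements `PartialOrd` but not `Ord` can still be sorted for an order-insensitive comparison. It is not the code from the PR: `sort_partial` and `same_elements_any_order` are hypothetical helper names, and the sketch assumes every pair of elements is comparable (that is, `partial_cmp` never returns `None`), which may or may not hold for every `ScalarValue` variant.

```rust
/// Sorts a slice whose element type is only `PartialOrd`.
/// Assumption: all elements are mutually comparable. This panics if
/// `partial_cmp` returns `None` (for example, when an `f64` is NaN).
fn sort_partial<T: PartialOrd>(values: &mut [T]) {
    values.sort_by(|a, b| a.partial_cmp(b).expect("elements must be comparable"));
}

/// Returns true when both slices hold the same elements, ignoring order.
/// Both inputs are cloned and sorted, then compared element by element.
fn same_elements_any_order<T: PartialOrd + Clone>(expected: &[T], actual: &[T]) -> bool {
    let mut e = expected.to_vec();
    let mut a = actual.to_vec();
    sort_partial(&mut e);
    sort_partial(&mut a);
    e == a
}
```

Calling `same_elements_any_order(&[3.0, 1.0, 2.0], &[2.0, 3.0, 1.0])` returns `true`. Unlike the set-based check in the diff, this approach does not verify that the elements are unique, so a test for a DISTINCT aggregate would need a separate duplicate check if it relied on it.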



