Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/14 16:31:37 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #6669: Minor: Add tests for User Defined Aggregate functions

alamb opened a new pull request, #6669:
URL: https://github.com/apache/arrow-datafusion/pull/6669

   # Which issue does this PR close?
   
   This is part of https://github.com/apache/arrow-datafusion/issues/6611
   
   # Rationale for this change
   
   I am adding support for sliding windows in user-defined aggregates and I need some way to test it.
   
   The actual code change is relatively small, but testing it takes nontrivial effort, so I wanted to add the tests first to make the actual code change clearer.
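
To illustrate what a sliding window asks of an aggregate, here is a minimal sketch in plain Rust. It does not use the DataFusion API: `SlidingSum`, `update`, `retract`, `evaluate`, and `sliding_sums` are hypothetical names chosen for this example. The assumption is that the aggregate can add rows that enter the window and subtract rows that leave it, instead of recomputing from scratch.

```rust
/// Hypothetical retractable sum; not a DataFusion type.
#[derive(Debug, Default)]
struct SlidingSum {
    sum: i64,
}

impl SlidingSum {
    /// Add the rows that just entered the window.
    fn update(&mut self, entering: &[i64]) {
        self.sum += entering.iter().sum::<i64>();
    }

    /// Remove the rows that just left the window.
    fn retract(&mut self, leaving: &[i64]) {
        self.sum -= leaving.iter().sum::<i64>();
    }

    /// Current value of the aggregate over the window.
    fn evaluate(&self) -> i64 {
        self.sum
    }
}

/// For each row, sum the current row and up to `preceding` rows before it.
fn sliding_sums(values: &[i64], preceding: usize) -> Vec<i64> {
    let mut acc = SlidingSum::default();
    let mut start = 0; // index of the first row still inside the window
    let mut out = Vec::with_capacity(values.len());

    for end in 0..values.len() {
        // The current row enters the window.
        acc.update(&values[end..=end]);

        // Rows older than `preceding` positions leave the window.
        let new_start = end.saturating_sub(preceding);
        if new_start > start {
            acc.retract(&values[start..new_start]);
            start = new_start;
        }

        out.push(acc.evaluate());
    }
    out
}
```

With one preceding row, each output is the sum of the current and previous value, and every step after the second triggers one `retract` call.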
   
   # What changes are included in this PR?
   
   Add more tests for user-defined aggregates to `user_defined_aggregates.rs`.
   
   # Are these changes tested?
   This PR only adds tests.
   
   
   # Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] stuartcarnie commented on a diff in pull request #6669: Minor: Add tests for User Defined Aggregate functions

Posted by "stuartcarnie (via GitHub)" <gi...@apache.org>.
stuartcarnie commented on code in PR #6669:
URL: https://github.com/apache/arrow-datafusion/pull/6669#discussion_r1230075514


##########
datafusion/core/tests/user_defined_aggregates.rs:
##########
@@ -82,56 +150,151 @@ async fn execute(ctx: &SessionContext, sql: &str) -> Vec<RecordBatch> {
 ///  3.0  | 1970-01-01T00:00:00.000003
 ///  2.0  | 1970-01-01T00:00:00.000002
 ///  1.0  | 1970-01-01T00:00:00.000004
+///  5.0  | 1970-01-01T00:00:00.000005
+///  5.0  | 1970-01-01T00:00:00.000005
 /// ```
-fn udaf_struct_context() -> SessionContext {
-    let value: Float64Array = vec![3.0, 2.0, 1.0].into_iter().map(Some).collect();
-    let time = TimestampNanosecondArray::from(vec![3000, 2000, 4000]);
+struct TestContext {
+    ctx: SessionContext,
+    counters: Arc<TestCounters>,
+}
+
+impl TestContext {
+    fn new() -> Self {
+        let counters = Arc::new(TestCounters::new());
+
+        let value = Float64Array::from(vec![3.0, 2.0, 1.0, 5.0, 5.0]);
+        let time = TimestampNanosecondArray::from(vec![3000, 2000, 4000, 5000, 5000]);
+
+        let batch = RecordBatch::try_from_iter(vec![
+            ("value", Arc::new(value) as _),
+            ("time", Arc::new(time) as _),
+        ])
+        .unwrap();
 
-    let batch = RecordBatch::try_from_iter(vec![
-        ("value", Arc::new(value) as _),
-        ("time", Arc::new(time) as _),
-    ])
-    .unwrap();
+        let mut ctx = SessionContext::new();
 
-    let mut ctx = SessionContext::new();
-    ctx.register_batch("t", batch).unwrap();
+        ctx.register_batch("t", batch).unwrap();
 
-    // Tell datafusion about the "first" function
-    register_aggregate(&mut ctx);
+        // Tell DataFusion about the "first" function
+        FirstSelector::register(&mut ctx);
+        // Tell DataFusion about the "time_sum" function
+        TimeSum::register(&mut ctx, Arc::clone(&counters));
 
-    ctx
+        Self { ctx, counters }
+    }
+}
+
+#[derive(Debug, Default)]
+struct TestCounters {
+    /// was update_batch called?
+    update_batch: AtomicBool,
+    /// was retract batch called?
+    retract_batch: AtomicBool,

Review Comment:
   Nice way to test expectations 👌🏻
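
The pattern being praised can be sketched in isolation as follows. This is a minimal, self-contained illustration under assumptions: `TestCounters` mirrors the struct in the diff, while `RecordingAccumulator` is a hypothetical stand-in for the accumulator under test and does not implement DataFusion's actual trait. The test keeps one `Arc` handle to the flags and gives a clone to the accumulator, so it can check afterwards which methods the engine called.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Flags recording which accumulator methods were invoked.
#[derive(Debug, Default)]
struct TestCounters {
    update_batch: AtomicBool,
    retract_batch: AtomicBool,
}

impl TestCounters {
    fn new() -> Self {
        Default::default()
    }

    /// Has `update_batch` been called?
    fn update_batch(&self) -> bool {
        self.update_batch.load(Ordering::SeqCst)
    }

    /// Has `retract_batch` been called?
    fn retract_batch(&self) -> bool {
        self.retract_batch.load(Ordering::SeqCst)
    }
}

/// Hypothetical accumulator that sets a flag whenever a method runs.
struct RecordingAccumulator {
    sum: i64,
    counters: Arc<TestCounters>,
}

impl RecordingAccumulator {
    fn new(counters: Arc<TestCounters>) -> Self {
        Self { sum: 0, counters }
    }

    fn update_batch(&mut self, values: &[i64]) {
        self.counters.update_batch.store(true, Ordering::SeqCst);
        self.sum += values.iter().sum::<i64>();
    }

    fn retract_batch(&mut self, values: &[i64]) {
        self.counters.retract_batch.store(true, Ordering::SeqCst);
        self.sum -= values.iter().sum::<i64>();
    }
}
```

Atomics behind an `Arc` let the flags be set through a shared reference, which matters when the accumulator is created and driven by code the test does not control.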





[GitHub] [arrow-datafusion] alamb merged pull request #6669: Minor: Add tests for User Defined Aggregate functions

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #6669:
URL: https://github.com/apache/arrow-datafusion/pull/6669




[GitHub] [arrow-datafusion] viirya commented on a diff in pull request #6669: Minor: Add tests for User Defined Aggregate functions

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #6669:
URL: https://github.com/apache/arrow-datafusion/pull/6669#discussion_r1230110166


##########
datafusion/core/tests/user_defined_aggregates.rs:
##########
@@ -82,56 +150,151 @@ async fn execute(ctx: &SessionContext, sql: &str) -> Vec<RecordBatch> {
 ///  3.0  | 1970-01-01T00:00:00.000003
 ///  2.0  | 1970-01-01T00:00:00.000002
 ///  1.0  | 1970-01-01T00:00:00.000004
+///  5.0  | 1970-01-01T00:00:00.000005
+///  5.0  | 1970-01-01T00:00:00.000005
 /// ```
-fn udaf_struct_context() -> SessionContext {
-    let value: Float64Array = vec![3.0, 2.0, 1.0].into_iter().map(Some).collect();
-    let time = TimestampNanosecondArray::from(vec![3000, 2000, 4000]);
+struct TestContext {
+    ctx: SessionContext,
+    counters: Arc<TestCounters>,
+}
+
+impl TestContext {
+    fn new() -> Self {
+        let counters = Arc::new(TestCounters::new());
+
+        let value = Float64Array::from(vec![3.0, 2.0, 1.0, 5.0, 5.0]);
+        let time = TimestampNanosecondArray::from(vec![3000, 2000, 4000, 5000, 5000]);
+
+        let batch = RecordBatch::try_from_iter(vec![
+            ("value", Arc::new(value) as _),
+            ("time", Arc::new(time) as _),
+        ])
+        .unwrap();
 
-    let batch = RecordBatch::try_from_iter(vec![
-        ("value", Arc::new(value) as _),
-        ("time", Arc::new(time) as _),
-    ])
-    .unwrap();
+        let mut ctx = SessionContext::new();
 
-    let mut ctx = SessionContext::new();
-    ctx.register_batch("t", batch).unwrap();
+        ctx.register_batch("t", batch).unwrap();
 
-    // Tell datafusion about the "first" function
-    register_aggregate(&mut ctx);
+        // Tell DataFusion about the "first" function
+        FirstSelector::register(&mut ctx);
+        // Tell DataFusion about the "time_sum" function
+        TimeSum::register(&mut ctx, Arc::clone(&counters));
 
-    ctx
+        Self { ctx, counters }
+    }
+}
+
+#[derive(Debug, Default)]
+struct TestCounters {
+    /// was update_batch called?
+    update_batch: AtomicBool,
+    /// was retract batch called?
+    retract_batch: AtomicBool,
 }
 
-fn register_aggregate(ctx: &mut SessionContext) {
-    let return_type = Arc::new(FirstSelector::output_datatype());
-    let state_type = Arc::new(FirstSelector::state_datatypes());
+impl TestCounters {
+    fn new() -> Self {
+        Default::default()
+    }
+
+    /// Has `update_batch` been called?
+    fn update_batch(&self) -> bool {
+        self.update_batch.load(Ordering::SeqCst)
+    }
+
+    /// Has `update_batch` been called?

Review Comment:
   ```suggestion
       /// Has `retract_batch` been called?
   ```



##########
datafusion/core/tests/user_defined_aggregates.rs:
##########
@@ -82,56 +150,151 @@ async fn execute(ctx: &SessionContext, sql: &str) -> Vec<RecordBatch> {
 ///  3.0  | 1970-01-01T00:00:00.000003
 ///  2.0  | 1970-01-01T00:00:00.000002
 ///  1.0  | 1970-01-01T00:00:00.000004
+///  5.0  | 1970-01-01T00:00:00.000005
+///  5.0  | 1970-01-01T00:00:00.000005
 /// ```
-fn udaf_struct_context() -> SessionContext {
-    let value: Float64Array = vec![3.0, 2.0, 1.0].into_iter().map(Some).collect();
-    let time = TimestampNanosecondArray::from(vec![3000, 2000, 4000]);
+struct TestContext {
+    ctx: SessionContext,
+    counters: Arc<TestCounters>,
+}
+
+impl TestContext {
+    fn new() -> Self {
+        let counters = Arc::new(TestCounters::new());
+
+        let value = Float64Array::from(vec![3.0, 2.0, 1.0, 5.0, 5.0]);
+        let time = TimestampNanosecondArray::from(vec![3000, 2000, 4000, 5000, 5000]);
+
+        let batch = RecordBatch::try_from_iter(vec![
+            ("value", Arc::new(value) as _),
+            ("time", Arc::new(time) as _),
+        ])
+        .unwrap();
 
-    let batch = RecordBatch::try_from_iter(vec![
-        ("value", Arc::new(value) as _),
-        ("time", Arc::new(time) as _),
-    ])
-    .unwrap();
+        let mut ctx = SessionContext::new();
 
-    let mut ctx = SessionContext::new();
-    ctx.register_batch("t", batch).unwrap();
+        ctx.register_batch("t", batch).unwrap();
 
-    // Tell datafusion about the "first" function
-    register_aggregate(&mut ctx);
+        // Tell DataFusion about the "first" function
+        FirstSelector::register(&mut ctx);
+        // Tell DataFusion about the "time_sum" function
+        TimeSum::register(&mut ctx, Arc::clone(&counters));
 
-    ctx
+        Self { ctx, counters }
+    }
+}
+
+#[derive(Debug, Default)]
+struct TestCounters {
+    /// was update_batch called?
+    update_batch: AtomicBool,
+    /// was retract batch called?

Review Comment:
   ```suggestion
       /// was retract_batch called?
   ```





[GitHub] [arrow-datafusion] alamb commented on pull request #6669: Minor: Add tests for User Defined Aggregate functions

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6669:
URL: https://github.com/apache/arrow-datafusion/pull/6669#issuecomment-1591926269

   > Very nice, @alamb – can I help you by submitting my branch as a PR to this branch?
   
   Thanks for the offer, @stuartcarnie! I have already incorporated the code from your branch into https://github.com/apache/arrow-datafusion/pull/6671 (which I will have up as a PR shortly).

