You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/09 19:52:30 UTC

[GitHub] [arrow-datafusion] james727 opened a new pull request #1534: Mark ARRAY_AGG(DISTINCT ...) not implemented

james727 opened a new pull request #1534:
URL: https://github.com/apache/arrow-datafusion/pull/1534


   # Which issue does this PR close?
   This partially addresses https://github.com/apache/arrow-datafusion/issues/1512
   
    # Rationale for this change
   Right now `array_agg(distinct ...)` doesn't work. The physical plan construction logic uses the non-distinct `array_agg` whether or not distinct was specified. Interestingly enough it still works correctly under certain conditions, due to the `SingleDistinctToGroupBy` optimizer rule. 
   
   As an example, consider the following queries:
   ```sql
   --- Works, since logical plan is rewritten with a subquery and non-distinct agg. Will return:
   --- +------------------------------------------+
   --- | ARRAYAGG(DISTINCT aggregate_test_100.c2) |
   --- +------------------------------------------+
   --- | [2, 3, 5, 1, 4]                          |
   --- +------------------------------------------+
   SELECT array_agg(DISTINCT c2) FROM aggregate_test_100;
   
   -- Returns incorrect results, since SingleDistinctToGroupBy optimizer rule does not apply:
   --- +--------------------------------------------------------------------------+
   --- | ARRAYAGG(DISTINCT aggregate_test_100.c2)      | COUNT(DISTINCT UInt8(1)) |
   --- +--------------------------------------------------------------------------+
   --- | [2, 5, 1, 1, 5, 4, 3, 3, 1, 4, 1, 4, 3, ...]  | 1                        |
   --- +--------------------------------------------------------------------------+
   SELECT array_agg(DISTINCT c2), count(distinct 1) FROM aggregate_test_100;
   ```
   
   After this change distinct array agg will throw an error when `SingleDistinctToGroupBy` does not apply.
   
   I'm planning on working on actually implementing distinct array_agg after this, but figured this was worth fixing for now.
   # What changes are included in this PR?
   This marks the aggregation not implemented, and adds a block for testing count/approx distinct/array agg with `distinct = true`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1534: Mark ARRAY_AGG(DISTINCT ...) not implemented

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1534:
URL: https://github.com/apache/arrow-datafusion/pull/1534#discussion_r782474086



##########
File path: datafusion/src/physical_plan/aggregates.rs
##########
@@ -187,11 +187,16 @@ pub fn create_aggregate_expr(
                 coerced_exprs_types[0].clone(),
             ))
         }
-        (AggregateFunction::ArrayAgg, _) => Arc::new(expressions::ArrayAgg::new(
+        (AggregateFunction::ArrayAgg, false) => Arc::new(expressions::ArrayAgg::new(
             coerced_phy_exprs[0].clone(),
             name,
             coerced_exprs_types[0].clone(),
         )),
+        (AggregateFunction::ArrayAgg, true) => {
+            return Err(DataFusionError::NotImplemented(
+                "ARRAY_AGG(DISTINCT) aggregations are not available".to_string(),

Review comment:
       ```suggestion
                   "ARRAY_AGG(DISTINCT) aggregations are not yet implemented".to_string(),
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] james727 commented on pull request #1534: Mark ARRAY_AGG(DISTINCT ...) not implemented

Posted by GitBox <gi...@apache.org>.
james727 commented on pull request #1534:
URL: https://github.com/apache/arrow-datafusion/pull/1534#issuecomment-1010308085


   @alamb yep - they do! I think https://github.com/apache/arrow-datafusion/issues/1323 covers it but I can add another ticket and close the `set_agg` one if easier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #1534: Mark ARRAY_AGG(DISTINCT ...) not implemented

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1534:
URL: https://github.com/apache/arrow-datafusion/pull/1534


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org