You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "mustafasrepo (via GitHub)" <gi...@apache.org> on 2023/06/01 07:15:53 UTC

[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #6482: Resolve contradictory requirements by conversion of ordering sensitive aggregators

mustafasrepo commented on code in PR #6482:
URL: https://github.com/apache/arrow-datafusion/pull/6482#discussion_r1212697239


##########
datafusion/core/tests/sqllogictests/test_files/groupby.slt:
##########
@@ -2384,3 +2384,197 @@ SELECT s.country, ARRAY_AGG(s.amount ORDER BY s.country DESC, s.amount DESC) AS
 FRA [200.0, 50.0] 250
 GRC [80.0, 30.0] 110
 TUR [100.0, 75.0] 175
+
+# test_reverse_aggregate_expr
+# Some of the Aggregators can be reversed, by this way we can still run aggregators
+# that have contradictory requirements at first glance.
+query TT
+EXPLAIN SELECT country, ARRAY_AGG(amount ORDER BY amount DESC) AS amounts,
+  FIRST_VALUE(amount ORDER BY amount ASC) AS fv1,
+  LAST_VALUE(amount ORDER BY amount DESC) AS fv2
+  FROM sales_global
+  GROUP BY country
+----
+logical_plan
+Projection: sales_global.country, ARRAY_AGG(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST] AS amounts, FIRST_VALUE(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST] AS fv1, LAST_VALUE(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST] AS fv2
+--Aggregate: groupBy=[[sales_global.country]], aggr=[[ARRAY_AGG(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST], FIRST_VALUE(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST], LAST_VALUE(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]]]
+----TableScan: sales_global projection=[country, amount]
+physical_plan
+ProjectionExec: expr=[country@0 as country, ARRAY_AGG(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]@1 as amounts, FIRST_VALUE(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST]@2 as fv1, LAST_VALUE(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]@3 as fv2]
+--AggregateExec: mode=Single, gby=[country@0 as country], aggr=[ARRAY_AGG(sales_global.amount), FIRST_VALUE(sales_global.amount), LAST_VALUE(sales_global.amount)]

Review Comment:
   > > in terms of behavior. Hence by converting query above to its equivalent form below
   > 
   > ```sql
   > SELECT country, LAST_VALUE(amount ORDER BY amount DESC) AS fv1,
   >   LAST_VALUE(amount ORDER BY amount DESC) AS fv2,
   >   ARRAY_AGG(amount ORDER BY amount ASC) AS amounts
   >   FROM sales_global
   >   GROUP BY country
   > ```
   > 
   > Did you mean like this (change `LAST_VALUE` to `FIRST_VALUE` and switch the order):
   > 
   > ```sql
   > SELECT country, LAST_VALUE(amount ORDER BY amount DESC) AS fv1,
   >   FIRST_VALUE(amount ORDER BY amount ASC) AS fv2,
   >   ARRAY_AGG(amount ORDER BY amount ASC) AS amounts
   >   FROM sales_global
   >   GROUP BY country
   > ```
   
   Sorry, I have updated the example in the PR body (changed requirement of `ARRAY_AGG` from `amount ASC` to `amount DESC`). In the equivalent form all requirements, of aggregate functions are same. Hence we have resolved conflicting requirements by conversion. The conversion takes place between `FIRST_VALUE` and `LAST_VALUE` and ordering requirement is switched as you say. For instance `FIRST_VALUE(a ORDER BY b DESC)` is equivalent to `LAST_VALUE(a ORDER BY b ASC)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org