You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/21 07:22:24 UTC

[GitHub] [spark] beliefer commented on a diff in pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

beliefer commented on code in PR #37825:
URL: https://github.com/apache/spark/pull/37825#discussion_r976133494


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala:
##########
@@ -413,6 +426,43 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
     }}
   }
 
+  private def reduceDistinctAggregateGroups(a: Aggregate): Aggregate = {
+    val aggExpressions = collectAggregateExprs(a)
+    val distinctAggs = aggExpressions.filter(_.isDistinct)
+
+    val funcChildren = distinctAggs.flatMap { e =>
+      e.aggregateFunction.children.filter(!_.foldable)
+    }
+
+    // For each function child, find the first instance that is semantically equivalent.
+    // E.g., assume funcChildren is the following three expressions:
+    //   [('a + 1), (1 + 'a), 'b]
+    // then we want the map to be:
+    //   Map(('a + 1) -> ('a + 1), (1 + 'a) -> ('a + 1), 'b -> 'b)
+    // That is, both ('a + 1) and (1 + 'a) map to ('a + 1).
+    // This is an n^2 operation, where n is the number of distinct aggregate children, but it
+    // happens only once every time this rule is called.
+    val funcChildrenLookup = funcChildren.map { e =>
+      (e, funcChildren.find(fc => e.semanticEquals(fc)).getOrElse(e))

Review Comment:
   It seems the code lead to find out itself.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala:
##########
@@ -413,6 +426,43 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
     }}
   }
 
+  private def reduceDistinctAggregateGroups(a: Aggregate): Aggregate = {
+    val aggExpressions = collectAggregateExprs(a)
+    val distinctAggs = aggExpressions.filter(_.isDistinct)

Review Comment:
   These code duplicated, could you reuse one of them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org