You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "rednaxelafx (via GitHub)" <gi...@apache.org> on 2023/03/21 05:21:15 UTC

[GitHub] [spark] rednaxelafx commented on a diff in pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

rednaxelafx commented on code in PR #40488:
URL: https://github.com/apache/spark/pull/40488#discussion_r1142905889


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala:
##########
@@ -296,12 +298,17 @@ object PhysicalAggregation {
       // build a set of semantically distinct aggregate expressions and re-write expressions so
       // that they reference the single copy of the aggregate function which actually gets computed.
       // Non-deterministic aggregate expressions are not deduplicated.
-      val equivalentAggregateExpressions = new EquivalentExpressions
+      val equivalentAggregateExpressions = mutable.Map.empty[Expression, Expression]
       val aggregateExpressions = resultExpressions.flatMap { expr =>
         expr.collect {
-          // addExpr() always returns false for non-deterministic expressions and do not add them.
           case a
-            if AggregateExpression.isAggregate(a) && !equivalentAggregateExpressions.addExpr(a) =>

Review Comment:
   The line of thought would be: adding the `supportedExpression` guard to `addExpr()` would cause performance regression, so let's just close our eyes and make the only remaining use of `addExpr` break away and do its own deduplication in the old logic without taking things like `NamedLambdaVariable` into account -- which is the way it's been for quite a few releases. This PR essentially inlines the `addExpr` path of the old `EquivalentExpressions` into `PhysicalAggregation` to recover what it used to do.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org