You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/10/13 07:37:08 UTC
[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19488#discussion_r144483555
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -205,14 +205,15 @@ object PhysicalAggregation {
case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
// A single aggregate expression might appear multiple times in resultExpressions.
// In order to avoid evaluating an individual aggregate function multiple times, we'll
- // build a set of the distinct aggregate expressions and build a function which can
+ // build a map of the distinct aggregate expressions and build a function which can
// be used to re-write expressions so that they reference the single copy of the
// aggregate function which actually gets computed.
- val aggregateExpressions = resultExpressions.flatMap { expr =>
+ val aggregateExpressionMap = resultExpressions.flatMap { expr =>
expr.collect {
- case agg: AggregateExpression => agg
+ case agg: AggregateExpression => (agg.canonicalized, agg.deterministic) -> agg
--- End diff --
I think non-deterministic functions should not be deduplicated, e.g. `select max(a + rand()), max(a + rand()) from ...` should still eveluate 2 aggregate funcitions.
my suggestion:
```
val aggregateExpressions = resultExpressions.flatMap { expr =>
expr.collect {
case agg: AggregateExpression => agg
}
}
val aggregateExpressionMap = aggregateExpressions.filter(_.deterministic).map { agg =>
agg.canonicalized -> agg
}.toMap
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org