You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by maryannxue <gi...@git.apache.org> on 2017/10/13 05:51:32 UTC

[GitHub] spark pull request #19488: SPARK-22266 The same aggregate function was evalu...

GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/19488

    SPARK-22266 The same aggregate function was evaluated multiple times

    ## What changes were proposed in this pull request?
    
    To let the same aggregate function that appear multiple times in an Aggregate be evaluated only once, we need to deduplicate the aggregate expressions. The original code was trying to use a "distinct" call to get a set of aggregate expressions, but did not work, since the "distinct" did not compare semantic equality. And even if it did, further work should be done in result expression rewriting.
    In this PR, I changed the "set" to a map mapping the semantic identity of a aggregate expression to itself. Thus, later on, when rewriting result expressions (i.e., output expressions), the aggregate expression reference can be fixed.
    
    ## How was this patch tested?
    
    Added a new test in SQLQuerySuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-22266

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19488
    
----
commit 32bdf771fe70444ac23adf796702b5a26e085805
Author: maryannxue <ma...@gmail.com>
Date:   2017-10-13T05:31:10Z

    SPARK-22266 The same aggregate function was evaluated multiple times

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82722/testReport)** for PR 19488 at commit [`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145292679
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2677,4 +2678,29 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           checkAnswer(df, Row(1, 1, 1))
         }
       }
    +
    +  test("SRARK-22266: the same aggregate function was calculated multiple times") {
    +    val query = "SELECT a, max(b+1), max(b+1) + 1 FROM testData2 GROUP BY a"
    +    val df = sql(query)
    +    val physical = df.queryExecution.sparkPlan
    +    val aggregates = physical.collect {
    +      case agg : HashAggregateExec => agg
    +    }
    +    aggregates.foreach { agg =>
    +      assert (agg.aggregateExpressions.size == 1)
    +    }
    +    checkAnswer(df, Row(1, 3, 4) :: Row(2, 3, 4) :: Row(3, 3, 4) :: Nil)
    +  }
    +
    +  test("Non-deterministic aggregate functions should not be deduplicated") {
    +    val query = "SELECT a, first_value(b), first_value(b) + 1 FROM testData2 GROUP BY a"
    +    val df = sql(query)
    +    val physical = df.queryExecution.sparkPlan
    +    val aggregates = physical.collect {
    +      case agg : HashAggregateExec => agg
    +    }
    +    aggregates.foreach { agg =>
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144684245
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized expression as its
    --- End diff --
    
    good point, I think `first` is a nondeterministic function, SQL Server also document it: https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions (search for FIRST_VALUE).
    
    But here, although `first` is nondeterministic, I think `select first(a) - first(a) from ...` should return 0, as its result should be consistent in-query.
    
    CC @gatorsmile @hvanhovell 
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145025165
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized expression as its
    --- End diff --
    
    I did the check in Oracle. They return the same values.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82860/testReport)** for PR 19488 at commit [`9c33a0c`](https://github.com/apache/spark/commit/9c33a0cf82aff058f117fd309643006118ea3b08).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82874/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144881001
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized expression as its
    --- End diff --
    
    Then I think the solution we have right now (in my latest ci) is what we need. But we need to refine the code comment there to further illustrate cases like "first_value". And in general we may need better documentation regarding the behavior of FIRST_VALUE and similar functions. Agree?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82821/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82877/testReport)** for PR 19488 at commit [`506c410`](https://github.com/apache/spark/commit/506c410eefb5131c6cf50e947ff01e0bda9c28fe).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    After I rethinking about it, non-determinstic expressions should not qualify this rule. Please remove it. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82874/testReport)** for PR 19488 at commit [`506c410`](https://github.com/apache/spark/commit/506c410eefb5131c6cf50e947ff01e0bda9c28fe).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82860/testReport)** for PR 19488 at commit [`9c33a0c`](https://github.com/apache/spark/commit/9c33a0cf82aff058f117fd309643006118ea3b08).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82874/testReport)** for PR 19488 at commit [`506c410`](https://github.com/apache/spark/commit/506c410eefb5131c6cf50e947ff01e0bda9c28fe).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144651235
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized expression as its
    --- End diff --
    
    @cloud-fan Agreed. e.g. `first()` in Spark SQL is marked as nondeterministic right now (although for the case of `first()` I'd actually believe we should make it deterministic instead, but that's for another story)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19488


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82733/testReport)** for PR 19488 at commit [`1cca72b`](https://github.com/apache/spark/commit/1cca72ba34a9e29cf51b56f7e9c8840c75ffe5da).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    +1, before we figure out the semantic of nondeterministic aggregate functions, let's be careful here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    The point I was trying to make is, there are two types of non-deterministic aggregate functions: first being non-deterministic but not necessarily arbitrary, second being "deliberately" arbitrary, such as RAND. I think for the first category of non-deterministic functions, it's intuitive to expect that two occurrences of FIRST_VALUE(x) return the same value when user writes sth. like "SELECT FIRST(x), FIRST(x) + 1 FROM t".
    I'd propose two options:
    1. Take a step back and do not deduplicate non-deterministic functions as @cloud-fan first suggested.
    1. To have a way to distinguish arbitrary functions (for UDF as well) for other non-deterministic cases, and avoid deduplication only for arbitrary functions.
    Thoughts?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Thank you for the comment, @gatorsmile! Code comment updated.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82821/testReport)** for PR 19488 at commit [`6159dcc`](https://github.com/apache/spark/commit/6159dcc04a75db5d1ed3e8f17714f90007245143).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145017429
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized expression as its
    --- End diff --
    
    That's not true, see Postgres:
    ```
    cloud=# select random(), random();
          random       |      random      
    -------------------+------------------
     0.360924747306854 | 0.90011026058346
    (1 row)
    
    cloud=# select random() - random();
          ?column?       
    ---------------------
     -0.0260558221489191
    (1 row)
    ```
    
    Seems we need a way to distinguish FIRST_VALUE and RANDOM. Both of them are nondeterministic, but `FIRST_VALUE` has consistent values in the same project/aggregate.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145281602
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,15 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    -      // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    +      // build a set of semantically distinct aggregate expressions and re-write expressions so
    +      // that they reference the single copy of the aggregate function which actually gets computed.
    +      val equivalentAggregateExpressions = new EquivalentExpressions
           val aggregateExpressions = resultExpressions.flatMap { expr =>
             expr.collect {
    -          case agg: AggregateExpression => agg
    +          case agg: AggregateExpression
    +            if (!equivalentAggregateExpressions.addExpr(agg)) => agg
    --- End diff --
    
    Add a comment above this line? `addExpr() returns false when the expression is non-deterministic`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82877/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    @cloud-fan Please see CheckAnalysis.scala:170. It checks the input expression of each aggregate expression to make sure that they are not another aggregate function and are deterministic.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82725/testReport)** for PR 19488 at commit [`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144983538
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized expression as its
    --- End diff --
    
    If we have the same non-deterministic functions in the same projection, they should return the same value.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82877/testReport)** for PR 19488 at commit [`506c410`](https://github.com/apache/spark/commit/506c410eefb5131c6cf50e947ff01e0bda9c28fe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82722/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144543377
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,15 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
           // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      val aggregateExpressionMap = resultExpressions.flatMap { expr =>
             expr.collect {
    -          case agg: AggregateExpression => agg
    +          case agg: AggregateExpression => (agg.canonicalized, agg.deterministic) -> agg
    --- End diff --
    
    Thank you, @cloud-fan! You've made a very good point! We should address non-deterministic correctly here. I have confirmed, though, that all aggregate functions should be deterministic (and fortunately Spark already guarantees this in the analysis stage). So I think we are safe to use "agg.canonicalized" without considering the "deterministic" field here, and so as not to be confusing. I have also added a code comment addressing this point in my latest commit.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145281331
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -235,8 +236,16 @@ object PhysicalAggregation {
             expr.transformDown {
               case ae: AggregateExpression =>
                 // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    -            // so replace each aggregate expression by its corresponding attribute in the set:
    -            ae.resultAttribute
    +            // so replace each aggregate expression by its corresponding attribute in the set.
    +            // Note that non-deterministic aggregate expressions should not be deduplicated and
    +            // should be handled differently.
    +            val newAe = if (ae.deterministic) {
    --- End diff --
    
    nvm. It does not contain it. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: SPARK-22266 The same aggregate function was evaluated mu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144483555
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,15 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
           // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      val aggregateExpressionMap = resultExpressions.flatMap { expr =>
             expr.collect {
    -          case agg: AggregateExpression => agg
    +          case agg: AggregateExpression => (agg.canonicalized, agg.deterministic) -> agg
    --- End diff --
    
    I think non-deterministic functions should not be deduplicated, e.g. `select max(a + rand()), max(a + rand()) from ...` should still eveluate 2 aggregate funcitions.
    
    my suggestion:
    ```
    val aggregateExpressions = resultExpressions.flatMap { expr =>
      expr.collect {
        case agg: AggregateExpression => agg
      }
    }
    val aggregateExpressionMap = aggregateExpressions.filter(_.deterministic).map { agg =>
      agg.canonicalized -> agg
    }.toMap
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145281119
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -235,8 +236,16 @@ object PhysicalAggregation {
             expr.transformDown {
               case ae: AggregateExpression =>
                 // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    -            // so replace each aggregate expression by its corresponding attribute in the set:
    -            ae.resultAttribute
    +            // so replace each aggregate expression by its corresponding attribute in the set.
    +            // Note that non-deterministic aggregate expressions should not be deduplicated and
    +            // should be handled differently.
    +            val newAe = if (ae.deterministic) {
    --- End diff --
    
    Do we still need to consider this? They will be different anyway, right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82865/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145281452
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -235,8 +236,16 @@ object PhysicalAggregation {
             expr.transformDown {
               case ae: AggregateExpression =>
                 // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    -            // so replace each aggregate expression by its corresponding attribute in the set:
    -            ae.resultAttribute
    +            // so replace each aggregate expression by its corresponding attribute in the set.
    +            // Note that non-deterministic aggregate expressions should not be deduplicated and
    +            // should be handled differently.
    +            val newAe = if (ae.deterministic) {
    +              equivalentAggregateExpressions.getEquivalentExprs(ae)
    +                .head.asInstanceOf[AggregateExpression]
    +            } else {
    +              ae
    +            }
    --- End diff --
    
    -> `equiExprs.getEquivalentExprs(ae).headOption.getOrElse(ae).asInstanceOf[AggregateExpression]`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82821/testReport)** for PR 19488 at commit [`6159dcc`](https://github.com/apache/spark/commit/6159dcc04a75db5d1ed3e8f17714f90007245143).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82725/testReport)** for PR 19488 at commit [`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Maybe we can first try the option 1 first. 
    
    Did an offline discussion with @rednaxelafx . We can use `EquivalentExpressions`, which automatically treats `non-deterministic` expressions as different. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82733/testReport)** for PR 19488 at commit [`1cca72b`](https://github.com/apache/spark/commit/1cca72ba34a9e29cf51b56f7e9c8840c75ffe5da).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82865/testReport)** for PR 19488 at commit [`ece2062`](https://github.com/apache/spark/commit/ece206223822bde6e36d654a0d34593c405f8ffd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144656360
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized expression as its
    --- End diff --
    
    So we are talking about two types of "non-deterministic" here:
    1. Across-query non-deterministic but in-query deterministic, which means the same expression can produce different results over the same input between different runs, but should always give the same result within the same run. sum/avg on floating point numbers could be an example. Shall we make sure that "select sum(f) - sum(f) from t" always return 0? and similarly for "first()" maybe, should "select first_value(c) = first_value(c) over ..." always return true?
    It is important to define the behavior first, which will lead to opposite approaches on how to handle the "deterministic" field here.
    2. Across-query and in-query non-deterministic, which I don't think is allowed anyway.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82722/testReport)** for PR 19488 at commit [`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    @maryannxue Could you update the comments? 
    
    LGTM except the code comments. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r144584455
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
           // A single aggregate expression might appear multiple times in resultExpressions.
           // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a function which can
    +      // build a map of the distinct aggregate expressions and build a function which can
           // be used to re-write expressions so that they reference the single copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized expression as its
    --- End diff --
    
    can you point out where the code is that guarantee this? I took a look at `CheckAnalysis` and seems Spark allow Aggregate to have non-deterministic expressions.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    **[Test build #82865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82865/testReport)** for PR 19488 at commit [`ece2062`](https://github.com/apache/spark/commit/ece206223822bde6e36d654a0d34593c405f8ffd).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    LGTM except one comment, thanks for working on it!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82860/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    add to whitelist


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82733/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145292665
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2677,4 +2678,29 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           checkAnswer(df, Row(1, 1, 1))
         }
       }
    +
    +  test("SRARK-22266: the same aggregate function was calculated multiple times") {
    +    val query = "SELECT a, max(b+1), max(b+1) + 1 FROM testData2 GROUP BY a"
    +    val df = sql(query)
    +    val physical = df.queryExecution.sparkPlan
    +    val aggregates = physical.collect {
    +      case agg : HashAggregateExec => agg
    +    }
    +    aggregates.foreach { agg =>
    --- End diff --
    
    nit:
    ```
    assert(aggregates.length == 1)
    assert(aggregates.head.aggregateExpressions.size == 1)
    ```
    If the implementation changed and we execute the query with other aggregate implementations, we should make sure this test is not ignored.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82725/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19488: SPARK-22266 The same aggregate function was evaluated mu...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    Update the title to `[SPARK-22266][SQL] The same aggregate function was evaluated multiple times`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org