Posted to issues@spark.apache.org by "Maryann Xue (JIRA)" <ji...@apache.org> on 2017/10/12 17:43:00 UTC

[jira] [Created] (SPARK-22266) The same aggregate function was evaluated multiple times

Maryann Xue created SPARK-22266:
-----------------------------------

             Summary: The same aggregate function was evaluated multiple times
                 Key: SPARK-22266
                 URL: https://issues.apache.org/jira/browse/SPARK-22266
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Maryann Xue
            Priority: Minor


The same aggregate function should not be evaluated more than once, as the code comment below states (patterns.scala:206). However, this does not work as expected.
{code}
      // A single aggregate expression might appear multiple times in resultExpressions.
      // In order to avoid evaluating an individual aggregate function multiple times, we'll
      // build a set of the distinct aggregate expressions and build a function which can
      // be used to re-write expressions so that they reference the single copy of the
      // aggregate function which actually gets computed.
{code}
For example, the physical plan of
{code}
SELECT a, max(b+1), max(b+1) + 1 FROM testData2 GROUP BY a
{code}
was
{code}
HashAggregate(keys=[a#23], functions=[max((b#24 + 1)), max((b#24 + 1))], output=[a#23, max((b + 1))#223, (max((b + 1)) + 1)#224])
+- HashAggregate(keys=[a#23], functions=[partial_max((b#24 + 1)), partial_max((b#24 + 1))], output=[a#23, max#231, max#232])
   +- SerializeFromObject [assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true]).a AS a#23, assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true]).b AS b#24]
      +- Scan ExternalRDDScan[obj#22]
{code}
In each HashAggregate there were two identical aggregate functions, "max((b#24 + 1))", where a single copy should have been computed.
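
To illustrate what the code comment describes, here is a minimal, self-contained sketch of the intended deduplication. It is not Spark's implementation: the expression types (Col, Lit, Add, Max, AggRef) and the DedupAggregates object are hypothetical stand-ins for Catalyst classes, and structural equality of case classes is assumed to be enough to recognise identical aggregates.

```scala
// Toy expression tree. These types are hypothetical and are NOT Spark's Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Max(child: Expr) extends Expr    // stands in for an aggregate function
case class AggRef(index: Int) extends Expr  // reference to one computed aggregate

object DedupAggregates {
  // Collect every aggregate call that appears in an expression.
  def collectAggs(e: Expr): Seq[Max] = e match {
    case m: Max    => Seq(m)
    case Add(l, r) => collectAggs(l) ++ collectAggs(r)
    case _         => Seq.empty
  }

  // Replace each aggregate call with a reference to its single computed copy.
  def rewrite(e: Expr, slots: Map[Max, Int]): Expr = e match {
    case m: Max    => AggRef(slots(m))
    case Add(l, r) => Add(rewrite(l, slots), rewrite(r, slots))
    case other     => other
  }

  // Returns the distinct aggregates to compute and the rewritten result expressions.
  def plan(resultExprs: Seq[Expr]): (Seq[Max], Seq[Expr]) = {
    // Case-class equality is structural, so identical aggregates collapse here.
    val distinctAggs = resultExprs.flatMap(collectAggs).distinct
    val slots = distinctAggs.zipWithIndex.toMap
    (distinctAggs, resultExprs.map(rewrite(_, slots)))
  }

  def main(args: Array[String]): Unit = {
    // Models: SELECT a, max(b+1), max(b+1) + 1 ... GROUP BY a
    val maxB1 = Max(Add(Col("b"), Lit(1)))
    val (aggs, rewritten) = plan(Seq(Col("a"), maxB1, Add(maxB1, Lit(1))))
    println(s"aggregates to compute: $aggs")
    println(s"rewritten result expressions: $rewritten")
  }
}
```

For the query above, this sketch yields a single aggregate to compute, with both result expressions referencing it, which is the shape the physical plan would be expected to have.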


