Posted to reviews@spark.apache.org by yhuai <gi...@git.apache.org> on 2015/11/09 05:48:51 UTC

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/9556

    [SPARK-9830] [SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

    https://issues.apache.org/jira/browse/SPARK-9830
    
    This PR removes `AggregateExpression1`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark removeAgg1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9556.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9556
    
----
commit 96e3e369759067c863ae6e0065eb0b1804dbfb5d
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-06T23:35:20Z

    Remove agg1 from the planner.

commit 9921733491930581aefe8f65bf9d5d8b995f99e5
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-07T06:21:27Z

    Add needed constructors.

commit f126520283f127b610f6255275de98e5c4ce1d37
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-07T06:26:23Z

    Delete old agg functions.

commit 5a177ec92726274abe3c5acec01e29d7b0f688d4
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-07T06:40:58Z

    Update function registry and DataFrame functions.

commit f5a389faf073dbcb5bc13fda81f993a63498e448
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-07T06:53:31Z

    Update analyzer.

commit 3d06cfd6c471fb5976b9124083b6d8b2667b411a
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-07T06:56:05Z

    Update optimizer.

commit 7e77628e57acaba0ed3c24645fe69470e487c135
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-07T07:05:31Z

    Update parsers, check analysis, and hive type coercion.

commit 55841a551f9ff20e21c1d1a6afc0a6b39dd15d9e
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-07T19:45:02Z

    Update catalyst dsl.

commit d2ca2902a60a0ea3796d1737a7c01fd863d1781c
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-07T19:49:11Z

    Update HiveTypeCoercion

commit aa8daa99398392b6f3ba5c0e18576e7f0b99553b
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-08T00:38:28Z

    Update logical operators.

commit 10630e1a230e5fde9671fe0bac07c8c1af74c087
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-08T00:41:41Z

    Merge remote-tracking branch 'upstream/master' into removeAgg1

commit e8d39d146b2a629f640cfd45d54f97564228a772
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-08T00:56:52Z

    Remove the mapping from old agg functions to new agg functions.

commit a3df1fd9c978a3e2a4fdbec04ee0827066eb1e7a
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-08T01:01:30Z

    Remove old agg operator.

commit 5ed4970382ab09d4e7f8a606c1fbde41c8cfe348
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-08T02:41:33Z

    Throw exception when we have count distinct. Once https://github.com/apache/spark/pull/9409 is in, we will remove this limitation.

commit 881afe8637acfa74044947563a08d80a8150fd91
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-08T05:11:03Z

    Update tests.

commit 24512337ad035de47d6402ee4fb50206575973d5
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-08T05:12:50Z

    Style.

commit 1ef0b92ac4472d0c4e9c662a8b08eb4494712b3a
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-09T00:03:06Z

    Check analysis.

commit 35dfe9340e7f62fa3c59636a0024e4d2caf5892d
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-09T01:07:45Z

    Fix AggregateQuerySuite.

commit 6d2919554b53176c1bdf4f2837404e044eba9876
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-09T02:47:35Z

    Merge remote-tracking branch 'upstream/master' into removeAgg1
    
    Conflicts:
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala

commit 383511048cbadb37a392f7cde820a83bfd64bf37
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-09T03:39:49Z

    Fix test.

commit c88c97d49b83e6fc4cc9b0d1aa27462d6fe67417
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-09T04:18:49Z

    Since an agg function's children can be unresolved (when we use them in the DataFrame API), make all fields that may use children lazy vals.
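
    (For context, a minimal sketch of the pattern this commit applies; `MyAgg` and the field name are illustrative, not code from this PR:)

        import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
        import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}

        // Because `child` may still be an UnresolvedAttribute when the function is
        // built through the DataFrame API, anything that touches child.dataType must
        // be lazy; an eager val would throw UnresolvedException at construction time.
        case class MyAgg(child: Expression) {
          lazy val bufferAttribute = AttributeReference("buf", child.dataType)()
        }

        MyAgg(UnresolvedAttribute("a"))  // ok: bufferAttribute is not forced yet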

commit c6c6c094dd2efde9778bd1bff5f4cf3c897db9f1
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-09T04:41:36Z

    Fix tests.

commit b73156ad39f90ff52ad3091da1c274ae565da46f
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-09T04:42:40Z

    style

commit cb3f4d3f6486b0aa1b668ee901cde353e829b7f5
Author: Yin Huai <yh...@databricks.com>
Date:   2015-11-09T04:45:31Z

    Renaming:
    AggregateExpression2 => AggregateExpression
    AggregateFunction2 => AggregateFunction

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154950298
  
    **[Test build #45342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45342/consoleFull)** for PR 9556 at commit [`cb3f4d3`](https://github.com/apache/spark/commit/cb3f4d3f6486b0aa1b668ee901cde353e829b7f5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155262424
  
    Merged build started.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44361789
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
    @@ -448,15 +448,17 @@ private[spark] object SQLConf {
         defaultValue = Some(true),
         isPublic = false)
     
    -  val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2",
    -    defaultValue = Some(true), doc = "<TODO>")
    -
       val RUN_SQL_ON_FILES = booleanConf("spark.sql.runSQLOnFiles",
         defaultValue = Some(true),
         isPublic = false,
         doc = "When true, we could use `datasource`.`path` as table in SQL query"
       )
     
    +  val SPECIALIZE_SINGLE_DISTINCT_AGG_PLANNING =
    --- End diff --
    
    Done


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155348763
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155257977
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44355961
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -108,7 +109,16 @@ trait CheckAnalysis {
     
               case Aggregate(groupingExprs, aggregateExprs, child) =>
                 def checkValidAggregateExpression(expr: Expression): Unit = expr match {
    -              case _: AggregateExpression => // OK
    +              case aggExpr: AggregateExpression =>
    +                // TODO: Is it possible that the child of a agg function is another
    +                // agg function?
    +                aggExpr.aggregateFunction.children.foreach {
    +                  case child if !child.deterministic =>
    +                    failAnalysis(
    +                      s"nondeterministic expression ${expr.prettyString} are not allowed " +
    +                        s"in grouping expression.")
    --- End diff --
    
    Done.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44451117
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala ---
    @@ -29,16 +31,13 @@ case class Sum(child: Expression) extends DeclarativeAggregate {
       // Return data type.
       override def dataType: DataType = resultType
     
    -  // Expected input data type.
    -  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    -  // new version at planning time (after analysis phase). For now, NullType is added at here
    -  // to make it resolved when we have cases like `select sum(null)`.
    -  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    -  // we remove the old aggregate functions. Then, we will not need NullType at here.
       override def inputTypes: Seq[AbstractDataType] =
         Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType))
    --- End diff --
    
    Should we remove the NullType here?


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44268156
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
    @@ -21,22 +21,22 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.catalyst.InternalRow
     import org.apache.spark.sql.catalyst.errors._
     import org.apache.spark.sql.catalyst.expressions._
    -import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression2
    +import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
     import org.apache.spark.sql.catalyst.plans.physical._
     import org.apache.spark.sql.execution.metric.SQLMetrics
     import org.apache.spark.sql.execution.{SparkPlan, UnaryNode, UnsafeFixedWidthAggregationMap}
     import org.apache.spark.sql.types.StructType
     
     case class TungstenAggregate(
    -    requiredChildDistributionExpressions: Option[Seq[Expression]],
    -    groupingExpressions: Seq[NamedExpression],
    -    nonCompleteAggregateExpressions: Seq[AggregateExpression2],
    -    nonCompleteAggregateAttributes: Seq[Attribute],
    -    completeAggregateExpressions: Seq[AggregateExpression2],
    -    completeAggregateAttributes: Seq[Attribute],
    -    initialInputBufferOffset: Int,
    -    resultExpressions: Seq[NamedExpression],
    -    child: SparkPlan)
    +                              requiredChildDistributionExpressions: Option[Seq[Expression]],
    --- End diff --
    
    Nit... Style... IntelliJ IDEA?


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155231775
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155122434
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155264259
  
    Merged build started.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44269743
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
    @@ -141,40 +141,46 @@ class WindowSpec private[sql](
        */
       private[sql] def withAggregate(aggregate: Column): Column = {
         val windowExpr = aggregate.expr match {
    -      case Average(child) => WindowExpression(
    -        UnresolvedWindowFunction("avg", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Sum(child) => WindowExpression(
    -        UnresolvedWindowFunction("sum", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Count(child) => WindowExpression(
    -        UnresolvedWindowFunction("count", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case First(child, ignoreNulls) => WindowExpression(
    -        // TODO this is a hack for Hive UDAF first_value
    -        UnresolvedWindowFunction(
    -          "first_value",
    -          child :: ignoreNulls :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Last(child, ignoreNulls) => WindowExpression(
    -        // TODO this is a hack for Hive UDAF last_value
    -        UnresolvedWindowFunction(
    -          "last_value",
    -          child :: ignoreNulls :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Min(child) => WindowExpression(
    -        UnresolvedWindowFunction("min", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Max(child) => WindowExpression(
    -        UnresolvedWindowFunction("max", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case wf: WindowFunction => WindowExpression(
    -        wf,
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +      case AggregateExpression(aggregateFunction, _, isDistinct) if !isDistinct =>
    +        aggregateFunction match {
    --- End diff --
    
    Window functions also need to be wrapped in an ```AggregateExpression``` in ```functions.scala``` for this to work. It currently fails HiveDataFrameWindowSuite due to this.
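
    For illustration, a minimal sketch of that wrapping (the use of the `Complete` mode here is an assumption for the example, not code from this PR):

        import org.apache.spark.sql.Column
        import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Min}

        // Inside org.apache.spark.sql.functions: wrap the new AggregateFunction in an
        // AggregateExpression so WindowSpec.withAggregate can pattern-match on it.
        def min(e: Column): Column =
          Column(AggregateExpression(Min(e.expr), Complete, isDistinct = false))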


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44440841
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -108,7 +109,19 @@ trait CheckAnalysis {
     
               case Aggregate(groupingExprs, aggregateExprs, child) =>
                 def checkValidAggregateExpression(expr: Expression): Unit = expr match {
    -              case _: AggregateExpression => // OK
    +              case aggExpr: AggregateExpression =>
    +                // TODO: Is it possible that the child of a agg function is another
    --- End diff --
    
    After the rewrite for multiple distinct aggregates, I think it's possible.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155118346
  
    Merged build started.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44361188
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
    @@ -141,40 +141,56 @@ class WindowSpec private[sql](
        */
       private[sql] def withAggregate(aggregate: Column): Column = {
         val windowExpr = aggregate.expr match {
    -      case Average(child) => WindowExpression(
    -        UnresolvedWindowFunction("avg", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Sum(child) => WindowExpression(
    -        UnresolvedWindowFunction("sum", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Count(child) => WindowExpression(
    -        UnresolvedWindowFunction("count", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case First(child, ignoreNulls) => WindowExpression(
    -        // TODO this is a hack for Hive UDAF first_value
    -        UnresolvedWindowFunction(
    -          "first_value",
    -          child :: ignoreNulls :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Last(child, ignoreNulls) => WindowExpression(
    -        // TODO this is a hack for Hive UDAF last_value
    -        UnresolvedWindowFunction(
    -          "last_value",
    -          child :: ignoreNulls :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Min(child) => WindowExpression(
    -        UnresolvedWindowFunction("min", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Max(child) => WindowExpression(
    -        UnresolvedWindowFunction("max", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case wf: WindowFunction => WindowExpression(
    -        wf,
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +      // First, we check if we get an aggregate function without the DISTINCT keyword.
    +      // Right now, we do not support using a DISTINCT aggregate function as a
    +      // window function.
    +      case AggregateExpression(aggregateFunction, _, isDistinct) if !isDistinct =>
    +        aggregateFunction match {
    +          case Average(child) => WindowExpression(
    +            UnresolvedWindowFunction("avg", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Sum(child) => WindowExpression(
    +            UnresolvedWindowFunction("sum", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Count(child) => WindowExpression(
    +            UnresolvedWindowFunction("count", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case First(child, ignoreNulls) => WindowExpression(
    +            // TODO this is a hack for Hive UDAF first_value
    +            UnresolvedWindowFunction(
    +              "first_value",
    +              child :: ignoreNulls :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Last(child, ignoreNulls) => WindowExpression(
    +            // TODO this is a hack for Hive UDAF last_value
    +            UnresolvedWindowFunction(
    +              "last_value",
    +              child :: ignoreNulls :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Min(child) => WindowExpression(
    +            UnresolvedWindowFunction("min", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Max(child) => WindowExpression(
    +            UnresolvedWindowFunction("max", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case x =>
    +            throw new UnsupportedOperationException(s"$x is not supported in window operation.")
    --- End diff --
    
    Done


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155154682
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154930683
  
    Merged build started.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44351639
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala ---
    @@ -32,23 +32,34 @@ case class Count(child: Expression) extends DeclarativeAggregate {
       // Expected input data type.
       override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType)
     
    -  private val count = AttributeReference("count", LongType)()
    +  private lazy val count = AttributeReference("count", LongType)()
     
    -  override val aggBufferAttributes = count :: Nil
    +  override lazy val aggBufferAttributes = count :: Nil
     
    -  override val initialValues = Seq(
    +  override lazy val initialValues = Seq(
         /* count = */ Literal(0L)
       )
     
    -  override val updateExpressions = Seq(
    +  override lazy val updateExpressions = Seq(
         /* count = */ If(IsNull(child), count, count + 1L)
       )
     
    -  override val mergeExpressions = Seq(
    +  override lazy val mergeExpressions = Seq(
         /* count = */ count.left + count.right
       )
     
    -  override val evaluateExpression = Cast(count, LongType)
    +  override lazy val evaluateExpression = Cast(count, LongType)
     
       override def defaultResult: Option[Literal] = Option(Literal(0L))
     }
    +
    +object Count {
    +  def apply(children: Seq[Expression]): Count = {
    --- End diff --
    
    Maybe document what is going on here.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155282012
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155281959
  
    **[Test build #45474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45474/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155260980
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44371174
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
    @@ -141,6 +141,10 @@ case class UnresolvedFunction(
       override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
       override lazy val resolved = false
     
    +  override def prettyString: String = {
    +    s"${name}(${children.map(_.prettyString).mkString(",")})"
    --- End diff --
    
    @rxin @marmbrus It is better to change the `prettyString` of `UnresolvedFunction` to get rid of that `'` (e.g. `'abs`). The impact of this change is that if any user relies on that `'`, their code will break.
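
    For example, an unresolved call to `abs` would then render with the plain function name, e.g. `abs(...)` rather than `'abs(...)`, in analysis output.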


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44451436
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala ---
    @@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends DeclarativeAggregate {
     
       override def dataType: DataType = resultType
     
    -  // Expected input data type.
    -  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    -  // new version at planning time (after analysis phase). For now, NullType is added at here
    -  // to make it resolved when we have cases like `select stddev(null)`.
    -  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    -  // we remove the old aggregate functions. Then, we will not need NullType at here.
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType))
    --- End diff --
    
    Is the NumericType still needed?


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44464815
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala ---
    @@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends DeclarativeAggregate {
     
       override def dataType: DataType = resultType
     
    -  // Expected input data type.
    -  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    -  // new version at planning time (after analysis phase). For now, NullType is added at here
    -  // to make it resolved when we have cases like `select stddev(null)`.
    -  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    -  // we remove the old aggregate functions. Then, we will not need NullType at here.
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType))
    --- End diff --
    
    Done


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155119555
  
    **[Test build #45374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45374/consoleFull)** for PR 9556 at commit [`bf3dfb7`](https://github.com/apache/spark/commit/bf3dfb743ece5bbd36ad548c005eefcdf641a1ce).


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155180027
  
    I think one piece of follow-up work is to add tests in `ExpressionTypeCheckingSuite` to make sure the newly added agg functions (e.g. corr and variance) throw the right exception when the input data types do not match the expected types.
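
    A rough sketch of what one such test might look like (the `assertError` helper, the `'stringField` test attribute, and the message fragment are assumptions about the suite, not code from this PR):

        // Hypothetical sketch, written as it might appear inside
        // ExpressionTypeCheckingSuite; assertError(expr, fragment) is assumed to
        // analyze the expression and check the reported type error.
        test("variance rejects non-numeric input") {
          assertError(VarianceSamp('stringField, 0, 0), "requires numeric types")
        }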


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44361592
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala ---
    @@ -88,30 +89,33 @@ class GroupedData protected[sql](
             namedExpr
           }
         }
    -    toDF(columnExprs.map(f))
    +    toDF(columnExprs.map(expr => f(expr).toAggregateExpression()))
       }
     
       private[this] def strToExpr(expr: String): (Expression => Expression) = {
    -    expr.toLowerCase match {
    -      case "avg" | "average" | "mean" => Average
    -      case "max" => Max
    -      case "min" => Min
    -      case "stddev" | "std" => StddevSamp
    -      case "stddev_pop" => StddevPop
    -      case "stddev_samp" => StddevSamp
    -      case "variance" => VarianceSamp
    -      case "var_pop" => VariancePop
    -      case "var_samp" => VarianceSamp
    -      case "sum" => Sum
    -      case "skewness" => Skewness
    -      case "kurtosis" => Kurtosis
    -      case "count" | "size" =>
    -        // Turn count(*) into count(1)
    -        (inputExpr: Expression) => inputExpr match {
    -          case s: Star => Count(Literal(1))
    -          case _ => Count(inputExpr)
    -        }
    +    val exprToFunc: (Expression => AggregateFunction) = {
    +      (inputExpr: Expression) => expr.toLowerCase match {
    +        case "avg" | "average" | "mean" => Average(inputExpr)
    +        case "max" => Max(inputExpr)
    +        case "min" => Min(inputExpr)
    +        case "stddev" | "std" => StddevSamp(inputExpr)
    +        case "stddev_pop" => StddevPop(inputExpr)
    +        case "stddev_samp" => StddevSamp(inputExpr)
    +        case "variance" => VarianceSamp(inputExpr, 0, 0)
    +        case "var_pop" => VariancePop(inputExpr, 0, 0)
    +        case "var_samp" => VarianceSamp(inputExpr, 0, 0)
    +        case "sum" => Sum(inputExpr)
    +        case "skewness" => Skewness(inputExpr, 0, 0)
    +        case "kurtosis" => Kurtosis(inputExpr, 0, 0)
    --- End diff --
    
    Done. BTW, those offsets will always be set on the executor side.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44448953
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
    @@ -448,15 +448,24 @@ private[spark] object SQLConf {
         defaultValue = Some(true),
         isPublic = false)
     
    -  val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2",
    --- End diff --
    
    This is a public config; should we generate a warning if a user tries to use it?
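
    For illustration, one way to do that (the object and method names here are assumptions, not Spark's actual API):

        // Hypothetical sketch: keep the removed key around and warn when it is set.
        object RemovedSQLConfigs {
          private val removedKeys = Set("spark.sql.useAggregate2")

          def warnIfRemoved(key: String): Unit = {
            if (removedKeys.contains(key)) {
              System.err.println(
                s"Warning: SQL config '$key' has been removed and no longer has any effect.")
            }
          }
        }

    A setter such as `setConf` could call `RemovedSQLConfigs.warnIfRemoved(key)` before storing the value.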


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44355987
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -177,6 +178,7 @@ object FunctionRegistry {
         expression[ToRadians]("radians"),
     
         // aggregate functions
    +    expression[HyperLogLogPlusPlus]("approx_distinct"),
    --- End diff --
    
    Done


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155223396
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44357472
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala ---
    @@ -32,23 +32,34 @@ case class Count(child: Expression) extends DeclarativeAggregate {
       // Expected input data type.
       override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType)
     
    -  private val count = AttributeReference("count", LongType)()
    +  private lazy val count = AttributeReference("count", LongType)()
     
    -  override val aggBufferAttributes = count :: Nil
    +  override lazy val aggBufferAttributes = count :: Nil
     
    -  override val initialValues = Seq(
    +  override lazy val initialValues = Seq(
         /* count = */ Literal(0L)
       )
     
    -  override val updateExpressions = Seq(
    +  override lazy val updateExpressions = Seq(
         /* count = */ If(IsNull(child), count, count + 1L)
       )
     
    -  override val mergeExpressions = Seq(
    +  override lazy val mergeExpressions = Seq(
         /* count = */ count.left + count.right
       )
     
    -  override val evaluateExpression = Cast(count, LongType)
    +  override lazy val evaluateExpression = Cast(count, LongType)
     
       override def defaultResult: Option[Literal] = Option(Literal(0L))
     }
    +
    +object Count {
    +  def apply(children: Seq[Expression]): Count = {
    --- End diff --
    
    Done


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155533284
  
    We should keep reviewing and address any comments in a follow-up, but I'm going to merge this now to unblock other work.  Thanks!


[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44297570
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         }
       }
     
    -  object HashAggregation extends Strategy {
    -    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      // Aggregations that can be performed in two phases, before and after the shuffle.
    -      case PartialAggregation(
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          groupingExpressions,
    -          partialComputation,
    -          child) if !canBeConvertedToNewAggregation(plan) =>
    -        execution.Aggregate(
    -          partial = false,
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          execution.Aggregate(
    -            partial = true,
    -            groupingExpressions,
    -            partialComputation,
    -            planLater(child))) :: Nil
    -
    -      case _ => Nil
    -    }
    -
    -    def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match {
    -      case a: logical.Aggregate =>
    -        if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) {
    -          a.newAggregation.isDefined
    -        } else {
    -          Utils.checkInvalidAggregateFunction2(a)
    -          false
    -        }
    -      case _ => false
    -    }
    -
    -    def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
    -      exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
    -  }
    -
       /**
        * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
        */
       object Aggregation extends Strategy {
         def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
    -          sqlContext.conf.codegenEnabled =>
    -        val converted = p.newAggregation
    -        converted match {
    -          case None => Nil // Cannot convert to new aggregation code path.
    -          case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) =>
    -            // A single aggregate expression might appear multiple times in resultExpressions.
    -            // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -            // build a set of the distinct aggregate expressions and build a function which can
    -            // be used to re-write expressions so that they reference the single copy of the
    -            // aggregate function which actually gets computed.
    -            val aggregateExpressions = resultExpressions.flatMap { expr =>
    -              expr.collect {
    -                case agg: AggregateExpression2 => agg
    -              }
    -            }.distinct
    -            // For those distinct aggregate expressions, we create a map from the
    -            // aggregate function to the corresponding attribute of the function.
    -            val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    -              val aggregateFunction = agg.aggregateFunction
    -              val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    -              (aggregateFunction, agg.isDistinct) -> attribute
    -            }.toMap
    -
    -            val (functionsWithDistinct, functionsWithoutDistinct) =
    -              aggregateExpressions.partition(_.isDistinct)
    -            if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    -              // This is a sanity check. We should not reach here when we have multiple distinct
    -              // column sets (aggregate.NewAggregation will not match).
    -              sys.error(
    -                "Multiple distinct column sets are not supported by the new aggregation" +
    -                  "code path.")
    -            }
    +      case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
    +        // A single aggregate expression might appear multiple times in resultExpressions.
    +        // In order to avoid evaluating an individual aggregate function multiple times, we'll
    +        // build a set of the distinct aggregate expressions and build a function which can
    +        // be used to re-write expressions so that they reference the single copy of the
    +        // aggregate function which actually gets computed.
    +        val aggregateExpressions = resultExpressions.flatMap { expr =>
    +          expr.collect {
    +            case agg: AggregateExpression => agg
    +          }
    +        }.distinct
    +        // For those distinct aggregate expressions, we create a map from the
    +        // aggregate function to the corresponding attribute of the function.
    +        val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    +          val aggregateFunction = agg.aggregateFunction
    +          val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    +          (aggregateFunction, agg.isDistinct) -> attribute
    +        }.toMap
    +
    +        val (functionsWithDistinct, functionsWithoutDistinct) =
    +          aggregateExpressions.partition(_.isDistinct)
    +        if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    +          // This is a sanity check. We should not reach here when we have multiple distinct
    +          // column sets (aggregate.NewAggregation will not match).
    +          sys.error(
    +            "Multiple distinct column sets are not supported by the new aggregation" +
    +              "code path.")
    +        }
     
    -            val namedGroupingExpressions = groupingExpressions.map {
    -              case ne: NamedExpression => ne -> ne
    -              // If the expression is not a NamedExpressions, we add an alias.
    -              // So, when we generate the result of the operator, the Aggregate Operator
    -              // can directly get the Seq of attributes representing the grouping expressions.
    -              case other =>
    -                val withAlias = Alias(other, other.toString)()
    -                other -> withAlias
    -            }
    -            val groupExpressionMap = namedGroupingExpressions.toMap
    -
    -            // The original `resultExpressions` are a set of expressions which may reference
    -            // aggregate expressions, grouping column values, and constants. When aggregate operator
    -            // emits output rows, we will use `resultExpressions` to generate an output projection
    -            // which takes the grouping columns and final aggregate result buffer as input.
    -            // Thus, we must re-write the result expressions so that their attributes match up with
    -            // the attributes of the final result projection's input row:
    -            val rewrittenResultExpressions = resultExpressions.map { expr =>
    -              expr.transformDown {
    -                case AggregateExpression2(aggregateFunction, _, isDistinct) =>
    -                  // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    -                  // so replace each aggregate expression by its corresponding attribute in the set:
    -                  aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    -                case expression =>
    -                  // Since we're using `namedGroupingAttributes` to extract the grouping key
    -                  // columns, we need to replace grouping key expressions with their corresponding
    -                  // attributes. We do not rely on the equality check at here since attributes may
    -                  // differ cosmetically. Instead, we use semanticEquals.
    -                  groupExpressionMap.collectFirst {
    -                    case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    -                  }.getOrElse(expression)
    -              }.asInstanceOf[NamedExpression]
    +        val namedGroupingExpressions = groupingExpressions.map {
    +          case ne: NamedExpression => ne -> ne
    +          // If the expression is not a NamedExpressions, we add an alias.
    +          // So, when we generate the result of the operator, the Aggregate Operator
    +          // can directly get the Seq of attributes representing the grouping expressions.
    +          case other =>
    +            val withAlias = Alias(other, other.toString)()
    +            other -> withAlias
    +        }
    +        val groupExpressionMap = namedGroupingExpressions.toMap
    +
    +        // The original `resultExpressions` are a set of expressions which may reference
    +        // aggregate expressions, grouping column values, and constants. When aggregate operator
    +        // emits output rows, we will use `resultExpressions` to generate an output projection
    +        // which takes the grouping columns and final aggregate result buffer as input.
    +        // Thus, we must re-write the result expressions so that their attributes match up with
    +        // the attributes of the final result projection's input row:
    +        val rewrittenResultExpressions = resultExpressions.map { expr =>
    +          expr.transformDown {
    +            case AggregateExpression(aggregateFunction, _, isDistinct) =>
    +              // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    +              // so replace each aggregate expression by its corresponding attribute in the set:
    +              aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    +            case expression =>
    +              // Since we're using `namedGroupingAttributes` to extract the grouping key
    +              // columns, we need to replace grouping key expressions with their corresponding
    +              // attributes. We do not rely on the equality check at here since attributes may
    +              // differ cosmetically. Instead, we use semanticEquals.
    +              groupExpressionMap.collectFirst {
    +                case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    +              }.getOrElse(expression)
    +          }.asInstanceOf[NamedExpression]
    +        }
    +
    +        val aggregateOperator =
    +          if (aggregateExpressions.map(_.aggregateFunction).exists(!_.supportsPartial)) {
    +            if (functionsWithDistinct.nonEmpty) {
    +              sys.error("Distinct columns cannot exist in Aggregate operator containing " +
    +                "aggregate functions which don't support partial aggregation.")
    +            } else {
    +              aggregate.Utils.planAggregateWithoutPartial(
    +                namedGroupingExpressions.map(_._2),
    +                aggregateExpressions,
    +                aggregateFunctionToAttribute,
    +                rewrittenResultExpressions,
    +                planLater(child))
                 }
    +          } else if (functionsWithDistinct.isEmpty) {
    +            aggregate.Utils.planAggregateWithoutDistinct(
    +              namedGroupingExpressions.map(_._2),
    +              aggregateExpressions,
    +              aggregateFunctionToAttribute,
    +              rewrittenResultExpressions,
    +              planLater(child))
    +          } else {
    +            aggregate.Utils.planAggregateWithOneDistinct(
    --- End diff --
    
    Yeah, we should think about that. The benefit of not using the rewriter is that we avoid using `Expand`, the extra `gid` column, and the `IF` expression on `gid`. It would be good to see some evaluation results. Right now, I am inclined to use the rewriter only when we have a single distinct column and no grouping expressions; in all other cases we keep the existing path. What do you think?
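    For a concrete picture of that trade-off, here is a minimal, self-contained sketch in plain
    Scala (standard collections only, not Spark's actual Expand/aggregate operators; the table,
    column names, and data are made up) of what the rewriter-based plan does for a query shaped
    like `SELECT COUNT(DISTINCT x), SUM(y) FROM t`: every row is duplicated under a `gid` tag,
    and each aggregate only consumes the rows carrying its own `gid`.

        // Hypothetical input t(x, y); everything here is illustrative only.
        object DistinctRewriteSketch extends App {
          val rows = Seq((1, 10), (1, 20), (2, 30))

          // "Expand": each input row becomes two rows tagged with a gid.
          //   gid = 1 keeps x (feeds COUNT(DISTINCT x)); y is nulled out.
          //   gid = 2 keeps y (feeds SUM(y)); x is nulled out.
          val expanded = rows.flatMap { case (x, y) =>
            Seq((1, Some(x), Option.empty[Int]), (2, Option.empty[Int], Some(y)))
          }

          // First aggregate: grouping by (gid, x) de-duplicates x for the gid = 1 rows
          // and accumulates a partial sum of y for the gid = 2 rows.
          val partial = expanded
            .groupBy { case (gid, x, _) => (gid, x) }
            .map { case ((gid, x), group) => (gid, x, group.flatMap(_._3).sum) }

          // Final aggregate: this is where the IF on gid appears -- each aggregate
          // only looks at the rows that carry its own gid.
          val countDistinctX = partial.count { case (gid, x, _) => gid == 1 && x.isDefined }
          val sumY = partial.collect { case (2, _, s) => s }.sum

          println(s"count(distinct x) = $countDistinctX, sum(y) = $sumY") // 2 and 60
        }

    The sketch only shows the shape of the rewrite; the overhead being weighed above is exactly
    the row duplication introduced by `Expand` plus the per-row conditional on `gid`.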




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44343490
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         }
       }
     
    -  object HashAggregation extends Strategy {
    -    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      // Aggregations that can be performed in two phases, before and after the shuffle.
    -      case PartialAggregation(
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          groupingExpressions,
    -          partialComputation,
    -          child) if !canBeConvertedToNewAggregation(plan) =>
    -        execution.Aggregate(
    -          partial = false,
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          execution.Aggregate(
    -            partial = true,
    -            groupingExpressions,
    -            partialComputation,
    -            planLater(child))) :: Nil
    -
    -      case _ => Nil
    -    }
    -
    -    def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match {
    -      case a: logical.Aggregate =>
    -        if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) {
    -          a.newAggregation.isDefined
    -        } else {
    -          Utils.checkInvalidAggregateFunction2(a)
    -          false
    -        }
    -      case _ => false
    -    }
    -
    -    def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
    -      exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
    -  }
    -
       /**
        * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
        */
       object Aggregation extends Strategy {
         def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
    -          sqlContext.conf.codegenEnabled =>
    -        val converted = p.newAggregation
    -        converted match {
    -          case None => Nil // Cannot convert to new aggregation code path.
    -          case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) =>
    -            // A single aggregate expression might appear multiple times in resultExpressions.
    -            // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -            // build a set of the distinct aggregate expressions and build a function which can
    -            // be used to re-write expressions so that they reference the single copy of the
    -            // aggregate function which actually gets computed.
    -            val aggregateExpressions = resultExpressions.flatMap { expr =>
    -              expr.collect {
    -                case agg: AggregateExpression2 => agg
    -              }
    -            }.distinct
    -            // For those distinct aggregate expressions, we create a map from the
    -            // aggregate function to the corresponding attribute of the function.
    -            val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    -              val aggregateFunction = agg.aggregateFunction
    -              val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    -              (aggregateFunction, agg.isDistinct) -> attribute
    -            }.toMap
    -
    -            val (functionsWithDistinct, functionsWithoutDistinct) =
    -              aggregateExpressions.partition(_.isDistinct)
    -            if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    -              // This is a sanity check. We should not reach here when we have multiple distinct
    -              // column sets (aggregate.NewAggregation will not match).
    -              sys.error(
    -                "Multiple distinct column sets are not supported by the new aggregation" +
    -                  "code path.")
    -            }
    +      case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
    +        // A single aggregate expression might appear multiple times in resultExpressions.
    +        // In order to avoid evaluating an individual aggregate function multiple times, we'll
    +        // build a set of the distinct aggregate expressions and build a function which can
    +        // be used to re-write expressions so that they reference the single copy of the
    +        // aggregate function which actually gets computed.
    +        val aggregateExpressions = resultExpressions.flatMap { expr =>
    +          expr.collect {
    +            case agg: AggregateExpression => agg
    +          }
    +        }.distinct
    +        // For those distinct aggregate expressions, we create a map from the
    +        // aggregate function to the corresponding attribute of the function.
    +        val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    +          val aggregateFunction = agg.aggregateFunction
    +          val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    +          (aggregateFunction, agg.isDistinct) -> attribute
    +        }.toMap
    +
    +        val (functionsWithDistinct, functionsWithoutDistinct) =
    +          aggregateExpressions.partition(_.isDistinct)
    +        if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    +          // This is a sanity check. We should not reach here when we have multiple distinct
    +          // column sets (aggregate.NewAggregation will not match).
    +          sys.error(
    +            "Multiple distinct column sets are not supported by the new aggregation" +
    +              "code path.")
    +        }
     
    -            val namedGroupingExpressions = groupingExpressions.map {
    -              case ne: NamedExpression => ne -> ne
    -              // If the expression is not a NamedExpressions, we add an alias.
    -              // So, when we generate the result of the operator, the Aggregate Operator
    -              // can directly get the Seq of attributes representing the grouping expressions.
    -              case other =>
    -                val withAlias = Alias(other, other.toString)()
    -                other -> withAlias
    -            }
    -            val groupExpressionMap = namedGroupingExpressions.toMap
    -
    -            // The original `resultExpressions` are a set of expressions which may reference
    -            // aggregate expressions, grouping column values, and constants. When aggregate operator
    -            // emits output rows, we will use `resultExpressions` to generate an output projection
    -            // which takes the grouping columns and final aggregate result buffer as input.
    -            // Thus, we must re-write the result expressions so that their attributes match up with
    -            // the attributes of the final result projection's input row:
    -            val rewrittenResultExpressions = resultExpressions.map { expr =>
    -              expr.transformDown {
    -                case AggregateExpression2(aggregateFunction, _, isDistinct) =>
    -                  // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    -                  // so replace each aggregate expression by its corresponding attribute in the set:
    -                  aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    -                case expression =>
    -                  // Since we're using `namedGroupingAttributes` to extract the grouping key
    -                  // columns, we need to replace grouping key expressions with their corresponding
    -                  // attributes. We do not rely on the equality check at here since attributes may
    -                  // differ cosmetically. Instead, we use semanticEquals.
    -                  groupExpressionMap.collectFirst {
    -                    case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    -                  }.getOrElse(expression)
    -              }.asInstanceOf[NamedExpression]
    +        val namedGroupingExpressions = groupingExpressions.map {
    +          case ne: NamedExpression => ne -> ne
    +          // If the expression is not a NamedExpressions, we add an alias.
    +          // So, when we generate the result of the operator, the Aggregate Operator
    +          // can directly get the Seq of attributes representing the grouping expressions.
    +          case other =>
    +            val withAlias = Alias(other, other.toString)()
    +            other -> withAlias
    +        }
    +        val groupExpressionMap = namedGroupingExpressions.toMap
    +
    +        // The original `resultExpressions` are a set of expressions which may reference
    +        // aggregate expressions, grouping column values, and constants. When aggregate operator
    +        // emits output rows, we will use `resultExpressions` to generate an output projection
    +        // which takes the grouping columns and final aggregate result buffer as input.
    +        // Thus, we must re-write the result expressions so that their attributes match up with
    +        // the attributes of the final result projection's input row:
    +        val rewrittenResultExpressions = resultExpressions.map { expr =>
    +          expr.transformDown {
    +            case AggregateExpression(aggregateFunction, _, isDistinct) =>
    +              // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    +              // so replace each aggregate expression by its corresponding attribute in the set:
    +              aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    +            case expression =>
    +              // Since we're using `namedGroupingAttributes` to extract the grouping key
    +              // columns, we need to replace grouping key expressions with their corresponding
    +              // attributes. We do not rely on the equality check at here since attributes may
    +              // differ cosmetically. Instead, we use semanticEquals.
    +              groupExpressionMap.collectFirst {
    +                case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    +              }.getOrElse(expression)
    +          }.asInstanceOf[NamedExpression]
    +        }
    +
    +        val aggregateOperator =
    +          if (aggregateExpressions.map(_.aggregateFunction).exists(!_.supportsPartial)) {
    +            if (functionsWithDistinct.nonEmpty) {
    +              sys.error("Distinct columns cannot exist in Aggregate operator containing " +
    +                "aggregate functions which don't support partial aggregation.")
    +            } else {
    +              aggregate.Utils.planAggregateWithoutPartial(
    +                namedGroupingExpressions.map(_._2),
    +                aggregateExpressions,
    +                aggregateFunctionToAttribute,
    +                rewrittenResultExpressions,
    +                planLater(child))
                 }
    +          } else if (functionsWithDistinct.isEmpty) {
    +            aggregate.Utils.planAggregateWithoutDistinct(
    +              namedGroupingExpressions.map(_._2),
    +              aggregateExpressions,
    +              aggregateFunctionToAttribute,
    +              rewrittenResultExpressions,
    +              planLater(child))
    +          } else {
    +            aggregate.Utils.planAggregateWithOneDistinct(
    --- End diff --
    
    @yhuai I was thinking the same thing. This would make it easier to benchmark the different paths. We could address this in a follow-up PR.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155290543
  
    **[Test build #45476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45476/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155160400
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155182875
  
    **[Test build #45407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45407/consoleFull)** for PR 9556 at commit [`bad4184`](https://github.com/apache/spark/commit/bad418404e7ba91402279f09a06c21e1930e2b13).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44451806
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala ---
    @@ -29,16 +31,13 @@ case class Sum(child: Expression) extends DeclarativeAggregate {
       // Return data type.
       override def dataType: DataType = resultType
     
    -  // Expected input data type.
    -  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    -  // new version at planning time (after analysis phase). For now, NullType is added at here
    -  // to make it resolved when we have cases like `select sum(null)`.
    -  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    -  // we remove the old aggregate functions. Then, we will not need NullType at here.
       override def inputTypes: Seq[AbstractDataType] =
         Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType))
    --- End diff --
    
    Yeah, we should remove it. I somehow missed it.
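    For what it's worth, a toy sketch (plain Scala, not Catalyst's actual implicit-cast rule;
    names and the coercion policy are simplified assumptions) of why dropping `NullType` from
    the accepted input types stays safe for queries like `select sum(null)`: during analysis a
    NullType argument can simply be cast to one of the accepted numeric types instead of being
    accepted as-is.

        // Toy model only.
        sealed trait ToyType
        case object NullT extends ToyType
        case object LongT extends ToyType
        case object DoubleT extends ToyType
        case object DecimalT extends ToyType

        object SumCoercionSketch extends App {
          // Mirrors TypeCollection(LongType, DoubleType, DecimalType) without NullType.
          val accepted: Seq[ToyType] = Seq(LongT, DoubleT, DecimalT)

          // A NullType argument is resolved by casting it to an accepted type;
          // anything else must already be in the accepted set.
          def coerce(argType: ToyType): Option[ToyType] = argType match {
            case NullT                     => accepted.headOption // insert a cast
            case t if accepted.contains(t) => Some(t)
            case _                         => None // would be an analysis error
          }

          println(coerce(NullT))   // Some(LongT): sum(null) still resolves
          println(coerce(DoubleT)) // Some(DoubleT)
        }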




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44443426
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -120,14 +133,26 @@ trait CheckAnalysis {
                   case e => e.children.foreach(checkValidAggregateExpression)
                 }
     
    -            def checkValidGroupingExprs(expr: Expression): Unit = expr.dataType match {
    -              case BinaryType =>
    -                failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " +
    -                  "in grouping expression")
    -              case m: MapType =>
    -                failAnalysis(s"map type expression ${expr.prettyString} cannot be used " +
    -                  "in grouping expression")
    -              case _ => // OK
    +            def checkValidGroupingExprs(expr: Expression): Unit = {
    +              expr.dataType match {
    +                case BinaryType =>
    +                  failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " +
    +                    "in grouping expression")
    +                case a: ArrayType =>
    +                  failAnalysis(s"array type expression ${expr.prettyString} cannot be used " +
    +                    "in grouping expression")
    +                case m: MapType =>
    +                  failAnalysis(s"map type expression ${expr.prettyString} cannot be used " +
    +                    "in grouping expression")
    +                case _ => // OK
    --- End diff --
    
    We should also check UDTs and the types nested inside StructType.
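    For illustration, one possible shape of that check -- a hedged sketch, not the actual
    `CheckAnalysis` code -- which also rejects the forbidden types when they are nested inside
    a struct and validates a UDT through its underlying `sqlType`:

        import org.apache.spark.sql.types._

        // Sketch only: recursively decide whether a data type may appear in a
        // grouping expression. Binary, array, and map types are rejected even when
        // they sit inside StructType fields, and a UDT is checked via its sqlType.
        object GroupingTypeCheckSketch {
          def isValidGroupingType(dataType: DataType): Boolean = dataType match {
            case BinaryType | _: ArrayType | _: MapType => false
            case udt: UserDefinedType[_]                => isValidGroupingType(udt.sqlType)
            case StructType(fields)                     => fields.forall(f => isValidGroupingType(f.dataType))
            case _                                      => true
          }
        }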




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155228625
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44360927
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala ---
    @@ -24,6 +24,8 @@ case class Kurtosis(child: Expression,
         inputAggBufferOffset: Int = 0)
       extends CentralMomentAgg(child) {
     
    +  def this(child: Expression) = this(child, 0, 0)
    --- End diff --
    
    Will drop those default args in a separate PR.
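    As background for why the explicit `def this(child: Expression)` is needed even though the
    offsets have defaults: Scala default arguments are compiled onto the companion object, not
    as extra JVM constructors, so reflection-based construction by arity (which, as far as I
    understand, is how the function registry instantiates expressions) cannot see them. A
    made-up plain-Scala illustration:

        // Both classes are made up for this illustration.
        case class WithDefaults(child: String, mutableOffset: Int = 0, inputOffset: Int = 0)

        case class WithAuxCtor(child: String, mutableOffset: Int, inputOffset: Int) {
          def this(child: String) = this(child, 0, 0)
        }

        object CtorVisibilityDemo extends App {
          def oneArgConstructors(c: Class[_]): Int =
            c.getConstructors.count(_.getParameterTypes.length == 1)

          println(oneArgConstructors(classOf[WithDefaults])) // 0: defaults add no 1-arg constructor
          println(oneArgConstructors(classOf[WithAuxCtor]))  // 1: the auxiliary constructor is visible
        }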




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44451909
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala ---
    @@ -55,13 +57,10 @@ abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate w
     
       override def dataType: DataType = DoubleType
     
    -  // Expected input data type.
    -  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    -  // new version at planning time (after analysis phase). For now, NullType is added at here
    -  // to make it resolved when we have cases like `select avg(null)`.
    -  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    -  // we remove the old aggregate functions. Then, we will not need NullType at here.
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType))
    --- End diff --
    
    remove the `TypeCollection` wrapper? It only wraps a single type here.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44344541
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         }
       }
     
    -  object HashAggregation extends Strategy {
    -    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      // Aggregations that can be performed in two phases, before and after the shuffle.
    -      case PartialAggregation(
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          groupingExpressions,
    -          partialComputation,
    -          child) if !canBeConvertedToNewAggregation(plan) =>
    -        execution.Aggregate(
    -          partial = false,
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          execution.Aggregate(
    -            partial = true,
    -            groupingExpressions,
    -            partialComputation,
    -            planLater(child))) :: Nil
    -
    -      case _ => Nil
    -    }
    -
    -    def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match {
    -      case a: logical.Aggregate =>
    -        if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) {
    -          a.newAggregation.isDefined
    -        } else {
    -          Utils.checkInvalidAggregateFunction2(a)
    -          false
    -        }
    -      case _ => false
    -    }
    -
    -    def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
    -      exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
    -  }
    -
       /**
        * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
        */
       object Aggregation extends Strategy {
         def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
    -          sqlContext.conf.codegenEnabled =>
    -        val converted = p.newAggregation
    -        converted match {
    -          case None => Nil // Cannot convert to new aggregation code path.
    -          case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) =>
    -            // A single aggregate expression might appear multiple times in resultExpressions.
    -            // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -            // build a set of the distinct aggregate expressions and build a function which can
    -            // be used to re-write expressions so that they reference the single copy of the
    -            // aggregate function which actually gets computed.
    -            val aggregateExpressions = resultExpressions.flatMap { expr =>
    -              expr.collect {
    -                case agg: AggregateExpression2 => agg
    -              }
    -            }.distinct
    -            // For those distinct aggregate expressions, we create a map from the
    -            // aggregate function to the corresponding attribute of the function.
    -            val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    -              val aggregateFunction = agg.aggregateFunction
    -              val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    -              (aggregateFunction, agg.isDistinct) -> attribute
    -            }.toMap
    -
    -            val (functionsWithDistinct, functionsWithoutDistinct) =
    -              aggregateExpressions.partition(_.isDistinct)
    -            if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    -              // This is a sanity check. We should not reach here when we have multiple distinct
    -              // column sets (aggregate.NewAggregation will not match).
    -              sys.error(
    -                "Multiple distinct column sets are not supported by the new aggregation" +
    -                  "code path.")
    -            }
    +      case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
    +        // A single aggregate expression might appear multiple times in resultExpressions.
    +        // In order to avoid evaluating an individual aggregate function multiple times, we'll
    +        // build a set of the distinct aggregate expressions and build a function which can
    +        // be used to re-write expressions so that they reference the single copy of the
    +        // aggregate function which actually gets computed.
    +        val aggregateExpressions = resultExpressions.flatMap { expr =>
    +          expr.collect {
    +            case agg: AggregateExpression => agg
    +          }
    +        }.distinct
    +        // For those distinct aggregate expressions, we create a map from the
    +        // aggregate function to the corresponding attribute of the function.
    +        val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    +          val aggregateFunction = agg.aggregateFunction
    +          val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    +          (aggregateFunction, agg.isDistinct) -> attribute
    +        }.toMap
    +
    +        val (functionsWithDistinct, functionsWithoutDistinct) =
    +          aggregateExpressions.partition(_.isDistinct)
    +        if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    +          // This is a sanity check. We should not reach here when we have multiple distinct
    +          // column sets (aggregate.NewAggregation will not match).
    +          sys.error(
    +            "Multiple distinct column sets are not supported by the new aggregation" +
    +              "code path.")
    +        }
     
    -            val namedGroupingExpressions = groupingExpressions.map {
    -              case ne: NamedExpression => ne -> ne
    -              // If the expression is not a NamedExpressions, we add an alias.
    -              // So, when we generate the result of the operator, the Aggregate Operator
    -              // can directly get the Seq of attributes representing the grouping expressions.
    -              case other =>
    -                val withAlias = Alias(other, other.toString)()
    -                other -> withAlias
    -            }
    -            val groupExpressionMap = namedGroupingExpressions.toMap
    -
    -            // The original `resultExpressions` are a set of expressions which may reference
    -            // aggregate expressions, grouping column values, and constants. When aggregate operator
    -            // emits output rows, we will use `resultExpressions` to generate an output projection
    -            // which takes the grouping columns and final aggregate result buffer as input.
    -            // Thus, we must re-write the result expressions so that their attributes match up with
    -            // the attributes of the final result projection's input row:
    -            val rewrittenResultExpressions = resultExpressions.map { expr =>
    -              expr.transformDown {
    -                case AggregateExpression2(aggregateFunction, _, isDistinct) =>
    -                  // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    -                  // so replace each aggregate expression by its corresponding attribute in the set:
    -                  aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    -                case expression =>
    -                  // Since we're using `namedGroupingAttributes` to extract the grouping key
    -                  // columns, we need to replace grouping key expressions with their corresponding
    -                  // attributes. We do not rely on the equality check at here since attributes may
    -                  // differ cosmetically. Instead, we use semanticEquals.
    -                  groupExpressionMap.collectFirst {
    -                    case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    -                  }.getOrElse(expression)
    -              }.asInstanceOf[NamedExpression]
    +        val namedGroupingExpressions = groupingExpressions.map {
    +          case ne: NamedExpression => ne -> ne
    +          // If the expression is not a NamedExpressions, we add an alias.
    +          // So, when we generate the result of the operator, the Aggregate Operator
    +          // can directly get the Seq of attributes representing the grouping expressions.
    +          case other =>
    +            val withAlias = Alias(other, other.toString)()
    +            other -> withAlias
    +        }
    +        val groupExpressionMap = namedGroupingExpressions.toMap
    +
    +        // The original `resultExpressions` are a set of expressions which may reference
    +        // aggregate expressions, grouping column values, and constants. When aggregate operator
    +        // emits output rows, we will use `resultExpressions` to generate an output projection
    +        // which takes the grouping columns and final aggregate result buffer as input.
    +        // Thus, we must re-write the result expressions so that their attributes match up with
    +        // the attributes of the final result projection's input row:
    +        val rewrittenResultExpressions = resultExpressions.map { expr =>
    +          expr.transformDown {
    +            case AggregateExpression(aggregateFunction, _, isDistinct) =>
    +              // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    +              // so replace each aggregate expression by its corresponding attribute in the set:
    +              aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    +            case expression =>
    +              // Since we're using `namedGroupingAttributes` to extract the grouping key
    +              // columns, we need to replace grouping key expressions with their corresponding
    +              // attributes. We do not rely on the equality check at here since attributes may
    +              // differ cosmetically. Instead, we use semanticEquals.
    +              groupExpressionMap.collectFirst {
    +                case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    +              }.getOrElse(expression)
    +          }.asInstanceOf[NamedExpression]
    +        }
    +
    +        val aggregateOperator =
    +          if (aggregateExpressions.map(_.aggregateFunction).exists(!_.supportsPartial)) {
    +            if (functionsWithDistinct.nonEmpty) {
    +              sys.error("Distinct columns cannot exist in Aggregate operator containing " +
    +                "aggregate functions which don't support partial aggregation.")
    +            } else {
    +              aggregate.Utils.planAggregateWithoutPartial(
    +                namedGroupingExpressions.map(_._2),
    +                aggregateExpressions,
    +                aggregateFunctionToAttribute,
    +                rewrittenResultExpressions,
    +                planLater(child))
                 }
    +          } else if (functionsWithDistinct.isEmpty) {
    +            aggregate.Utils.planAggregateWithoutDistinct(
    +              namedGroupingExpressions.map(_._2),
    +              aggregateExpressions,
    +              aggregateFunctionToAttribute,
    +              rewrittenResultExpressions,
    +              planLater(child))
    +          } else {
    +            aggregate.Utils.planAggregateWithOneDistinct(
    --- End diff --
    
    Sure. I have prepared the diff for that. Once this one is in, I will send out a PR for that conf.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44352010
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
    @@ -141,40 +141,56 @@ class WindowSpec private[sql](
        */
       private[sql] def withAggregate(aggregate: Column): Column = {
         val windowExpr = aggregate.expr match {
    -      case Average(child) => WindowExpression(
    -        UnresolvedWindowFunction("avg", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Sum(child) => WindowExpression(
    -        UnresolvedWindowFunction("sum", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Count(child) => WindowExpression(
    -        UnresolvedWindowFunction("count", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case First(child, ignoreNulls) => WindowExpression(
    -        // TODO this is a hack for Hive UDAF first_value
    -        UnresolvedWindowFunction(
    -          "first_value",
    -          child :: ignoreNulls :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Last(child, ignoreNulls) => WindowExpression(
    -        // TODO this is a hack for Hive UDAF last_value
    -        UnresolvedWindowFunction(
    -          "last_value",
    -          child :: ignoreNulls :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Min(child) => WindowExpression(
    -        UnresolvedWindowFunction("min", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case Max(child) => WindowExpression(
    -        UnresolvedWindowFunction("max", child :: Nil),
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    -      case wf: WindowFunction => WindowExpression(
    -        wf,
    -        WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +      // First, we check if we get an aggregate function without the DISTINCT keyword.
    +      // Right now, we do not support using a DISTINCT aggregate function as a
    +      // window function.
    +      case AggregateExpression(aggregateFunction, _, isDistinct) if !isDistinct =>
    +        aggregateFunction match {
    +          case Average(child) => WindowExpression(
    +            UnresolvedWindowFunction("avg", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Sum(child) => WindowExpression(
    +            UnresolvedWindowFunction("sum", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Count(child) => WindowExpression(
    +            UnresolvedWindowFunction("count", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case First(child, ignoreNulls) => WindowExpression(
    +            // TODO this is a hack for Hive UDAF first_value
    +            UnresolvedWindowFunction(
    +              "first_value",
    +              child :: ignoreNulls :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Last(child, ignoreNulls) => WindowExpression(
    +            // TODO this is a hack for Hive UDAF last_value
    +            UnresolvedWindowFunction(
    +              "last_value",
    +              child :: ignoreNulls :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Min(child) => WindowExpression(
    +            UnresolvedWindowFunction("min", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case Max(child) => WindowExpression(
    +            UnresolvedWindowFunction("max", child :: Nil),
    +            WindowSpecDefinition(partitionSpec, orderSpec, frame))
    +          case x =>
    +            throw new UnsupportedOperationException(s"$x is not supported in window operation.")
    --- End diff --
    
    nit: "in a window operation" or "in window operations"




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155263282
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44268074
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         }
       }
     
    -  object HashAggregation extends Strategy {
    -    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      // Aggregations that can be performed in two phases, before and after the shuffle.
    -      case PartialAggregation(
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          groupingExpressions,
    -          partialComputation,
    -          child) if !canBeConvertedToNewAggregation(plan) =>
    -        execution.Aggregate(
    -          partial = false,
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          execution.Aggregate(
    -            partial = true,
    -            groupingExpressions,
    -            partialComputation,
    -            planLater(child))) :: Nil
    -
    -      case _ => Nil
    -    }
    -
    -    def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match {
    -      case a: logical.Aggregate =>
    -        if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) {
    -          a.newAggregation.isDefined
    -        } else {
    -          Utils.checkInvalidAggregateFunction2(a)
    -          false
    -        }
    -      case _ => false
    -    }
    -
    -    def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
    -      exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
    -  }
    -
       /**
        * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
        */
       object Aggregation extends Strategy {
         def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
    -          sqlContext.conf.codegenEnabled =>
    -        val converted = p.newAggregation
    -        converted match {
    -          case None => Nil // Cannot convert to new aggregation code path.
    -          case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) =>
    -            // A single aggregate expression might appear multiple times in resultExpressions.
    -            // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -            // build a set of the distinct aggregate expressions and build a function which can
    -            // be used to re-write expressions so that they reference the single copy of the
    -            // aggregate function which actually gets computed.
    -            val aggregateExpressions = resultExpressions.flatMap { expr =>
    -              expr.collect {
    -                case agg: AggregateExpression2 => agg
    -              }
    -            }.distinct
    -            // For those distinct aggregate expressions, we create a map from the
    -            // aggregate function to the corresponding attribute of the function.
    -            val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    -              val aggregateFunction = agg.aggregateFunction
    -              val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    -              (aggregateFunction, agg.isDistinct) -> attribute
    -            }.toMap
    -
    -            val (functionsWithDistinct, functionsWithoutDistinct) =
    -              aggregateExpressions.partition(_.isDistinct)
    -            if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    -              // This is a sanity check. We should not reach here when we have multiple distinct
    -              // column sets (aggregate.NewAggregation will not match).
    -              sys.error(
    -                "Multiple distinct column sets are not supported by the new aggregation" +
    -                  "code path.")
    -            }
    +      case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
    +        // A single aggregate expression might appear multiple times in resultExpressions.
    +        // In order to avoid evaluating an individual aggregate function multiple times, we'll
    +        // build a set of the distinct aggregate expressions and build a function which can
    +        // be used to re-write expressions so that they reference the single copy of the
    +        // aggregate function which actually gets computed.
    +        val aggregateExpressions = resultExpressions.flatMap { expr =>
    +          expr.collect {
    +            case agg: AggregateExpression => agg
    +          }
    +        }.distinct
    +        // For those distinct aggregate expressions, we create a map from the
    +        // aggregate function to the corresponding attribute of the function.
    +        val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    +          val aggregateFunction = agg.aggregateFunction
    +          val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    +          (aggregateFunction, agg.isDistinct) -> attribute
    +        }.toMap
    +
    +        val (functionsWithDistinct, functionsWithoutDistinct) =
    +          aggregateExpressions.partition(_.isDistinct)
    +        if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    +          // This is a sanity check. We should not reach here when we have multiple distinct
    +          // column sets (aggregate.NewAggregation will not match).
    +          sys.error(
    +            "Multiple distinct column sets are not supported by the new aggregation" +
    +              "code path.")
    +        }
     
    -            val namedGroupingExpressions = groupingExpressions.map {
    -              case ne: NamedExpression => ne -> ne
    -              // If the expression is not a NamedExpressions, we add an alias.
    -              // So, when we generate the result of the operator, the Aggregate Operator
    -              // can directly get the Seq of attributes representing the grouping expressions.
    -              case other =>
    -                val withAlias = Alias(other, other.toString)()
    -                other -> withAlias
    -            }
    -            val groupExpressionMap = namedGroupingExpressions.toMap
    -
    -            // The original `resultExpressions` are a set of expressions which may reference
    -            // aggregate expressions, grouping column values, and constants. When aggregate operator
    -            // emits output rows, we will use `resultExpressions` to generate an output projection
    -            // which takes the grouping columns and final aggregate result buffer as input.
    -            // Thus, we must re-write the result expressions so that their attributes match up with
    -            // the attributes of the final result projection's input row:
    -            val rewrittenResultExpressions = resultExpressions.map { expr =>
    -              expr.transformDown {
    -                case AggregateExpression2(aggregateFunction, _, isDistinct) =>
    -                  // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    -                  // so replace each aggregate expression by its corresponding attribute in the set:
    -                  aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    -                case expression =>
    -                  // Since we're using `namedGroupingAttributes` to extract the grouping key
    -                  // columns, we need to replace grouping key expressions with their corresponding
    -                  // attributes. We do not rely on the equality check at here since attributes may
    -                  // differ cosmetically. Instead, we use semanticEquals.
    -                  groupExpressionMap.collectFirst {
    -                    case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    -                  }.getOrElse(expression)
    -              }.asInstanceOf[NamedExpression]
    +        val namedGroupingExpressions = groupingExpressions.map {
    +          case ne: NamedExpression => ne -> ne
    +          // If the expression is not a NamedExpression, we add an alias.
    +          // So, when we generate the result of the operator, the Aggregate Operator
    +          // can directly get the Seq of attributes representing the grouping expressions.
    +          case other =>
    +            val withAlias = Alias(other, other.toString)()
    +            other -> withAlias
    +        }
    +        val groupExpressionMap = namedGroupingExpressions.toMap
    +
    +        // The original `resultExpressions` are a set of expressions which may reference
    +        // aggregate expressions, grouping column values, and constants. When aggregate operator
    +        // emits output rows, we will use `resultExpressions` to generate an output projection
    +        // which takes the grouping columns and final aggregate result buffer as input.
    +        // Thus, we must re-write the result expressions so that their attributes match up with
    +        // the attributes of the final result projection's input row:
    +        val rewrittenResultExpressions = resultExpressions.map { expr =>
    +          expr.transformDown {
    +            case AggregateExpression(aggregateFunction, _, isDistinct) =>
    +              // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    +              // so replace each aggregate expression by its corresponding attribute in the set:
    +              aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    +            case expression =>
    +              // Since we're using `namedGroupingAttributes` to extract the grouping key
    +              // columns, we need to replace grouping key expressions with their corresponding
    +              // attributes. We do not rely on the equality check here since attributes may
    +              // differ cosmetically. Instead, we use semanticEquals.
    +              groupExpressionMap.collectFirst {
    +                case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    +              }.getOrElse(expression)
    +          }.asInstanceOf[NamedExpression]
    +        }
    +
    +        val aggregateOperator =
    +          if (aggregateExpressions.map(_.aggregateFunction).exists(!_.supportsPartial)) {
    +            if (functionsWithDistinct.nonEmpty) {
    +              sys.error("Distinct columns cannot exist in Aggregate operator containing " +
    +                "aggregate functions which don't support partial aggregation.")
    +            } else {
    +              aggregate.Utils.planAggregateWithoutPartial(
    +                namedGroupingExpressions.map(_._2),
    +                aggregateExpressions,
    +                aggregateFunctionToAttribute,
    +                rewrittenResultExpressions,
    +                planLater(child))
                 }
    +          } else if (functionsWithDistinct.isEmpty) {
    +            aggregate.Utils.planAggregateWithoutDistinct(
    +              namedGroupingExpressions.map(_._2),
    +              aggregateExpressions,
    +              aggregateFunctionToAttribute,
    +              rewrittenResultExpressions,
    +              planLater(child))
    +          } else {
    +            aggregate.Utils.planAggregateWithOneDistinct(
    --- End diff --
    
    We could still choose to rewrite all distinct expressions into an expanded aggregate. This is a bit easier to plan, and it might improve the processing of global aggregates.
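
    To make that concrete, here is a minimal, self-contained sketch of the
    expand-style rewrite (plain Scala collections rather than the planner's Expand
    operator, and the input rows are made up): every row is replicated once per
    distinct aggregate and tagged with a group id, after which a single
    non-distinct aggregation is enough.

        // Sketch only: COUNT(DISTINCT a), COUNT(DISTINCT b) computed via expansion.
        object ExpandDistinctSketch {
          // Hypothetical input rows (a, b).
          val rows = Seq((1, 10), (1, 20), (2, 20))

          def main(args: Array[String]): Unit = {
            // Expand: one copy of each row per distinct aggregate, keyed by a group id.
            val expanded = rows.flatMap { case (a, b) =>
              Seq((0, a), (1, b)) // gid 0 -> COUNT(DISTINCT a), gid 1 -> COUNT(DISTINCT b)
            }
            // First aggregation: de-duplicate per (gid, value), which is all DISTINCT needs.
            val deduped = expanded.distinct
            // Final aggregation: a plain COUNT per gid; no DISTINCT is left at this stage.
            val counts = deduped.groupBy(_._1).map { case (gid, vs) => gid -> vs.size }
            println(s"count(distinct a) = ${counts(0)}, count(distinct b) = ${counts(1)}")
          }
        }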




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154984531
  
    retest this please




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155118316
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155130215
  
    **[Test build #45380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45380/consoleFull)** for PR 9556 at commit [`10a86fd`](https://github.com/apache/spark/commit/10a86fd67d35d0c6486e31700080d73ea2493d08).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44350660
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
    @@ -272,7 +273,7 @@ object SqlParser extends AbstractSparkSQLParser with DataTypeParser {
       protected lazy val function: Parser[Expression] =
         ( ident <~ ("(" ~ "*" ~ ")") ^^ { case udfName =>
           if (lexical.normalizeKeyword(udfName) == "count") {
    -        Count(Literal(1))
    +        AggregateExpression(Count(Literal(1)), mode = Complete, isDistinct = false)
    --- End diff --
    
    Is there a reason we can't just create unresolved functions here? (other than maybe the distinct case?)
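
    For what it is worth, a sketch of that alternative (it reuses the three-argument
    UnresolvedFunction shape that the Analyzer diff elsewhere in this thread pattern
    matches on, so treat it as illustrative rather than the final parser change):

        import org.apache.spark.sql.catalyst.analysis.UnresolvedFunction
        import org.apache.spark.sql.catalyst.expressions.Literal

        // The parser would only emit an unresolved "count"; the analyzer's function
        // resolution rule would then look it up and wrap it in an AggregateExpression.
        val countStar = UnresolvedFunction("count", Literal(1) :: Nil, isDistinct = false)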




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44350995
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -108,7 +109,16 @@ trait CheckAnalysis {
     
               case Aggregate(groupingExprs, aggregateExprs, child) =>
                 def checkValidAggregateExpression(expr: Expression): Unit = expr match {
    -              case _: AggregateExpression => // OK
    +              case aggExpr: AggregateExpression =>
    +                // TODO: Is it possible that the child of an agg function is another
    +                // agg function?
    +                aggExpr.aggregateFunction.children.foreach {
    +                  case child if !child.deterministic =>
    +                    failAnalysis(
    +                      s"nondeterministic expression ${expr.prettyString} are not allowed " +
    +                        s"in grouping expression.")
    --- End diff --
    
    This error message is wrong.
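
    A possible rephrasing, reusing the identifiers from the diff above (a sketch,
    not the committed wording), would tie the message to what is actually being
    checked, namely the arguments of the aggregate function, rather than to the
    grouping expressions:

        aggExpr.aggregateFunction.children.foreach {
          case child if !child.deterministic =>
            failAnalysis(
              s"nondeterministic expression ${child.prettyString} should not appear " +
                "in the arguments of an aggregate function")
          case _ => // OK
        }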




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154930666
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155290683
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45476/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154989888
  
    **[Test build #45350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45350/consoleFull)** for PR 9556 at commit [`cb3f4d3`](https://github.com/apache/spark/commit/cb3f4d3f6486b0aa1b668ee901cde353e829b7f5).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155154504
  
    **[Test build #45374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45374/consoleFull)** for PR 9556 at commit [`bf3dfb7`](https://github.com/apache/spark/commit/bf3dfb743ece5bbd36ad548c005eefcdf641a1ce).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155232483
  
    **[Test build #2024 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2024/consoleFull)** for PR 9556 at commit [`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9556




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44358149
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala ---
    @@ -55,6 +56,22 @@ case class HyperLogLogPlusPlus(
       extends ImperativeAggregate {
       import HyperLogLogPlusPlus._
     
    +  def this(child: Expression) = {
    +    this(child, 0.05, 0, 0)
    +  }
    +
    +  def this(child: Expression, relativeSD: Expression) = {
    +    this(
    +      child,
    +      relativeSD match {
    +        case Literal(d: Double, DoubleType) => d
    +        case _ =>
    +          throw new AnalysisException("The second argument should be a double literal.")
    +      },
    +      0,
    --- End diff --
    
    Done
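
    As a quick illustration of the constructors added above (the column name and
    the 0.01 relative error below are made up for the example), the two-argument
    form only accepts the relative error as a double literal and raises an
    AnalysisException for anything else:

        import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Literal}
        import org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
        import org.apache.spark.sql.types.{DoubleType, StringType}

        val child = AttributeReference("user_id", StringType)()
        // Accepted: the relative standard deviation is a double literal.
        val hll = new HyperLogLogPlusPlus(child, Literal(0.01, DoubleType))
        // Rejected at construction time: a non-literal second argument.
        // new HyperLogLogPlusPlus(child, child)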




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154950509
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155262404
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44351475
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -177,6 +178,7 @@ object FunctionRegistry {
         expression[ToRadians]("radians"),
     
         // aggregate functions
    +    expression[HyperLogLogPlusPlus]("approx_distinct"),
    --- End diff --
    
    approx_count_distinct?




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44361205
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -53,6 +54,12 @@ object functions {
     
       private def withExpr(expr: Expression): Column = Column(expr)
     
    +  private def withAggregateFunction(
    +                                     func: AggregateFunction,
    --- End diff --
    
    Done
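
    For context, a helper like this plausibly reads as follows once the indentation
    is fixed (a sketch only; the isDistinct default and the use of
    toAggregateExpression are assumptions based on the interfaces change elsewhere
    in this PR, not the exact body that was committed):

        private def withAggregateFunction(
            func: AggregateFunction,
            isDistinct: Boolean = false): Column = {
          // Wrap the bare AggregateFunction in an AggregateExpression before
          // exposing it as a Column, mirroring withExpr above.
          Column(func.toAggregateExpression(isDistinct))
        }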




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155238823
  
    Yay, death to Aggregation1!




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155295464
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155014247
  
    **[Test build #45350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45350/consoleFull)** for PR 9556 at commit [`cb3f4d3`](https://github.com/apache/spark/commit/cb3f4d3f6486b0aa1b668ee901cde353e829b7f5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154930324
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155231968
  
    test this please




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44355776
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -525,21 +526,15 @@ class Analyzer(
               case u @ UnresolvedFunction(name, children, isDistinct) =>
                 withPosition(u) {
                   registry.lookupFunction(name, children) match {
    +                // DISTINCT is not meaningful for a Max or a Min.
    +                case max: Max if isDistinct =>
    +                  AggregateExpression(max, Complete, isDistinct = false)
    +                case min: Min if isDistinct =>
    +                  AggregateExpression(min, Complete, isDistinct = false)
                     // We get an aggregate function built based on AggregateFunction2 interface.
                     // So, we wrap it in AggregateExpression2.
    --- End diff --
    
    Done




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44350703
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -525,21 +526,15 @@ class Analyzer(
               case u @ UnresolvedFunction(name, children, isDistinct) =>
                 withPosition(u) {
                   registry.lookupFunction(name, children) match {
    +                // DISTINCT is not meaningful for a Max or a Min.
    +                case max: Max if isDistinct =>
    +                  AggregateExpression(max, Complete, isDistinct = false)
    +                case min: Min if isDistinct =>
    +                  AggregateExpression(min, Complete, isDistinct = false)
                     // We get an aggregate function built based on AggregateFunction2 interface.
                     // So, we wrap it in AggregateExpression2.
    --- End diff --
    
    This is a little out of date.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155128490
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155232567
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44363471
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"),
     #' @export
     setMethod("approxCountDistinct",
               signature(x = "Column"),
    -          function(x, rsd = 0.95) {
    +          function(x, rsd = 0.05) {
    --- End diff --
    
    @shivaram Seems this default value should be 0.05 instead of 0.95, right?




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44298295
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         }
       }
     
    -  object HashAggregation extends Strategy {
    -    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      // Aggregations that can be performed in two phases, before and after the shuffle.
    -      case PartialAggregation(
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          groupingExpressions,
    -          partialComputation,
    -          child) if !canBeConvertedToNewAggregation(plan) =>
    -        execution.Aggregate(
    -          partial = false,
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          execution.Aggregate(
    -            partial = true,
    -            groupingExpressions,
    -            partialComputation,
    -            planLater(child))) :: Nil
    -
    -      case _ => Nil
    -    }
    -
    -    def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match {
    -      case a: logical.Aggregate =>
    -        if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) {
    -          a.newAggregation.isDefined
    -        } else {
    -          Utils.checkInvalidAggregateFunction2(a)
    -          false
    -        }
    -      case _ => false
    -    }
    -
    -    def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
    -      exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
    -  }
    -
       /**
        * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
        */
       object Aggregation extends Strategy {
         def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
    -          sqlContext.conf.codegenEnabled =>
    -        val converted = p.newAggregation
    -        converted match {
    -          case None => Nil // Cannot convert to new aggregation code path.
    -          case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) =>
    -            // A single aggregate expression might appear multiple times in resultExpressions.
    -            // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -            // build a set of the distinct aggregate expressions and build a function which can
    -            // be used to re-write expressions so that they reference the single copy of the
    -            // aggregate function which actually gets computed.
    -            val aggregateExpressions = resultExpressions.flatMap { expr =>
    -              expr.collect {
    -                case agg: AggregateExpression2 => agg
    -              }
    -            }.distinct
    -            // For those distinct aggregate expressions, we create a map from the
    -            // aggregate function to the corresponding attribute of the function.
    -            val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    -              val aggregateFunction = agg.aggregateFunction
    -              val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    -              (aggregateFunction, agg.isDistinct) -> attribute
    -            }.toMap
    -
    -            val (functionsWithDistinct, functionsWithoutDistinct) =
    -              aggregateExpressions.partition(_.isDistinct)
    -            if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    -              // This is a sanity check. We should not reach here when we have multiple distinct
    -              // column sets (aggregate.NewAggregation will not match).
    -              sys.error(
    -                "Multiple distinct column sets are not supported by the new aggregation" +
    -                  "code path.")
    -            }
    +      case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
    +        // A single aggregate expression might appear multiple times in resultExpressions.
    +        // In order to avoid evaluating an individual aggregate function multiple times, we'll
    +        // build a set of the distinct aggregate expressions and build a function which can
    +        // be used to re-write expressions so that they reference the single copy of the
    +        // aggregate function which actually gets computed.
    +        val aggregateExpressions = resultExpressions.flatMap { expr =>
    +          expr.collect {
    +            case agg: AggregateExpression => agg
    +          }
    +        }.distinct
    +        // For those distinct aggregate expressions, we create a map from the
    +        // aggregate function to the corresponding attribute of the function.
    +        val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    +          val aggregateFunction = agg.aggregateFunction
    +          val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    +          (aggregateFunction, agg.isDistinct) -> attribute
    +        }.toMap
    +
    +        val (functionsWithDistinct, functionsWithoutDistinct) =
    +          aggregateExpressions.partition(_.isDistinct)
    +        if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    +          // This is a sanity check. We should not reach here when we have multiple distinct
    +          // column sets (aggregate.NewAggregation will not match).
    +          sys.error(
    --- End diff --
    
    will update this.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155160189
  
    **[Test build #45380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45380/consoleFull)** for PR 9556 at commit [`10a86fd`](https://github.com/apache/spark/commit/10a86fd67d35d0c6486e31700080d73ea2493d08).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44462628
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -120,14 +133,26 @@ trait CheckAnalysis {
                   case e => e.children.foreach(checkValidAggregateExpression)
                 }
     
    -            def checkValidGroupingExprs(expr: Expression): Unit = expr.dataType match {
    -              case BinaryType =>
    -                failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " +
    -                  "in grouping expression")
    -              case m: MapType =>
    -                failAnalysis(s"map type expression ${expr.prettyString} cannot be used " +
    -                  "in grouping expression")
    -              case _ => // OK
    +            def checkValidGroupingExprs(expr: Expression): Unit = {
    +              expr.dataType match {
    +                case BinaryType =>
    +                  failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " +
    +                    "in grouping expression")
    +                case a: ArrayType =>
    +                  failAnalysis(s"array type expression ${expr.prettyString} cannot be used " +
    --- End diff --
    
    Unfortunately, it is the case. 




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44463726
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -120,14 +133,26 @@ trait CheckAnalysis {
                   case e => e.children.foreach(checkValidAggregateExpression)
                 }
     
    -            def checkValidGroupingExprs(expr: Expression): Unit = expr.dataType match {
    -              case BinaryType =>
    -                failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " +
    -                  "in grouping expression")
    -              case m: MapType =>
    -                failAnalysis(s"map type expression ${expr.prettyString} cannot be used " +
    -                  "in grouping expression")
    -              case _ => // OK
    +            def checkValidGroupingExprs(expr: Expression): Unit = {
    +              expr.dataType match {
    +                case BinaryType =>
    +                  failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " +
    +                    "in grouping expression")
    +                case a: ArrayType =>
    +                  failAnalysis(s"array type expression ${expr.prettyString} cannot be used " +
    +                    "in grouping expression")
    +                case m: MapType =>
    +                  failAnalysis(s"map type expression ${expr.prettyString} cannot be used " +
    +                    "in grouping expression")
    +                case _ => // OK
    --- End diff --
    
    Done.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154930969
  
    **[Test build #45342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45342/consoleFull)** for PR 9556 at commit [`cb3f4d3`](https://github.com/apache/spark/commit/cb3f4d3f6486b0aa1b668ee901cde353e829b7f5).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155264239
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44441047
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -120,14 +133,26 @@ trait CheckAnalysis {
                   case e => e.children.foreach(checkValidAggregateExpression)
                 }
     
    -            def checkValidGroupingExprs(expr: Expression): Unit = expr.dataType match {
    -              case BinaryType =>
    -                failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " +
    -                  "in grouping expression")
    -              case m: MapType =>
    -                failAnalysis(s"map type expression ${expr.prettyString} cannot be used " +
    -                  "in grouping expression")
    -              case _ => // OK
    +            def checkValidGroupingExprs(expr: Expression): Unit = {
    +              expr.dataType match {
    +                case BinaryType =>
    +                  failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " +
    +                    "in grouping expression")
    +                case a: ArrayType =>
    +                  failAnalysis(s"array type expression ${expr.prettyString} cannot be used " +
    --- End diff --
    
    Is this a regression?




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155238877
  
    this is going to conflict at least logically with #9499 




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44454907
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -525,21 +526,14 @@ class Analyzer(
               case u @ UnresolvedFunction(name, children, isDistinct) =>
                 withPosition(u) {
                   registry.lookupFunction(name, children) match {
    -                // We get an aggregate function built based on AggregateFunction2 interface.
    -                // So, we wrap it in AggregateExpression2.
    -                case agg2: AggregateFunction2 => AggregateExpression2(agg2, Complete, isDistinct)
    -                // Currently, our old aggregate function interface supports SUM(DISTINCT ...)
    -                // and COUTN(DISTINCT ...).
    -                case sumDistinct: SumDistinct => sumDistinct
    -                case countDistinct: CountDistinct => countDistinct
    -                // DISTINCT is not meaningful with Max and Min.
    -                case max: Max if isDistinct => max
    -                case min: Min if isDistinct => min
    -                // For other aggregate functions, DISTINCT keyword is not supported for now.
    -                // Once we converted to the new code path, we will allow using DISTINCT keyword.
    -                case other: AggregateExpression1 if isDistinct =>
    -                  failAnalysis(s"$name does not support DISTINCT keyword.")
    -                // If it does not have DISTINCT keyword, we will return it as is.
    +                // DISTINCT is not meaningful for a Max or a Min.
    +                case max: Max if isDistinct =>
    +                  AggregateExpression(max, Complete, isDistinct = false)
    +                case min: Min if isDistinct =>
    +                  AggregateExpression(min, Complete, isDistinct = false)
    +                // We get an aggregate function, we need to wrap it in an AggregateExpression.
    +                case agg2: AggregateFunction => AggregateExpression(agg2, Complete, isDistinct)
    --- End diff --
    
    Done
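
    A one-line sanity check of why the DISTINCT keyword can simply be dropped for
    Max and Min (plain Scala, values made up):

        // The extremum of a multiset is unchanged by de-duplication.
        assert(Seq(3, 1, 3, 2).distinct.max == Seq(3, 1, 3, 2).max)
        assert(Seq(3, 1, 3, 2).distinct.min == Seq(3, 1, 3, 2).min)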




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154930384
  
    test this please




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155264172
  
    **[Test build #2027 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2027/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155232599
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155348765
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45492/
    Test PASSed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155297593
  
    **[Test build #45492 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45492/consoleFull)** for PR 9556 at commit [`16826e6`](https://github.com/apache/spark/commit/16826e6d5cb9636b6a72125fcccfc4273227c45b).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154922605
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154929171
  
    YAY.





[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44351922
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala ---
    @@ -141,6 +146,12 @@ sealed abstract class AggregateFunction2 extends Expression with ImplicitCastInp
     
       override protected def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String =
         throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")
    +
    +  def toAggregateExpression(): AggregateExpression = toAggregateExpression(isDistinct = false)
    --- End diff --
    
    Can you add some scaladoc here about why we need to do this wrapping?
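    For context, a rough sketch of the wrapping being documented (not the PR's scaladoc
    text): a bare `AggregateFunction` carries neither an evaluation mode nor a DISTINCT
    flag, so callers inside catalyst wrap it in an `AggregateExpression` before analysis.
    Mirroring the `SqlParser` change in this PR (these classes are internal to
    `org.apache.spark.sql`):

    ```scala
    import org.apache.spark.sql.catalyst.expressions.Literal
    import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Count}

    // count(1) as the parser builds it: the bare function plus a mode and an isDistinct flag.
    val countAll = AggregateExpression(Count(Literal(1)), mode = Complete, isDistinct = false)
    ```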




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154986197
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155260995
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155120739
  
    I will move `MultipleDistinctRewriter` to the `org.apache.spark.sql.catalyst.analysis` package.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155226238
  
    Yeah. I will change the `prettyString` of `AggregateExpression`. Also, since I need to update the PR anyway, I am including my change that introduces the flag used by `MultipleDistinctRewriter`.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155257980
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45442/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155179635
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155129634
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44374094
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"),
     #' @export
     setMethod("approxCountDistinct",
               signature(x = "Column"),
    -          function(x, rsd = 0.95) {
    +          function(x, rsd = 0.05) {
    --- End diff --
    
    Good catch, thanks!
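    For reference, the corrected value matches the Scala/Python APIs, where `rsd` is the
    target relative standard deviation of the estimate (5% by default), not 0.95. A small
    sketch, assuming a DataFrame `df` with a made-up `user_id` column:

    ```scala
    import org.apache.spark.sql.functions.approxCountDistinct

    df.select(approxCountDistinct(df("user_id")))        // default rsd = 0.05
    df.select(approxCountDistinct(df("user_id"), 0.10))  // trade accuracy for a smaller sketch
    ```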




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155224746
  
    LGTM, pending a successful test build.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155265613
  
    **[Test build #45476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45476/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155141098
  
    OK. I moved the rewriter and added docs to `DeclarativeAggregate` explaining why we need to use `lazy val`s.
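    For readers of this thread, a rough sketch of the pattern those docs describe (not the
    PR's actual text): inside catalyst, any member of a `DeclarativeAggregate` that touches
    the child's fields, e.g. `child.dataType`, has to be a `lazy val`, because the child may
    still be unresolved when the function is constructed through the DataFrame API. `MyMax`
    below is a made-up example; the member names follow the `DeclarativeAggregate` interface.

    ```scala
    package org.apache.spark.sql.catalyst.expressions.aggregate

    import org.apache.spark.sql.catalyst.expressions._
    import org.apache.spark.sql.types._

    case class MyMax(child: Expression) extends DeclarativeAggregate {
      override def children: Seq[Expression] = child :: Nil
      override def nullable: Boolean = true
      override def dataType: DataType = child.dataType
      override def inputTypes: Seq[AbstractDataType] =
        Seq(TypeCollection(IntegerType, LongType, DoubleType))

      // lazy: child.dataType is only meaningful once the child has been resolved.
      private lazy val max = AttributeReference("max", child.dataType)()

      override lazy val aggBufferAttributes = max :: Nil
      override lazy val initialValues = Seq(Literal.create(null, child.dataType))
      override lazy val updateExpressions = Seq(Greatest(Seq(max, child)))
      override lazy val mergeExpressions = Seq(Greatest(Seq(max.left, max.right)))
      override lazy val evaluateExpression = max
    }
    ```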




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155286504
  
    **[Test build #2027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2027/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155263285
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45471/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44352491
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
    @@ -448,15 +448,17 @@ private[spark] object SQLConf {
         defaultValue = Some(true),
         isPublic = false)
     
    -  val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2",
    -    defaultValue = Some(true), doc = "<TODO>")
    -
       val RUN_SQL_ON_FILES = booleanConf("spark.sql.runSQLOnFiles",
         defaultValue = Some(true),
         isPublic = false,
         doc = "When true, we could use `datasource`.`path` as table in SQL query"
       )
     
    +  val SPECIALIZE_SINGLE_DISTINCT_AGG_PLANNING =
    --- End diff --
    
    For posterity maybe comment on what this is and when we think we will remove it?
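    A sketch of the kind of note being asked for, reusing the `booleanConf` helper visible
    in the diff above; the wording, key name, and default are illustrative rather than the
    PR's final text, and the snippet is meant to sit inside the `SQLConf` object:

    ```scala
    // Internal escape hatch: when true, a query with a single DISTINCT aggregate is planned
    // with the specialized single-distinct code path; when false, it goes through the
    // generic distinct rewriter. To be removed once the rewriter's performance is validated.
    val SPECIALIZE_SINGLE_DISTINCT_AGG_PLANNING =
      booleanConf("spark.sql.specializeSingleDistinctAggPlanning",
        defaultValue = Some(true),
        isPublic = false,
        doc = "When true, use the specialized single-distinct aggregation planning rule; " +
          "when false, rewrite single-distinct queries with the generic distinct rewriter.")
    ```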




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44355675
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
    @@ -272,7 +273,7 @@ object SqlParser extends AbstractSparkSQLParser with DataTypeParser {
       protected lazy val function: Parser[Expression] =
         ( ident <~ ("(" ~ "*" ~ ")") ^^ { case udfName =>
           if (lexical.normalizeKeyword(udfName) == "count") {
    -        Count(Literal(1))
    +        AggregateExpression(Count(Literal(1)), mode = Complete, isDistinct = false)
    --- End diff --
    
    It seems we hard-code this when we deal with `count(*)`. We can change it in another PR.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44455980
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -108,7 +109,19 @@ trait CheckAnalysis {
     
               case Aggregate(groupingExprs, aggregateExprs, child) =>
                 def checkValidAggregateExpression(expr: Expression): Unit = expr match {
    -              case _: AggregateExpression => // OK
    +              case aggExpr: AggregateExpression =>
    +                // TODO: Is it possible that the child of a agg function is another
    --- End diff --
    
    It seems our error message is not good. I will update this. We should not allow users to nest aggregate functions like `sum(avg(...))`. I will add an error message that tells users to use subqueries instead.
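    To make the suggested fix concrete, a hedged example (table and column names are made
    up, assuming a `SQLContext` named `sqlContext` with an `events` temporary table):

    ```scala
    // Rejected by the analyzer: an aggregate function nested inside another one.
    //   SELECT sum(avg(value)) FROM events GROUP BY category
    // The recommended rewrite moves the inner aggregate into a subquery:
    val sumOfAverages = sqlContext.sql(
      """SELECT sum(avg_value) AS sum_of_avgs
        |FROM (SELECT category, avg(value) AS avg_value
        |      FROM events
        |      GROUP BY category) per_category
      """.stripMargin)
    ```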




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155223300
  
    **[Test build #45407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45407/consoleFull)** for PR 9556 at commit [`bad4184`](https://github.com/apache/spark/commit/bad418404e7ba91402279f09a06c21e1930e2b13).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155295488
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44351651
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala ---
    @@ -55,6 +56,22 @@ case class HyperLogLogPlusPlus(
       extends ImperativeAggregate {
       import HyperLogLogPlusPlus._
     
    +  def this(child: Expression) = {
    +    this(child, 0.05, 0, 0)
    +  }
    +
    +  def this(child: Expression, relativeSD: Expression) = {
    +    this(
    +      child,
    +      relativeSD match {
    +        case Literal(d: Double, DoubleType) => d
    +        case _ =>
    +          throw new AnalysisException("The second argument should be a double literal.")
    +      },
    +      0,
    --- End diff --
    
    consider named args here




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155122470
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44452030
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala ---
    @@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends DeclarativeAggregate {
     
       override def dataType: DataType = resultType
     
    -  // Expected input data type.
    -  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    -  // new version at planning time (after analysis phase). For now, NullType is added at here
    -  // to make it resolved when we have cases like `select stddev(null)`.
    -  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    -  // we remove the old aggregate functions. Then, we will not need NullType at here.
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType))
    --- End diff --
    
    You meant `TypeCollection`? We can remove the `TypeCollection` since we only have a single abstract data type.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44464823
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala ---
    @@ -55,13 +57,10 @@ abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate w
     
       override def dataType: DataType = DoubleType
     
    -  // Expected input data type.
    -  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    -  // new version at planning time (after analysis phase). For now, NullType is added at here
    -  // to make it resolved when we have cases like `select avg(null)`.
    -  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    -  // we remove the old aggregate functions. Then, we will not need NullType at here.
    -  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
    +  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType))
    --- End diff --
    
    Done




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44464801
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala ---
    @@ -29,16 +31,13 @@ case class Sum(child: Expression) extends DeclarativeAggregate {
       // Return data type.
       override def dataType: DataType = resultType
     
    -  // Expected input data type.
    -  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
    -  // new version at planning time (after analysis phase). For now, NullType is added at here
    -  // to make it resolved when we have cases like `select sum(null)`.
    -  // We can use our analyzer to cast NullType to the default data type of the NumericType once
    -  // we remove the old aggregate functions. Then, we will not need NullType at here.
       override def inputTypes: Seq[AbstractDataType] =
         Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType))
    --- End diff --
    
    Done




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155282013
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45474/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44361162
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala ---
    @@ -141,6 +146,12 @@ sealed abstract class AggregateFunction2 extends Expression with ImplicitCastInp
     
       override protected def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String =
         throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")
    +
    +  def toAggregateExpression(): AggregateExpression = toAggregateExpression(isDistinct = false)
    --- End diff --
    
    Done.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155054384
  
    Made a quick pass. I think the PR is in good shape. I do have a few additional questions/remarks:
    * Maybe we should add some documentation to ```DeclarativeAggregate``` to explain why developers have to use lazy vals for all expressions involving the child expression(s).
    * ```MultipleDistinctRewriter``` is in a 'funny' file/location. Shouldn't we move it to the analyzer?




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155257913
  
    **[Test build #45442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45442/consoleFull)** for PR 9556 at commit [`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44268135
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregate.scala ---
    @@ -27,15 +27,15 @@ import org.apache.spark.sql.execution.{SparkPlan, UnaryNode}
     import org.apache.spark.sql.execution.metric.SQLMetrics
     
     case class SortBasedAggregate(
    -    requiredChildDistributionExpressions: Option[Seq[Expression]],
    -    groupingExpressions: Seq[NamedExpression],
    -    nonCompleteAggregateExpressions: Seq[AggregateExpression2],
    -    nonCompleteAggregateAttributes: Seq[Attribute],
    -    completeAggregateExpressions: Seq[AggregateExpression2],
    -    completeAggregateAttributes: Seq[Attribute],
    -    initialInputBufferOffset: Int,
    -    resultExpressions: Seq[NamedExpression],
    -    child: SparkPlan)
    +                               requiredChildDistributionExpressions: Option[Seq[Expression]],
    --- End diff --
    
    Nit... Style... IntelliJ IDEA?




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155225000
  
    The PySpark results are correct, just not in the expected form :(...




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155014293
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155263519
  
    test this please




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44327164
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         }
       }
     
    -  object HashAggregation extends Strategy {
    -    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      // Aggregations that can be performed in two phases, before and after the shuffle.
    -      case PartialAggregation(
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          groupingExpressions,
    -          partialComputation,
    -          child) if !canBeConvertedToNewAggregation(plan) =>
    -        execution.Aggregate(
    -          partial = false,
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          execution.Aggregate(
    -            partial = true,
    -            groupingExpressions,
    -            partialComputation,
    -            planLater(child))) :: Nil
    -
    -      case _ => Nil
    -    }
    -
    -    def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match {
    -      case a: logical.Aggregate =>
    -        if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) {
    -          a.newAggregation.isDefined
    -        } else {
    -          Utils.checkInvalidAggregateFunction2(a)
    -          false
    -        }
    -      case _ => false
    -    }
    -
    -    def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
    -      exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
    -  }
    -
       /**
        * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
        */
       object Aggregation extends Strategy {
         def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
    -          sqlContext.conf.codegenEnabled =>
    -        val converted = p.newAggregation
    -        converted match {
    -          case None => Nil // Cannot convert to new aggregation code path.
    -          case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) =>
    -            // A single aggregate expression might appear multiple times in resultExpressions.
    -            // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -            // build a set of the distinct aggregate expressions and build a function which can
    -            // be used to re-write expressions so that they reference the single copy of the
    -            // aggregate function which actually gets computed.
    -            val aggregateExpressions = resultExpressions.flatMap { expr =>
    -              expr.collect {
    -                case agg: AggregateExpression2 => agg
    -              }
    -            }.distinct
    -            // For those distinct aggregate expressions, we create a map from the
    -            // aggregate function to the corresponding attribute of the function.
    -            val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    -              val aggregateFunction = agg.aggregateFunction
    -              val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    -              (aggregateFunction, agg.isDistinct) -> attribute
    -            }.toMap
    -
    -            val (functionsWithDistinct, functionsWithoutDistinct) =
    -              aggregateExpressions.partition(_.isDistinct)
    -            if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    -              // This is a sanity check. We should not reach here when we have multiple distinct
    -              // column sets (aggregate.NewAggregation will not match).
    -              sys.error(
    -                "Multiple distinct column sets are not supported by the new aggregation" +
    -                  "code path.")
    -            }
    +      case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
    +        // A single aggregate expression might appear multiple times in resultExpressions.
    +        // In order to avoid evaluating an individual aggregate function multiple times, we'll
    +        // build a set of the distinct aggregate expressions and build a function which can
    +        // be used to re-write expressions so that they reference the single copy of the
    +        // aggregate function which actually gets computed.
    +        val aggregateExpressions = resultExpressions.flatMap { expr =>
    +          expr.collect {
    +            case agg: AggregateExpression => agg
    +          }
    +        }.distinct
    +        // For those distinct aggregate expressions, we create a map from the
    +        // aggregate function to the corresponding attribute of the function.
    +        val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    +          val aggregateFunction = agg.aggregateFunction
    +          val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    +          (aggregateFunction, agg.isDistinct) -> attribute
    +        }.toMap
    +
    +        val (functionsWithDistinct, functionsWithoutDistinct) =
    +          aggregateExpressions.partition(_.isDistinct)
    +        if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    +          // This is a sanity check. We should not reach here when we have multiple distinct
    +          // column sets (aggregate.NewAggregation will not match).
    +          sys.error(
    +            "Multiple distinct column sets are not supported by the new aggregation" +
    +              "code path.")
    +        }
     
    -            val namedGroupingExpressions = groupingExpressions.map {
    -              case ne: NamedExpression => ne -> ne
    -              // If the expression is not a NamedExpressions, we add an alias.
    -              // So, when we generate the result of the operator, the Aggregate Operator
    -              // can directly get the Seq of attributes representing the grouping expressions.
    -              case other =>
    -                val withAlias = Alias(other, other.toString)()
    -                other -> withAlias
    -            }
    -            val groupExpressionMap = namedGroupingExpressions.toMap
    -
    -            // The original `resultExpressions` are a set of expressions which may reference
    -            // aggregate expressions, grouping column values, and constants. When aggregate operator
    -            // emits output rows, we will use `resultExpressions` to generate an output projection
    -            // which takes the grouping columns and final aggregate result buffer as input.
    -            // Thus, we must re-write the result expressions so that their attributes match up with
    -            // the attributes of the final result projection's input row:
    -            val rewrittenResultExpressions = resultExpressions.map { expr =>
    -              expr.transformDown {
    -                case AggregateExpression2(aggregateFunction, _, isDistinct) =>
    -                  // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    -                  // so replace each aggregate expression by its corresponding attribute in the set:
    -                  aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    -                case expression =>
    -                  // Since we're using `namedGroupingAttributes` to extract the grouping key
    -                  // columns, we need to replace grouping key expressions with their corresponding
    -                  // attributes. We do not rely on the equality check at here since attributes may
    -                  // differ cosmetically. Instead, we use semanticEquals.
    -                  groupExpressionMap.collectFirst {
    -                    case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    -                  }.getOrElse(expression)
    -              }.asInstanceOf[NamedExpression]
    +        val namedGroupingExpressions = groupingExpressions.map {
    +          case ne: NamedExpression => ne -> ne
    +          // If the expression is not a NamedExpressions, we add an alias.
    +          // So, when we generate the result of the operator, the Aggregate Operator
    +          // can directly get the Seq of attributes representing the grouping expressions.
    +          case other =>
    +            val withAlias = Alias(other, other.toString)()
    +            other -> withAlias
    +        }
    +        val groupExpressionMap = namedGroupingExpressions.toMap
    +
    +        // The original `resultExpressions` are a set of expressions which may reference
    +        // aggregate expressions, grouping column values, and constants. When aggregate operator
    +        // emits output rows, we will use `resultExpressions` to generate an output projection
    +        // which takes the grouping columns and final aggregate result buffer as input.
    +        // Thus, we must re-write the result expressions so that their attributes match up with
    +        // the attributes of the final result projection's input row:
    +        val rewrittenResultExpressions = resultExpressions.map { expr =>
    +          expr.transformDown {
    +            case AggregateExpression(aggregateFunction, _, isDistinct) =>
    +              // The final aggregation buffer's attributes will be `finalAggregationAttributes`,
    +              // so replace each aggregate expression by its corresponding attribute in the set:
    +              aggregateFunctionToAttribute(aggregateFunction, isDistinct)
    +            case expression =>
    +              // Since we're using `namedGroupingAttributes` to extract the grouping key
    +              // columns, we need to replace grouping key expressions with their corresponding
    +              // attributes. We do not rely on the equality check at here since attributes may
    +              // differ cosmetically. Instead, we use semanticEquals.
    +              groupExpressionMap.collectFirst {
    +                case (expr, ne) if expr semanticEquals expression => ne.toAttribute
    +              }.getOrElse(expression)
    +          }.asInstanceOf[NamedExpression]
    +        }
    +
    +        val aggregateOperator =
    +          if (aggregateExpressions.map(_.aggregateFunction).exists(!_.supportsPartial)) {
    +            if (functionsWithDistinct.nonEmpty) {
    +              sys.error("Distinct columns cannot exist in Aggregate operator containing " +
    +                "aggregate functions which don't support partial aggregation.")
    +            } else {
    +              aggregate.Utils.planAggregateWithoutPartial(
    +                namedGroupingExpressions.map(_._2),
    +                aggregateExpressions,
    +                aggregateFunctionToAttribute,
    +                rewrittenResultExpressions,
    +                planLater(child))
                 }
    +          } else if (functionsWithDistinct.isEmpty) {
    +            aggregate.Utils.planAggregateWithoutDistinct(
    +              namedGroupingExpressions.map(_._2),
    +              aggregateExpressions,
    +              aggregateFunctionToAttribute,
    +              rewrittenResultExpressions,
    +              planLater(child))
    +          } else {
    +            aggregate.Utils.planAggregateWithOneDistinct(
    --- End diff --
    
    @hvanhovell How about we create an internal flag to control when we trigger the rewriter? When this flag is true, we always use the rewriter to handle all cases that have any distinct aggregations. If this flag is false, we use our current code path (that special planning rule) to deal with queries that have a single distinct aggregation. Then, we can test the performance.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154986250
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44352031
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -53,6 +54,12 @@ object functions {
     
       private def withExpr(expr: Expression): Column = Column(expr)
     
    +  private def withAggregateFunction(
    +                                     func: AggregateFunction,
    --- End diff --
    
    indent




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44351704
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala ---
    @@ -24,6 +24,8 @@ case class Kurtosis(child: Expression,
         inputAggBufferOffset: Int = 0)
       extends CentralMomentAgg(child) {
     
    +  def this(child: Expression) = this(child, 0, 0)
    --- End diff --
    
    Maybe drop the default args?




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44464472
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
    @@ -448,15 +448,24 @@ private[spark] object SQLConf {
         defaultValue = Some(true),
         isPublic = false)
     
    -  val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2",
    --- End diff --
    
    Done




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155290682
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155129602
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44351730
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala ---
    @@ -24,6 +24,8 @@ case class Kurtosis(child: Expression,
         inputAggBufferOffset: Int = 0)
       extends CentralMomentAgg(child) {
     
    +  def this(child: Expression) = this(child, 0, 0)
    --- End diff --
    
    and comment why we can't use them.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44366543
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"),
     #' @export
     setMethod("approxCountDistinct",
               signature(x = "Column"),
    -          function(x, rsd = 0.95) {
    +          function(x, rsd = 0.05) {
    --- End diff --
    
    Yeah looks like it should be 0.05 -- cc @davies 
    This seems to have been added back when SparkR was a separate code-base in https://github.com/davies/SparkR-pkg/commit/d7b17a428c27aac28d89e1c85f1ba7d9d4b021d2
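    For context, rsd is the maximum relative standard deviation (estimation error) allowed for the approximate distinct count, so 0.05 targets roughly 5% error, while 0.95 would permit a far looser estimate. A hedged usage sketch in Scala, assuming an existing DataFrame df with a user_id column:

        import org.apache.spark.sql.functions.approxCountDistinct

        // df is assumed to be an existing DataFrame with a "user_id" column.
        val approx = df.agg(approxCountDistinct(df("user_id"), 0.05))  // ~5% expected error
        approx.show()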




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155258202
  
    **[Test build #2024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2024/consoleFull)** for PR 9556 at commit [`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155263092
  
    **[Test build #45474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45474/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155232902
  
    **[Test build #45442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45442/consoleFull)** for PR 9556 at commit [`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2).




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44440725
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -525,21 +526,14 @@ class Analyzer(
               case u @ UnresolvedFunction(name, children, isDistinct) =>
                 withPosition(u) {
                   registry.lookupFunction(name, children) match {
    -                // We get an aggregate function built based on AggregateFunction2 interface.
    -                // So, we wrap it in AggregateExpression2.
    -                case agg2: AggregateFunction2 => AggregateExpression2(agg2, Complete, isDistinct)
    -                // Currently, our old aggregate function interface supports SUM(DISTINCT ...)
    -                // and COUTN(DISTINCT ...).
    -                case sumDistinct: SumDistinct => sumDistinct
    -                case countDistinct: CountDistinct => countDistinct
    -                // DISTINCT is not meaningful with Max and Min.
    -                case max: Max if isDistinct => max
    -                case min: Min if isDistinct => min
    -                // For other aggregate functions, DISTINCT keyword is not supported for now.
    -                // Once we converted to the new code path, we will allow using DISTINCT keyword.
    -                case other: AggregateExpression1 if isDistinct =>
    -                  failAnalysis(s"$name does not support DISTINCT keyword.")
    -                // If it does not have DISTINCT keyword, we will return it as is.
    +                // DISTINCT is not meaningful for a Max or a Min.
    +                case max: Max if isDistinct =>
    +                  AggregateExpression(max, Complete, isDistinct = false)
    +                case min: Min if isDistinct =>
    +                  AggregateExpression(min, Complete, isDistinct = false)
    +                // We get an aggregate function, we need to wrap it in an AggregateExpression.
    +                case agg2: AggregateFunction => AggregateExpression(agg2, Complete, isDistinct)
    --- End diff --
    
    agg2 -> agg
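    As a side note on the rewrite itself, a tiny illustration with plain Scala collections (not from the PR) of why the analyzer can safely drop the DISTINCT flag for Max and Min:

        // The extreme of the distinct values equals the extreme of all values,
        // so MAX(DISTINCT x) is equivalent to MAX(x), and likewise for MIN.
        val xs = Seq(1, 3, 3, 7, 7)
        assert(xs.max == xs.distinct.max)  // both are 7
        assert(xs.min == xs.distinct.min)  // both are 1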




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-154922620
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44351757
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Skewness.scala ---
    @@ -24,6 +24,8 @@ case class Skewness(child: Expression,
         inputAggBufferOffset: Int = 0)
       extends CentralMomentAgg(child) {
     
    +  def this(child: Expression) = this(child, 0, 0)
    --- End diff --
    
    same.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155348607
  
    **[Test build #45492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45492/consoleFull)** for PR 9556 at commit [`16826e6`](https://github.com/apache/spark/commit/16826e6d5cb9636b6a72125fcccfc4273227c45b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44267784
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         }
       }
     
    -  object HashAggregation extends Strategy {
    -    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      // Aggregations that can be performed in two phases, before and after the shuffle.
    -      case PartialAggregation(
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          groupingExpressions,
    -          partialComputation,
    -          child) if !canBeConvertedToNewAggregation(plan) =>
    -        execution.Aggregate(
    -          partial = false,
    -          namedGroupingAttributes,
    -          rewrittenAggregateExpressions,
    -          execution.Aggregate(
    -            partial = true,
    -            groupingExpressions,
    -            partialComputation,
    -            planLater(child))) :: Nil
    -
    -      case _ => Nil
    -    }
    -
    -    def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match {
    -      case a: logical.Aggregate =>
    -        if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) {
    -          a.newAggregation.isDefined
    -        } else {
    -          Utils.checkInvalidAggregateFunction2(a)
    -          false
    -        }
    -      case _ => false
    -    }
    -
    -    def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
    -      exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
    -  }
    -
       /**
        * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
        */
       object Aggregation extends Strategy {
         def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    -      case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
    -          sqlContext.conf.codegenEnabled =>
    -        val converted = p.newAggregation
    -        converted match {
    -          case None => Nil // Cannot convert to new aggregation code path.
    -          case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) =>
    -            // A single aggregate expression might appear multiple times in resultExpressions.
    -            // In order to avoid evaluating an individual aggregate function multiple times, we'll
    -            // build a set of the distinct aggregate expressions and build a function which can
    -            // be used to re-write expressions so that they reference the single copy of the
    -            // aggregate function which actually gets computed.
    -            val aggregateExpressions = resultExpressions.flatMap { expr =>
    -              expr.collect {
    -                case agg: AggregateExpression2 => agg
    -              }
    -            }.distinct
    -            // For those distinct aggregate expressions, we create a map from the
    -            // aggregate function to the corresponding attribute of the function.
    -            val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    -              val aggregateFunction = agg.aggregateFunction
    -              val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    -              (aggregateFunction, agg.isDistinct) -> attribute
    -            }.toMap
    -
    -            val (functionsWithDistinct, functionsWithoutDistinct) =
    -              aggregateExpressions.partition(_.isDistinct)
    -            if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    -              // This is a sanity check. We should not reach here when we have multiple distinct
    -              // column sets (aggregate.NewAggregation will not match).
    -              sys.error(
    -                "Multiple distinct column sets are not supported by the new aggregation" +
    -                  "code path.")
    -            }
    +      case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
    +        // A single aggregate expression might appear multiple times in resultExpressions.
    +        // In order to avoid evaluating an individual aggregate function multiple times, we'll
    +        // build a set of the distinct aggregate expressions and build a function which can
    +        // be used to re-write expressions so that they reference the single copy of the
    +        // aggregate function which actually gets computed.
    +        val aggregateExpressions = resultExpressions.flatMap { expr =>
    +          expr.collect {
    +            case agg: AggregateExpression => agg
    +          }
    +        }.distinct
    +        // For those distinct aggregate expressions, we create a map from the
    +        // aggregate function to the corresponding attribute of the function.
    +        val aggregateFunctionToAttribute = aggregateExpressions.map { agg =>
    +          val aggregateFunction = agg.aggregateFunction
    +          val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute
    +          (aggregateFunction, agg.isDistinct) -> attribute
    +        }.toMap
    +
    +        val (functionsWithDistinct, functionsWithoutDistinct) =
    +          aggregateExpressions.partition(_.isDistinct)
    +        if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
    +          // This is a sanity check. We should not reach here when we have multiple distinct
    +          // column sets (aggregate.NewAggregation will not match).
    +          sys.error(
    --- End diff --
    
    The comment and the error message mention a **new** aggregation path. We should update this.
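    For readers, a simplified, self-contained version of the sanity check quoted above (the Agg class is a hypothetical stand-in, not Spark's). Two DISTINCT aggregates over different column sets, such as COUNT(DISTINCT a) together with SUM(DISTINCT b), trip the check:

        object DistinctSetCheck extends App {
          case class Agg(name: String, columns: Seq[String], isDistinct: Boolean)

          val aggregates = Seq(
            Agg("count", Seq("a"), isDistinct = true),
            Agg("sum", Seq("b"), isDistinct = true),
            Agg("max", Seq("c"), isDistinct = false))

          // Same partition-and-check shape as the planner code quoted above.
          val (withDistinct, _) = aggregates.partition(_.isDistinct)
          if (withDistinct.map(_.columns).distinct.length > 1) {
            sys.error("Multiple distinct column sets are not supported by this aggregation code path.")
          }
        }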




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155128961
  
    test this please




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44352445
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala ---
    @@ -88,30 +89,33 @@ class GroupedData protected[sql](
             namedExpr
           }
         }
    -    toDF(columnExprs.map(f))
    +    toDF(columnExprs.map(expr => f(expr).toAggregateExpression()))
       }
     
       private[this] def strToExpr(expr: String): (Expression => Expression) = {
    -    expr.toLowerCase match {
    -      case "avg" | "average" | "mean" => Average
    -      case "max" => Max
    -      case "min" => Min
    -      case "stddev" | "std" => StddevSamp
    -      case "stddev_pop" => StddevPop
    -      case "stddev_samp" => StddevSamp
    -      case "variance" => VarianceSamp
    -      case "var_pop" => VariancePop
    -      case "var_samp" => VarianceSamp
    -      case "sum" => Sum
    -      case "skewness" => Skewness
    -      case "kurtosis" => Kurtosis
    -      case "count" | "size" =>
    -        // Turn count(*) into count(1)
    -        (inputExpr: Expression) => inputExpr match {
    -          case s: Star => Count(Literal(1))
    -          case _ => Count(inputExpr)
    -        }
    +    val exprToFunc: (Expression => AggregateFunction) = {
    +      (inputExpr: Expression) => expr.toLowerCase match {
    +        case "avg" | "average" | "mean" => Average(inputExpr)
    +        case "max" => Max(inputExpr)
    +        case "min" => Min(inputExpr)
    +        case "stddev" | "std" => StddevSamp(inputExpr)
    +        case "stddev_pop" => StddevPop(inputExpr)
    +        case "stddev_samp" => StddevSamp(inputExpr)
    +        case "variance" => VarianceSamp(inputExpr, 0, 0)
    +        case "var_pop" => VariancePop(inputExpr, 0, 0)
    +        case "var_samp" => VarianceSamp(inputExpr, 0, 0)
    +        case "sum" => Sum(inputExpr)
    +        case "skewness" => Skewness(inputExpr, 0, 0)
    +        case "kurtosis" => Kurtosis(inputExpr, 0, 0)
    --- End diff --
    
    This is kind of a nit, but I think it might be better to rely on the defaults provided by the extra constructor instead of hard-coding these here and elsewhere.

    Also, is it possible for a user to corrupt query results by specifying their own offsets due to the way we resolve functions?
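    A small sketch of the alternative being suggested, assuming the one-argument auxiliary constructors (or default arguments) discussed earlier in the thread are in place; the Stat class and names below are hypothetical, not Spark's:

        object StrToAggSketch extends App {
          // Hypothetical stand-in for an aggregate with buffer-offset parameters.
          case class Stat(name: String, column: String, mutableOffset: Int = 0, inputOffset: Int = 0)

          // With the defaults in play, the lookup table no longer has to hard-code
          // the (expr, 0, 0) offsets at every case.
          def strToStat(func: String, column: String): Stat = func.toLowerCase match {
            case "skewness" => Stat("skewness", column)  // offsets come from the defaults
            case "kurtosis" => Stat("kurtosis", column)  // no literal zeros at the call site
            case other      => sys.error(s"unknown aggregate function: $other")
          }

          println(strToStat("kurtosis", "price"))  // Stat(kurtosis,price,0,0)
        }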




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155228594
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9556#discussion_r44360853
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Skewness.scala ---
    @@ -24,6 +24,8 @@ case class Skewness(child: Expression,
         inputAggBufferOffset: Int = 0)
       extends CentralMomentAgg(child) {
     
    +  def this(child: Expression) = this(child, 0, 0)
    --- End diff --
    
    Done




[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9556#issuecomment-155179711
  
    Merged build started.

