You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2016/01/25 09:14:27 UTC

[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/10896

    [SPARK-12978][SQL] Skip unnecessary final group-by when input data already clustered with group-by keys

    This ticket targets the optimization to skip an unnecessary group-by operation below;
    
    Without opt.:
    ```
    == Physical Plan ==
    TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Final,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)], output=[col0#159,sum(col1)#177,avg(col2)#178])
    +- TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Partial,isDistinct=false),(avg(col2#161),mode=Partial,isDistinct=false)], output=[col0#159,sum#200,sum#201,count#202L])
       +- TungstenExchange hashpartitioning(col0#159,200), None
          +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation [col0#159,col1#160,col2#161], true, 10000, StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None
    ```
    
    With opt.:
    ```
    == Physical Plan ==
    TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Complete,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)], output=[col0#159,sum(col1)#177,avg(col2)#178])
    +- TungstenExchange hashpartitioning(col0#159,200), None
      +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation [col0#159,col1#160,col2#161], true, 10000, StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SkipGroupbySpike

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10896.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10896
    
----
commit 5ab19c1a97593b75512ba7b910849bef87f9810e
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-01-25T05:58:45Z

    Skip unnecessary final group-by when input data already clustered

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-174439637
  
    **[Test build #49985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49985/consoleFull)** for PR 10896 at commit [`5ab19c1`](https://github.com/apache/spark/commit/5ab19c1a97593b75512ba7b910849bef87f9810e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-180262936
  
    **[Test build #50812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50812/consoleFull)** for PR 10896 at commit [`140da25`](https://github.com/apache/spark/commit/140da256e9f7dd6ba705e3ef5c6c61410b3cda63).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75670619
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    +    assert(isAggregate(operator))
    +    def supportPartial(exprs: Seq[AggregateExpression]) =
    +      exprs.map(_.aggregateFunction).forall(_.supportsPartial)
    +    operator match {
    +      case agg @ HashAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +      case agg @ SortAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +    }
    +  }
    +
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
    +  }
    +
    +  private[execution] def createPartialAggregate(operator: SparkPlan)
    --- End diff --
    
    How about this change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    After this PR, we create the partial aggregate operator in `EnsureRequirements`, which makes the aggregation code harder to understand and also mess up `EnsureRequirements`.
    
    I have a simpler idea: 
    
    1. Correctly propagate `outputPartitioning` in `AggregateExec`, then we can avoid adding an exchange between a partial aggregate and a final aggregate.
    2. Add a new rule which is run after `EnsureRequirements`. In this rule, we can combine adjacent partial aggregate and final aggregate into one.
    
    cc @maropu @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60809/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60766/consoleFull)** for PR 10896 at commit [`adad55d`](https://github.com/apache/spark/commit/adad55dbaf4adab313f970c82c2b7a45298c36eb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75695126
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,90 @@ package org.apache.spark.sql.execution.aggregate
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
    +import org.apache.spark.sql.execution.aggregate.{Aggregate => AggregateExec}
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
     
     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object ExtractPartialAggregate {
    --- End diff --
    
    okay


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @hvanhovell okay, ready to review. After the v2.0 release, plz review this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60121/consoleFull)** for PR 10896 at commit [`2007caa`](https://github.com/apache/spark/commit/2007caaa712439926d8a0d30df1bffe472bf50d3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75604121
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    +    assert(isAggregate(operator))
    +    def supportPartial(exprs: Seq[AggregateExpression]) =
    +      exprs.map(_.aggregateFunction).forall(_.supportsPartial)
    +    operator match {
    +      case agg @ HashAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +      case agg @ SortAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +    }
    +  }
    +
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
    +  }
    +
    +  private[execution] def createPartialAggregate(operator: SparkPlan)
    --- End diff --
    
    A lot of duplication here. It would be nice if we have an parent for the `*AggregateExec` nodes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64209/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64218/consoleFull)** for PR 10896 at commit [`86068d0`](https://github.com/apache/spark/commit/86068d0f9db2cd1be91e5ec0c56d6c7c074438c8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64324/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60697/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75603826
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -37,36 +37,58 @@ class PlannerSuite extends SharedSQLContext {
     
       setupTestData()
     
    -  private def testPartialAggregationPlan(query: LogicalPlan): Unit = {
    +  private def testPartialAggregationPlan(query: LogicalPlan): Seq[SparkPlan] = {
         val planner = spark.sessionState.planner
         import planner._
    -    val plannedOption = Aggregation(query).headOption
    -    val planned =
    -      plannedOption.getOrElse(
    -        fail(s"Could query play aggregation query $query. Is it an aggregation query?"))
    -    val aggregations = planned.collect { case n if n.nodeName contains "Aggregate" => n }
    -
    -    // For the new aggregation code path, there will be four aggregate operator for
    -    // distinct aggregations.
    -    assert(
    -      aggregations.size == 2 || aggregations.size == 4,
    -      s"The plan of query $query does not have partial aggregations.")
    +    val ensureRequirements = EnsureRequirements(spark.sessionState.conf)
    +    val planned = Aggregation(query).headOption.map(ensureRequirements(_))
    +      .getOrElse(fail(s"Could query play aggregation query $query. Is it an aggregation query?"))
    +    planned.collect { case n if n.nodeName contains "Aggregate" => n }
       }
     
       test("count is partially aggregated") {
         val query = testData.groupBy('value).agg(count('key)).queryExecution.analyzed
    -    testPartialAggregationPlan(query)
    +    assert(testPartialAggregationPlan(query).size == 2,
    +      s"The plan of query $query does not have partial aggregations.")
       }
     
       test("count distinct is partially aggregated") {
         val query = testData.groupBy('value).agg(countDistinct('key)).queryExecution.analyzed
         testPartialAggregationPlan(query)
    +    // For the new aggregation code path, there will be four aggregate operator for  distinct
    +    // aggregations.
    +    assert(testPartialAggregationPlan(query).size == 4,
    +      s"The plan of query $query does not have partial aggregations.")
       }
     
       test("mixed aggregates are partially aggregated") {
         val query =
           testData.groupBy('value).agg(count('value), countDistinct('key)).queryExecution.analyzed
    -    testPartialAggregationPlan(query)
    +    // For the new aggregation code path, there will be four aggregate operator for  distinct
    +    // aggregations.
    +    assert(testPartialAggregationPlan(query).size == 4,
    +      s"The plan of query $query does not have partial aggregations.")
    +  }
    +
    +  test("non-partial aggregation for distinct aggregates") {
    --- End diff --
    
    Also add a test for non-partial aggregation with a regular (non distinct) aggregate?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-180242112
  
    **[Test build #50812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50812/consoleFull)** for PR 10896 at commit [`140da25`](https://github.com/apache/spark/commit/140da256e9f7dd6ba705e3ef5c6c61410b3cda63).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-184248142
  
    @yhuai The second approach's good to me though, I'm not exactly sure how to remove unnecessary final aggregation covered in this pr. IMO these kinds of partial aggregation optimization seem to be similar to `Filter` optmization such as push-downs and pruning. So, it'd be better to get together these kinds of optimization in a same file.
    Even in the above example, we should push down the partial aggregation below `TungstenExchange` generated by `DataFrame#repartition`.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    +1 for @cloud-fan's proposal. Instead of creating a performant plan using tricky code, it's clearer to create a naive but correct physical plan first and then optimize it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60713/consoleFull)** for PR 10896 at commit [`f8ac8cb`](https://github.com/apache/spark/commit/f8ac8cbe9fe646d6efd0e609778314c85fcb0e49).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75979651
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
     
     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object PartialAggregate {
    +
    +  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
    +    case agg: AggregateExec
    +        if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
    +      Some(agg.requiredChildDistribution.head)
    +    case _ =>
    +      None
    +  }
    +}
    +
    +/**
      * Utility functions used by the query planner to convert our plan to new aggregation code path.
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
       }
     
    -  private def createAggregate(
    +  /**
    +   * Builds new merge and map-side [[AggregateExec]]s from an input aggregate operator.
    +   * If an aggregation needs a shuffle for satisfying its own distribution and supports partial
    +   * aggregations, a map-side aggregation is appended before the shuffle in
    +   * [[org.apache.spark.sql.execution.exchange.EnsureRequirements]].
    +   */
    +  def createMapMergeAggregatePair(operator: SparkPlan): (SparkPlan, SparkPlan) = operator match {
    +    case agg: AggregateExec =>
    +      val mapSideAgg = createPartialAggregateExec(
    +        agg.groupingExpressions, agg.aggregateExpressions, agg.child)
    +      val mergeAgg = createAggregateExec(
    +        requiredChildDistributionExpressions = agg.requiredChildDistributionExpressions,
    +        groupingExpressions = agg.groupingExpressions.map(_.toAttribute),
    +        aggregateExpressions = updateMergeAggregateMode(agg.aggregateExpressions),
    +        aggregateAttributes = agg.aggregateAttributes,
    +        initialInputBufferOffset = agg.groupingExpressions.length,
    +        resultExpressions = agg.resultExpressions,
    +        child = agg.child
    --- End diff --
    
    In fact, the final plan is [MergeAgg]<-[Shuffle]<-[MapSideAgg]. So, this function just returns the two aggregations separately, and the plan is built in `EnsureRequirements`. Is this a bad idea?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60791/consoleFull)** for PR 10896 at commit [`cb47a54`](https://github.com/apache/spark/commit/cb47a54b25c3ee76e46dbd9ebb83e99814c93580).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75701953
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/Aggregate.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
    +import org.apache.spark.sql.catalyst.plans.physical._
    +import org.apache.spark.sql.execution.SparkPlan
    +
    +/**
    + * A base class for aggregate implementation.
    + */
    +trait Aggregate {
    --- End diff --
    
    Well I think a super class makes a bit more sense. A trait to me is a way to bolt on functionality. The `Aggregate` contains core functionality for both the Hash and Sort based version, and is the natural parent class of both.
    
    I do have to admit that this is more a personal preference.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64218/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-174439667
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49985/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60120/consoleFull)** for PR 10896 at commit [`36553bc`](https://github.com/apache/spark/commit/36553bc2a465c3c8d7b764e565f88aba5385d759).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-214152193
  
    **[Test build #56883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56883/consoleFull)** for PR 10896 at commit [`9d77c90`](https://github.com/apache/spark/commit/9d77c90c05e21364b65e38fa786e5207d8dc9c94).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64324/consoleFull)** for PR 10896 at commit [`d5e0ed3`](https://github.com/apache/spark/commit/d5e0ed3d0efdc8047948e48cdc6fb1257cc381f0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60797/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-174509342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49988/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-179386508
  
    Okay, but that code doesn't actually produce an exchange right?  Since its captured by the cache?
    
    ```
    == Optimized Logical Plan ==
    Aggregate [col0#38918], [col0#38918,(sum(cast(col1#38919 as bigint)),mode=Complete,isDistinct=false) AS sum(col1)#38936L,(avg(cast(col2#38920 as bigint)),mode=Complete,isDistinct=false) AS avg(col2)#38937]
    +- InMemoryRelation [col0#38918,col1#38919,col2#38920], true, 10000, StorageLevel(true, true, false, true, 1), Exchange hashpartitioning(col0#38918,200), None, None
    
    == Physical Plan ==
    WholeStageCodegen
    :  +- TungstenAggregate(key=[col0#38918], functions=[(sum(cast(col1#38919 as bigint)),mode=Final,isDistinct=false),(avg(cast(col2#38920 as bigint)),mode=Final,isDistinct=false)], output=[col0#38918,sum(col1)#38936L,avg(col2)#38937])
    :     +- TungstenAggregate(key=[col0#38918], functions=[(sum(cast(col1#38919 as bigint)),mode=Partial,isDistinct=false),(avg(cast(col2#38920 as bigint)),mode=Partial,isDistinct=false)], output=[col0#38918,sum#38959L,sum#38960,count#38961L])
    :        +- INPUT
    +- InMemoryColumnarTableScan [col0#38918,col1#38919,col2#38920], InMemoryRelation [col0#38918,col1#38919,col2#38920], true, 10000, StorageLevel(true, true, false, true, 1), Exchange hashpartitioning(col0#38918,200), None, None
    ```
    
    Eitherway, I'll let @yhuai sign off on this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r76168321
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
     
     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object PartialAggregate {
    +
    +  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
    +    case agg: AggregateExec
    +        if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
    +      Some(agg.requiredChildDistribution.head)
    +    case _ =>
    +      None
    +  }
    +}
    +
    +/**
      * Utility functions used by the query planner to convert our plan to new aggregation code path.
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
       }
     
    -  private def createAggregate(
    +  /**
    +   * Builds new merge and map-side [[AggregateExec]]s from an input aggregate operator.
    +   * If an aggregation needs a shuffle for satisfying its own distribution and supports partial
    +   * aggregations, a map-side aggregation is appended before the shuffle in
    +   * [[org.apache.spark.sql.execution.exchange.EnsureRequirements]].
    +   */
    +  def createMapMergeAggregatePair(operator: SparkPlan): (SparkPlan, SparkPlan) = operator match {
    +    case agg: AggregateExec =>
    +      val mapSideAgg = createPartialAggregateExec(
    +        agg.groupingExpressions, agg.aggregateExpressions, agg.child)
    +      val mergeAgg = createAggregateExec(
    +        requiredChildDistributionExpressions = agg.requiredChildDistributionExpressions,
    +        groupingExpressions = agg.groupingExpressions.map(_.toAttribute),
    +        aggregateExpressions = updateMergeAggregateMode(agg.aggregateExpressions),
    +        aggregateAttributes = agg.aggregateAttributes,
    +        initialInputBufferOffset = agg.groupingExpressions.length,
    +        resultExpressions = agg.resultExpressions,
    +        child = agg.child
    --- End diff --
    
    okay, I'll fix this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    A high-level question: After this PR, `EnsureRequirements` also need to know about aggregate, and handle partial aggregate logic. This makes the aggregation code harder to understand, instead of doing this, can we make `AggregateExec` produce corrected `outputPartitioning`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64079/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75707785
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    +    assert(isAggregate(operator))
    +    def supportPartial(exprs: Seq[AggregateExpression]) =
    +      exprs.map(_.aggregateFunction).forall(_.supportsPartial)
    +    operator match {
    +      case agg @ HashAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +      case agg @ SortAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +    }
    +  }
    +
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
    +  }
    +
    +  private[execution] def createPartialAggregate(operator: SparkPlan)
    --- End diff --
    
    Much better


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    You could also argue the other way around, planning a partial aggregate is also a premature optimization, and that the planning of such an `Aggregate` could also be considered tricky code. BTW: the solution implemented in this PR was initially proposed by @yhuai.
    
    I do think things could be simplified even more, and that either pruning an unneeded partial aggregate or planning one in a new rule both have merit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    okay, thanks. I'll fix in hours.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @maropu Thank you for working on this. Sorry that I did not get time to look at it after you updated the pr. I looked at it today. I think this optimization deserves a feature flag since it determines if we can generate a valid physical plan. We can enable it by default. But, we will have the flexibility to disable it when there is an issue.  
    
    After looking at the code, I am not sure it is a good approach to put the logic of adding partial aggregate operators in EnsureRequirements. Originally, I thought we could have a individual rule to add partial aggregate operators and then either extract logic in EnsureRequirements as a utility function or run EnsureRequirements again. 
    
    Also, due to the complexity of the logic for planning aggregations, seems after the change it is hard to track the planner logic. 
    
    So, seems it will be good to try your your original proposal by adding a rule to remove unnecessary operators (like @cloud-fan implemented in #14876). In this way, it will also be very easy to add the feature flag and keep the optimization rule in a single place. Later, we can revisit this approach if we can clean up the planner logic for aggregation. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-180263124
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64199/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-178746071
  
    @yhuai would be better to review this, but neither of those plans look great to me.  Why are we not partial aggregating before a shuffle?  Seems like that will ship a lot of data around for no reason.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Could you update?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @hvanhovell okay, fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75694390
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,90 @@ package org.apache.spark.sql.execution.aggregate
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
    +import org.apache.spark.sql.execution.aggregate.{Aggregate => AggregateExec}
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
     
     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object ExtractPartialAggregate {
    --- End diff --
    
    Just name this PartialAggregate?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r51762630
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala ---
    @@ -86,20 +86,40 @@ object Utils {
           aggregateExpressions: Seq[AggregateExpression],
           aggregateFunctionToAttribute: Map[(AggregateFunction, Boolean), Attribute],
           resultExpressions: Seq[NamedExpression],
    +      skipUnnecessaryAggregate: Boolean,
    --- End diff --
    
    I think it would be clearer if you called the `partialAggregation`.  Its not unnecessary, its an optimization in most cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-182169168
  
    I probably will not be able to take a close look on this PR until later this month. I have a question regarding the approach of PR. Right now, we always plan partial aggregation operators first (in SparkStrategies) and then add Exchange operators (in EnsureRequirements). Another approach will be that we do not add partial aggregation operators in SparkStrategies. Then, after we figure out where we need exchange operators, we add partial aggregation operators. This approach probably needs more code changes. But, I feel it is a more cleaner approach.
    
    @maropu @marmbrus  what do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75692369
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/Aggregate.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
    +import org.apache.spark.sql.catalyst.plans.physical._
    +import org.apache.spark.sql.execution.SparkPlan
    +
    +/**
    + * A base class for aggregate implementation.
    + */
    +trait Aggregate {
    --- End diff --
    
    Why a trait and not a superclass?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60766/consoleFull)** for PR 10896 at commit [`adad55d`](https://github.com/apache/spark/commit/adad55dbaf4adab313f970c82c2b7a45298c36eb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r65103410
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -257,10 +257,19 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
                     planLater(child))
                 }
               } else if (functionsWithDistinct.isEmpty) {
    +            // Check if the child operator satisfies the group-by distribution requirements
    +            val childPlan = planLater(child)
    +            val canPatialAggregate = if (groupingExpressions != Nil) {
    +              childPlan.outputPartitioning.satisfies(ClusteredDistribution(groupingExpressions))
    +            } else {
    +              false
    +            }
    +
                 aggregate.Utils.planAggregateWithoutDistinct(
                   groupingExpressions,
                   aggregateExpressions,
                   resultExpressions,
    +              canPatialAggregate,
                   planLater(child))
    --- End diff --
    
    Use `childPlan`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-174482019
  
    **[Test build #49988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49988/consoleFull)** for PR 10896 at commit [`1b7e3d8`](https://github.com/apache/spark/commit/1b7e3d8cc2e7d74d5c7e6b6564c03dc1f152bc0a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @hvanhovell could you also give me comments on #13852?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60121/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60697/consoleFull)** for PR 10896 at commit [`3ae9153`](https://github.com/apache/spark/commit/3ae91530a6883aadfdbb08d77959ab414f822de0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-214152854
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56883/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64200/consoleFull)** for PR 10896 at commit [`8b64305`](https://github.com/apache/spark/commit/8b643055e47d6213847d0cf10adf83a3f59ece55).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60800/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64218/consoleFull)** for PR 10896 at commit [`86068d0`](https://github.com/apache/spark/commit/86068d0f9db2cd1be91e5ec0c56d6c7c074438c8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @maropu this is shaping up nicely. I left a few more small comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75603363
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala ---
    @@ -121,3 +121,6 @@ case class SortAggregateExec(
         }
       }
     }
    +
    +object SortAggregateExec
    --- End diff --
    
    ???


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64283/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64079/consoleFull)** for PR 10896 at commit [`91701eb`](https://github.com/apache/spark/commit/91701eb5171555ce73ba6b190b41cc17f05eee8c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @cloud-fan @liancheng yea, adding a new rule after `EnsureRequirements` sounds good to me. One question; creating a partial aggregation in the planner and removing it in the new rule seem to be kind of wasteful, so is it a bad idea to create a partial aggregation in the rule? Physical plans is always correct with/without partial aggregations and it is one of optimizations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @hvanhovell fixed. plz check again?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60811/consoleFull)** for PR 10896 at commit [`572db4c`](https://github.com/apache/spark/commit/572db4ceef346d9ae01065b84d83777b81f2eb06).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-221528445
  
    @hvanhovell yeah, it is up-to-dated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-174435556
  
    **[Test build #49985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49985/consoleFull)** for PR 10896 at commit [`5ab19c1`](https://github.com/apache/spark/commit/5ab19c1a97593b75512ba7b910849bef87f9810e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75603936
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    --- End diff --
    
    We could just merge this with `isAggregate(...)`. You could also turn this into an Extractor, if that makes life easier in `EnsureRequirements`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75649753
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    +    assert(isAggregate(operator))
    +    def supportPartial(exprs: Seq[AggregateExpression]) =
    +      exprs.map(_.aggregateFunction).forall(_.supportsPartial)
    +    operator match {
    +      case agg @ HashAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    --- End diff --
    
    yea, if they have a common superclass, it is more simple.
    I'll fix this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r76044818
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
     
     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object PartialAggregate {
    +
    +  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
    +    case agg: AggregateExec
    +        if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
    +      Some(agg.requiredChildDistribution.head)
    +    case _ =>
    +      None
    +  }
    +}
    +
    +/**
      * Utility functions used by the query planner to convert our plan to new aggregation code path.
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
       }
     
    -  private def createAggregate(
    +  /**
    +   * Builds new merge and map-side [[AggregateExec]]s from an input aggregate operator.
    +   * If an aggregation needs a shuffle for satisfying its own distribution and supports partial
    +   * aggregations, a map-side aggregation is appended before the shuffle in
    +   * [[org.apache.spark.sql.execution.exchange.EnsureRequirements]].
    +   */
    +  def createMapMergeAggregatePair(operator: SparkPlan): (SparkPlan, SparkPlan) = operator match {
    +    case agg: AggregateExec =>
    +      val mapSideAgg = createPartialAggregateExec(
    +        agg.groupingExpressions, agg.aggregateExpressions, agg.child)
    +      val mergeAgg = createAggregateExec(
    +        requiredChildDistributionExpressions = agg.requiredChildDistributionExpressions,
    +        groupingExpressions = agg.groupingExpressions.map(_.toAttribute),
    +        aggregateExpressions = updateMergeAggregateMode(agg.aggregateExpressions),
    +        aggregateAttributes = agg.aggregateAttributes,
    +        initialInputBufferOffset = agg.groupingExpressions.length,
    +        resultExpressions = agg.resultExpressions,
    +        child = agg.child
    --- End diff --
    
    It violates the principle of least surprise. The `mergeAgg` is not usable without the `mapSideAgg`. This is fine for usage in `EnsureRequirements` because it gets straightened out anyway, but can be very surprising if someone uses it in a different way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    LGTM - merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60800/consoleFull)** for PR 10896 at commit [`9ac8025`](https://github.com/apache/spark/commit/9ac80253d59614088f064cd249400b90695912b6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75603771
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -151,18 +152,39 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
       private def ensureDistributionAndOrdering(operator: SparkPlan): SparkPlan = {
         val requiredChildDistributions: Seq[Distribution] = operator.requiredChildDistribution
         val requiredChildOrderings: Seq[Seq[SortOrder]] = operator.requiredChildOrdering
    -    var children: Seq[SparkPlan] = operator.children
    -    assert(requiredChildDistributions.length == children.length)
    -    assert(requiredChildOrderings.length == children.length)
    +    assert(requiredChildDistributions.length == operator.children.length)
    +    assert(requiredChildOrderings.length == operator.children.length)
     
         // Ensure that the operator's children satisfy their output distribution requirements:
    -    children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    -        child
    -      case (child, BroadcastDistribution(mode)) =>
    -        BroadcastExchangeExec(mode, child)
    -      case (child, distribution) =>
    -        ShuffleExchange(createPartitioning(distribution, defaultNumPreShufflePartitions), child)
    +    val childrenWithDist = operator.children.zip(requiredChildDistributions)
    +
    +    def createShuffleExchange(dist: Distribution, child: SparkPlan) =
    +      ShuffleExchange(createPartitioning(dist, defaultNumPreShufflePartitions), child)
    +
    +    var (parent, children) = if (!AggUtils.isAggregate(operator)) {
    --- End diff --
    
    Can we try to simplify the if-else here? We only need to isolate aggregates for which can be partially aggregated and for which the distribution does not match.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64283/consoleFull)** for PR 10896 at commit [`69751bb`](https://github.com/apache/spark/commit/69751bb5d22f9f8a5505ddbaa035f664309ce30b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-221525267
  
    @maropu I'll take a look today. Is the description up-to-date?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-179139867
  
    Yes, it is. The input query is;
    ```
    val fields = Seq(StringType, DoubleType, DoubleType)
      .zipWithIndex.map { case (dataType, index) =>
        StructField(s"col$index", dataType, true)
      }
     
    val df = sqlContext.createDataFrame(rdd, StructType(fields))
    val df2 = df.repartition($"col0").cache
    val df3 = df2.groupBy($"col0").agg(Map("col1"->"sum", "col2"->"avg"))
    df3.explain(true)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-214186535
  
    **[Test build #56884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56884/consoleFull)** for PR 10896 at commit [`dcc51a1`](https://github.com/apache/spark/commit/dcc51a1bb04b95f436e36d96d91a17eeab16d5ab).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60791/consoleFull)** for PR 10896 at commit [`cb47a54`](https://github.com/apache/spark/commit/cb47a54b25c3ee76e46dbd9ebb83e99814c93580).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64388/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @hvanhovell okay, ready to review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64324/consoleFull)** for PR 10896 at commit [`d5e0ed3`](https://github.com/apache/spark/commit/d5e0ed3d0efdc8047948e48cdc6fb1257cc381f0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64322/consoleFull)** for PR 10896 at commit [`8a81e23`](https://github.com/apache/spark/commit/8a81e23cebe25315be1e8d94dbf9b52258bc31f9).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75693169
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/Aggregate.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
    +import org.apache.spark.sql.catalyst.plans.physical._
    +import org.apache.spark.sql.execution.SparkPlan
    +
    +/**
    + * A base class for aggregate implementation.
    + */
    +trait Aggregate {
    --- End diff --
    
    I just used trait along with the  `HashJoin` trait. A super class is better?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60119/consoleFull)** for PR 10896 at commit [`2b1bea6`](https://github.com/apache/spark/commit/2b1bea6b0f786dcb126bf6123d6ada89e4a6296d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60811/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75666478
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -37,36 +37,58 @@ class PlannerSuite extends SharedSQLContext {
     
       setupTestData()
     
    -  private def testPartialAggregationPlan(query: LogicalPlan): Unit = {
    +  private def testPartialAggregationPlan(query: LogicalPlan): Seq[SparkPlan] = {
         val planner = spark.sessionState.planner
         import planner._
    -    val plannedOption = Aggregation(query).headOption
    -    val planned =
    -      plannedOption.getOrElse(
    -        fail(s"Could query play aggregation query $query. Is it an aggregation query?"))
    -    val aggregations = planned.collect { case n if n.nodeName contains "Aggregate" => n }
    -
    -    // For the new aggregation code path, there will be four aggregate operator for
    -    // distinct aggregations.
    -    assert(
    -      aggregations.size == 2 || aggregations.size == 4,
    -      s"The plan of query $query does not have partial aggregations.")
    +    val ensureRequirements = EnsureRequirements(spark.sessionState.conf)
    +    val planned = Aggregation(query).headOption.map(ensureRequirements(_))
    +      .getOrElse(fail(s"Could query play aggregation query $query. Is it an aggregation query?"))
    +    planned.collect { case n if n.nodeName contains "Aggregate" => n }
       }
     
       test("count is partially aggregated") {
         val query = testData.groupBy('value).agg(count('key)).queryExecution.analyzed
    -    testPartialAggregationPlan(query)
    +    assert(testPartialAggregationPlan(query).size == 2,
    +      s"The plan of query $query does not have partial aggregations.")
       }
     
       test("count distinct is partially aggregated") {
         val query = testData.groupBy('value).agg(countDistinct('key)).queryExecution.analyzed
         testPartialAggregationPlan(query)
    +    // For the new aggregation code path, there will be four aggregate operator for  distinct
    +    // aggregations.
    +    assert(testPartialAggregationPlan(query).size == 4,
    +      s"The plan of query $query does not have partial aggregations.")
       }
     
       test("mixed aggregates are partially aggregated") {
         val query =
           testData.groupBy('value).agg(count('value), countDistinct('key)).queryExecution.analyzed
    -    testPartialAggregationPlan(query)
    +    // For the new aggregation code path, there will be four aggregate operator for  distinct
    +    // aggregations.
    +    assert(testPartialAggregationPlan(query).size == 4,
    +      s"The plan of query $query does not have partial aggregations.")
    +  }
    +
    +  test("non-partial aggregation for distinct aggregates") {
    --- End diff --
    
    Added


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-214186715
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56884/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75731352
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    +    assert(isAggregate(operator))
    +    def supportPartial(exprs: Seq[AggregateExpression]) =
    +      exprs.map(_.aggregateFunction).forall(_.supportsPartial)
    +    operator match {
    +      case agg @ HashAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +      case agg @ SortAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +    }
    +  }
    +
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
    +  }
    +
    +  private[execution] def createPartialAggregate(operator: SparkPlan)
    --- End diff --
    
    Could make this public instead of private[execution]? We just opened up a lot of similar APIs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60797/consoleFull)** for PR 10896 at commit [`9ac8025`](https://github.com/apache/spark/commit/9ac80253d59614088f064cd249400b90695912b6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64388/consoleFull)** for PR 10896 at commit [`ac68145`](https://github.com/apache/spark/commit/ac6814514e091ffc377fde75d5c79b583c878626).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-214152843
  
    **[Test build #56883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56883/consoleFull)** for PR 10896 at commit [`9d77c90`](https://github.com/apache/spark/commit/9d77c90c05e21364b65e38fa786e5207d8dc9c94).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-179004277
  
    I guess that exchange is added because there is a `distribute by`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60791/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60624/consoleFull)** for PR 10896 at commit [`c40388a`](https://github.com/apache/spark/commit/c40388acf657f18ae1431cb33515e395f6c64eb0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10896


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-179618989
  
    As @marmbrus said, we also need push down partial aggregation under an exchange;
    The current Catalyst transforms
    ```
    df.repartition($"col0").groupBy($"col0").agg(Map("col1"->"sum", "col2"->"avg")).explain(true)
    ```
    into
    ```
    == Physical Plan ==
    TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Final,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)], output=[col0#159,sum(col1)#177,avg(col2)#178])
    +- TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Partial,isDistinct=false),(avg(col2#161),mode=Partial,isDistinct=false)], output=[col0#159,sum#200,sum#201,count#202L])
       +- TungstenExchange hashpartitioning(col0#159,200), None
          +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation [col0#159,col1#160,col2#161], true, 10000, StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #3228 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3228/consoleFull)** for PR 10896 at commit [`572db4c`](https://github.com/apache/spark/commit/572db4ceef346d9ae01065b84d83777b81f2eb06).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-180263126
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50812/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64283/consoleFull)** for PR 10896 at commit [`69751bb`](https://github.com/apache/spark/commit/69751bb5d22f9f8a5505ddbaa035f664309ce30b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60624/consoleFull)** for PR 10896 at commit [`c40388a`](https://github.com/apache/spark/commit/c40388acf657f18ae1431cb33515e395f6c64eb0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-174509341
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    I agree that partial aggregate is also kind of optimization, and it's tricky to put it in planner. I think it makes sense to clean it up, after we have sufficient discussion and come to a consensus, but not finishing it within an optimization.
    
    For this particular optimization, I think it's much simpler to add an extra rule to merge the partial and final aggregate, than spreading the aggregation stuff to `EnsureRequirements`.
    
    cc @yhuai too


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60119/consoleFull)** for PR 10896 at commit [`2b1bea6`](https://github.com/apache/spark/commit/2b1bea6b0f786dcb126bf6123d6ada89e4a6296d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r65103700
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -257,10 +257,19 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
                     planLater(child))
                 }
               } else if (functionsWithDistinct.isEmpty) {
    +            // Check if the child operator satisfies the group-by distribution requirements
    --- End diff --
    
    Why not move this code block into `aggregate.Utils.planAggregateWithoutDistinct`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75670643
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    --- End diff --
    
    Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60811/consoleFull)** for PR 10896 at commit [`572db4c`](https://github.com/apache/spark/commit/572db4ceef346d9ae01065b84d83777b81f2eb06).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60797/consoleFull)** for PR 10896 at commit [`9ac8025`](https://github.com/apache/spark/commit/9ac80253d59614088f064cd249400b90695912b6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-178372627
  
    @marmbrus Could you review this and give me suggestions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    okay, done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64200/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64209/consoleFull)** for PR 10896 at commit [`0375ac6`](https://github.com/apache/spark/commit/0375ac69a517092a6ac6bb412b6ffb1509835c8a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r65103353
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala ---
    @@ -81,20 +81,38 @@ object Utils {
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
           resultExpressions: Seq[NamedExpression],
    +      partialAggregation: Boolean,
    --- End diff --
    
    Partial aggregation IMO implies that we add a partial aggregation step. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-214155733
  
    **[Test build #56884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56884/consoleFull)** for PR 10896 at commit [`dcc51a1`](https://github.com/apache/spark/commit/dcc51a1bb04b95f436e36d96d91a17eeab16d5ab).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75649105
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -151,18 +152,39 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
       private def ensureDistributionAndOrdering(operator: SparkPlan): SparkPlan = {
         val requiredChildDistributions: Seq[Distribution] = operator.requiredChildDistribution
         val requiredChildOrderings: Seq[Seq[SortOrder]] = operator.requiredChildOrdering
    -    var children: Seq[SparkPlan] = operator.children
    -    assert(requiredChildDistributions.length == children.length)
    -    assert(requiredChildOrderings.length == children.length)
    +    assert(requiredChildDistributions.length == operator.children.length)
    +    assert(requiredChildOrderings.length == operator.children.length)
     
         // Ensure that the operator's children satisfy their output distribution requirements:
    -    children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    -        child
    -      case (child, BroadcastDistribution(mode)) =>
    -        BroadcastExchangeExec(mode, child)
    -      case (child, distribution) =>
    -        ShuffleExchange(createPartitioning(distribution, defaultNumPreShufflePartitions), child)
    +    val childrenWithDist = operator.children.zip(requiredChildDistributions)
    +
    +    def createShuffleExchange(dist: Distribution, child: SparkPlan) =
    +      ShuffleExchange(createPartitioning(dist, defaultNumPreShufflePartitions), child)
    +
    +    var (parent, children) = if (!AggUtils.isAggregate(operator)) {
    --- End diff --
    
    okay, I'll try to simplify.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-214152849
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60800/consoleFull)** for PR 10896 at commit [`9ac8025`](https://github.com/apache/spark/commit/9ac80253d59614088f064cd249400b90695912b6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #3228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3228/consoleFull)** for PR 10896 at commit [`572db4c`](https://github.com/apache/spark/commit/572db4ceef346d9ae01065b84d83777b81f2eb06).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64076/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Yeah, lets pick this up again. Thanks for the ping.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64079/consoleFull)** for PR 10896 at commit [`91701eb`](https://github.com/apache/spark/commit/91701eb5171555ce73ba6b190b41cc17f05eee8c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    okay, thanks!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64322/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-214186713
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @maropu I would like to move this one to Spark 2.1 and implement it like @yhuai suggested. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r76751695
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -225,37 +219,38 @@ object AggUtils {
             rewrittenDistinctFunctions.zipWithIndex.map { case (func, i) =>
               // We rewrite the aggregate function to a non-distinct aggregation because
               // its input will have distinct arguments.
    -          // We just keep the isDistinct setting to true, so when users look at the query plan,
    -          // they still can see distinct aggregations.
    -          val expr = AggregateExpression(func, Final, isDistinct = true)
    +          // We keep the isDistinct setting to true because this flag is used to generate partial
    +          // aggregations and it is easy to see aggregation types in the query plan.
    +          val expr = AggregateExpression(func, Complete, isDistinct = true)
               // Use original AggregationFunction to lookup attributes, which is used to build
               // aggregateFunctionToAttribute
               val attr = functionsWithDistinct(i).resultAttribute
               (expr, attr)
    -      }.unzip
    +        }.unzip
     
    -      createAggregate(
    +      createAggregateExec(
             requiredChildDistributionExpressions = Some(groupingAttributes),
             groupingExpressions = groupingAttributes,
             aggregateExpressions = finalAggregateExpressions ++ distinctAggregateExpressions,
             aggregateAttributes = finalAggregateAttributes ++ distinctAggregateAttributes,
             initialInputBufferOffset = groupingAttributes.length,
             resultExpressions = resultExpressions,
    -        child = partialDistinctAggregate)
    +        child = partialAggregate)
         }
     
         finalAndCompleteAggregate :: Nil
       }
     
       /**
        * Plans a streaming aggregation using the following progression:
    -   *  - Partial Aggregation
    -   *  - Shuffle
    -   *  - Partial Merge (now there is at most 1 tuple per group)
    +   *  - Partial Aggregation (now there is at most 1 tuple per group)
        *  - StateStoreRestore (now there is 1 tuple from this batch + optionally one from the previous)
        *  - PartialMerge (now there is at most 1 tuple per group)
        *  - StateStoreSave (saves the tuple for the next batch)
        *  - Complete (output the current result of the aggregation)
    +   *
    +   *  If the first aggregation needs a shuffle to satisfy its distribution, a map-side partial
    +   *  an aggregation and a shuffle are added in `EnsureRequirements`.
        */
       def planStreamingAggregation(
    --- End diff --
    
    have we tested the streaming aggregation with the optimization?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @yhuai Thanks your comment and I agree with you. We'll keep the discussion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60119/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @hvanhovell yea, I'm checking we could implement  the same logic in `EnsureRequirements#ensureDistributionAndOrdering`.
    As @yhuai suggested, the new approach will make a pr more bigger.
    So, which is better; closing this pr for now and making a new pr, or making follow-up pr?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60713/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60713/consoleFull)** for PR 10896 at commit [`f8ac8cb`](https://github.com/apache/spark/commit/f8ac8cbe9fe646d6efd0e609778314c85fcb0e49).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60624/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60766/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60697/consoleFull)** for PR 10896 at commit [`3ae9153`](https://github.com/apache/spark/commit/3ae91530a6883aadfdbb08d77959ab414f822de0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60809/consoleFull)** for PR 10896 at commit [`103680c`](https://github.com/apache/spark/commit/103680c536a4e15fdad7d6ab55f15c778e547c79).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-181783113
  
    @yhuai ping


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64199/consoleFull)** for PR 10896 at commit [`e37ef6a`](https://github.com/apache/spark/commit/e37ef6afd47e9dd325a7f9e6d0826a3cb66c8e2e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64200/consoleFull)** for PR 10896 at commit [`8b64305`](https://github.com/apache/spark/commit/8b643055e47d6213847d0cf10adf83a3f59ece55).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-179611828
  
    Ah, yes..., you're right; in this case, it'd be better to push down partial aggregation under an exchange.
    I try to fix this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-180236978
  
    @yhuai ping


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-174508861
  
    **[Test build #49988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49988/consoleFull)** for PR 10896 at commit [`1b7e3d8`](https://github.com/apache/spark/commit/1b7e3d8cc2e7d74d5c7e6b6564c03dc1f152bc0a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @hvanhovell okay, I'll finish the fix soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64322/consoleFull)** for PR 10896 at commit [`8a81e23`](https://github.com/apache/spark/commit/8a81e23cebe25315be1e8d94dbf9b52258bc31f9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75979792
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
     
     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object PartialAggregate {
    +
    +  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
    +    case agg: AggregateExec
    +        if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
    --- End diff --
    
    yea, okay.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-221452701
  
    @hvanhovell ping


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64388/consoleFull)** for PR 10896 at commit [`ac68145`](https://github.com/apache/spark/commit/ac6814514e091ffc377fde75d5c79b583c878626).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75894983
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
     
     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object PartialAggregate {
    +
    +  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
    +    case agg: AggregateExec
    +        if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
    --- End diff --
    
    Put this in a function. This can be found a few times in the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-222549257
  
    @maropu IIUC this is still the old approach instead of the approach @yhuai suggests. Do you feel up to see if his approach works?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64076/consoleFull)** for PR 10896 at commit [`2615a6e`](https://github.com/apache/spark/commit/2615a6e06f75c5b264c726a61cba6e647f77dd6f).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10896
  
    Thank for you comments! I'll check them in a few days.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75895214
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -19,34 +19,94 @@ package org.apache.spark.sql.execution.aggregate
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.physical.Distribution
     import org.apache.spark.sql.execution.SparkPlan
     import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec}
     
     /**
    + * A pattern that finds aggregate operators to support partial aggregations.
    + */
    +object PartialAggregate {
    +
    +  def unapply(plan: SparkPlan): Option[Distribution] = plan match {
    +    case agg: AggregateExec
    +        if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
    +      Some(agg.requiredChildDistribution.head)
    +    case _ =>
    +      None
    +  }
    +}
    +
    +/**
      * Utility functions used by the query planner to convert our plan to new aggregation code path.
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
       }
     
    -  private def createAggregate(
    +  /**
    +   * Builds new merge and map-side [[AggregateExec]]s from an input aggregate operator.
    +   * If an aggregation needs a shuffle for satisfying its own distribution and supports partial
    +   * aggregations, a map-side aggregation is appended before the shuffle in
    +   * [[org.apache.spark.sql.execution.exchange.EnsureRequirements]].
    +   */
    +  def createMapMergeAggregatePair(operator: SparkPlan): (SparkPlan, SparkPlan) = operator match {
    +    case agg: AggregateExec =>
    +      val mapSideAgg = createPartialAggregateExec(
    +        agg.groupingExpressions, agg.aggregateExpressions, agg.child)
    +      val mergeAgg = createAggregateExec(
    +        requiredChildDistributionExpressions = agg.requiredChildDistributionExpressions,
    +        groupingExpressions = agg.groupingExpressions.map(_.toAttribute),
    +        aggregateExpressions = updateMergeAggregateMode(agg.aggregateExpressions),
    +        aggregateAttributes = agg.aggregateAttributes,
    +        initialInputBufferOffset = agg.groupingExpressions.length,
    +        resultExpressions = agg.resultExpressions,
    +        child = agg.child
    --- End diff --
    
    mapSideAgg?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-174439665
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60121/consoleFull)** for PR 10896 at commit [`2007caa`](https://github.com/apache/spark/commit/2007caaa712439926d8a0d30df1bffe472bf50d3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12978][SQL] Skip unnecessary final grou...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10896#issuecomment-220494208
  
    cc @hvanhovell can you review this?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75732105
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    +    assert(isAggregate(operator))
    +    def supportPartial(exprs: Seq[AggregateExpression]) =
    +      exprs.map(_.aggregateFunction).forall(_.supportsPartial)
    +    operator match {
    +      case agg @ HashAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +      case agg @ SortAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +    }
    +  }
    +
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
    +  }
    +
    +  private[execution] def createPartialAggregate(operator: SparkPlan)
    --- End diff --
    
    Small: The name of the function is also quite misleading. It returns a map side and merge aggregate pair, so `createMapMergeAggregatePair`? Please also add a little bit of documentation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75819461
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    +    assert(isAggregate(operator))
    +    def supportPartial(exprs: Seq[AggregateExpression]) =
    +      exprs.map(_.aggregateFunction).forall(_.supportsPartial)
    +    operator match {
    +      case agg @ HashAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +      case agg @ SortAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    +        supportPartial(aggregateExpressions)
    +    }
    +  }
    +
    +  private def createPartialAggregateExec(
           groupingExpressions: Seq[NamedExpression],
           aggregateExpressions: Seq[AggregateExpression],
    -      resultExpressions: Seq[NamedExpression],
    -      child: SparkPlan): Seq[SparkPlan] = {
    +      child: SparkPlan): SparkPlan = {
    +    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    +    val functionsWithDistinct = aggregateExpressions.filter(_.isDistinct)
    +    val partialAggregateExpressions = aggregateExpressions.map {
    +      case agg @ AggregateExpression(_, _, false, _) if functionsWithDistinct.length > 0 =>
    +        agg.copy(mode = PartialMerge)
    +      case agg =>
    +        agg.copy(mode = Partial)
    +    }
    +    val partialAggregateAttributes =
    +      partialAggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
    +    val partialResultExpressions =
    +      groupingAttributes ++
    +        partialAggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes)
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    -    val completeAggregateAttributes = completeAggregateExpressions.map(_.resultAttribute)
    -    SortAggregateExec(
    -      requiredChildDistributionExpressions = Some(groupingExpressions),
    +    createAggregateExec(
    +      requiredChildDistributionExpressions = None,
           groupingExpressions = groupingExpressions,
    -      aggregateExpressions = completeAggregateExpressions,
    -      aggregateAttributes = completeAggregateAttributes,
    -      initialInputBufferOffset = 0,
    -      resultExpressions = resultExpressions,
    -      child = child
    -    ) :: Nil
    +      aggregateExpressions = partialAggregateExpressions,
    +      aggregateAttributes = partialAggregateAttributes,
    +      initialInputBufferOffset = if (functionsWithDistinct.length > 0) {
    +        groupingExpressions.length + functionsWithDistinct.head.aggregateFunction.children.length
    +      } else {
    +        0
    +      },
    +      resultExpressions = partialResultExpressions,
    +      child = child)
    +  }
    +
    +  private def updateMergeAggregateMode(aggregateExpressions: Seq[AggregateExpression]) = {
    +    def updateMode(mode: AggregateMode) = mode match {
    +      case Partial => PartialMerge
    +      case Complete => Final
    +      case mode => mode
    +    }
    +    aggregateExpressions.map(e => e.copy(mode = updateMode(e.mode)))
    +  }
    +
    +  private[execution] def createPartialAggregate(operator: SparkPlan)
    --- End diff --
    
    fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60120 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60120/consoleFull)** for PR 10896 at commit [`36553bc`](https://github.com/apache/spark/commit/36553bc2a465c3c8d7b764e565f88aba5385d759).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75666430
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala ---
    @@ -121,3 +121,6 @@ case class SortAggregateExec(
         }
       }
     }
    +
    +object SortAggregateExec
    --- End diff --
    
    Removed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60120/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64199/consoleFull)** for PR 10896 at commit [`e37ef6a`](https://github.com/apache/spark/commit/e37ef6afd47e9dd325a7f9e6d0826a3cb66c8e2e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait Aggregate `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64076/consoleFull)** for PR 10896 at commit [`2615a6e`](https://github.com/apache/spark/commit/2615a6e06f75c5b264c726a61cba6e647f77dd6f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #60809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60809/consoleFull)** for PR 10896 at commit [`103680c`](https://github.com/apache/spark/commit/103680c536a4e15fdad7d6ab55f15c778e547c79).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    @hvanhovell ping


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    **[Test build #64209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64209/consoleFull)** for PR 10896 at commit [`0375ac6`](https://github.com/apache/spark/commit/0375ac69a517092a6ac6bb412b6ffb1509835c8a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10896: [SPARK-12978][SQL] Skip unnecessary final group-b...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10896#discussion_r75603963
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ---
    @@ -27,26 +27,87 @@ import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateSto
      */
     object AggUtils {
     
    -  def planAggregateWithoutPartial(
    +  private[execution] def isAggregate(operator: SparkPlan): Boolean = {
    +    operator.isInstanceOf[HashAggregateExec] || operator.isInstanceOf[SortAggregateExec]
    +  }
    +
    +  private[execution] def supportPartialAggregate(operator: SparkPlan): Boolean = {
    +    assert(isAggregate(operator))
    +    def supportPartial(exprs: Seq[AggregateExpression]) =
    +      exprs.map(_.aggregateFunction).forall(_.supportsPartial)
    +    operator match {
    +      case agg @ HashAggregateExec(_, _, aggregateExpressions, _, _, _, _) =>
    --- End diff --
    
    Should we have a common superclass for the `*AggregateExec` classes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/10896
  
    Sorry for my bad explanation. yes, I agree that we remove the aggregation stuff from `EnsureRequirements` for simpler codes. I'd say, how about moving the aggregation stuff (creating partial aggregations) into the extra rule after `EnsureRequirements`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org