
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/15 04:16:25 UTC

[GitHub] [spark] beliefer opened a new pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

beliefer opened a new pull request #34904:
URL: https://github.com/apache/spark/pull/34904


   ### What changes were proposed in this pull request?
   Currently, Spark supports aggregate push-down with partial-agg and final-agg. For some data sources (e.g. JDBC), we can avoid partial-agg and final-agg by running the aggregation completely on the database.
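
As a rough illustration of the difference, the sketch below models a COUNT over a partitioned table in plain Java. All names (`AggregatePushDownSketch`, `partialCounts`, `finalCount`, `completeCount`) are hypothetical and no Spark API is used. In the partial/final scheme each partition returns a partial count and the engine merges them with a sum; with complete push-down the source returns the final value directly, so no second aggregation step is needed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; it does not use or mirror any Spark class.
class AggregatePushDownSketch {

    // Partial aggregation: each partition reports its own row count.
    static long[] partialCounts(List<int[]> partitions) {
        long[] counts = new long[partitions.size()];
        for (int i = 0; i < partitions.size(); i++) {
            counts[i] = partitions.get(i).length;
        }
        return counts;
    }

    // Final aggregation: merging partial COUNT results is a SUM.
    static long finalCount(long[] partials) {
        return Arrays.stream(partials).sum();
    }

    // Complete push-down: the source computes the final answer itself,
    // so the engine only has to read (project) the returned value.
    static long completeCount(List<int[]> partitions) {
        long total = 0;
        for (int[] partition : partitions) {
            total += partition.length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<int[]> partitions = Arrays.asList(new int[] {1, 2, 3}, new int[] {4, 5});
        System.out.println(finalCount(partialCounts(partitions)));
        System.out.println(completeCount(partitions));
    }
}
```

Both paths yield the same count; the complete variant simply skips the merge step on the engine side.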
   
   ### Why are the changes needed?
   Improve performance for aggregate pushdown.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No. This only changes the internal implementation.
   
   
   ### How was this patch tested?
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994738680


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146222/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994361361


   **[Test build #146211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146211/testReport)** for PR 34904 at commit [`b414ccb`](https://github.com/apache/spark/commit/b414ccbb32da992043ed50565c03cdd69cc6a00e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r769794944



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java
##########
@@ -45,6 +45,14 @@
 @Evolving
 public interface SupportsPushDownAggregates extends ScanBuilder {
 
+  /**
+   * Whether the datasource support complete aggregation push-down. Spark could avoid partial-agg
+   * and final-agg when the aggregation operation can be pushed down to the datasource completely.
+   *
+   * @return true if the aggregation can be pushed down to datasource completely, false otherwise.
+   */
+  boolean supportCompletePushDown();

Review comment:
       This is an already-released API. Let's provide a default return value to avoid breaking existing implementations.
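
The backward-compatibility concern can be illustrated with a small, self-contained sketch. The interface and class names below (`PushDownAggregates`, `LegacyScanBuilder`, `JdbcLikeScanBuilder`) are hypothetical stand-ins rather than the actual Spark classes, and `false` as the default value is an assumption. The point is only that a Java `default` method lets implementations written against the old interface keep compiling while new ones opt in.

```java
// Hypothetical stand-in for an already-released connector interface.
interface PushDownAggregates {
    // Method that existing implementations already provide.
    boolean pushAggregation(String aggregation);

    // Newly added method. Because it has a default body, implementations
    // written before it existed still compile and simply opt out.
    default boolean supportCompletePushDown() {
        return false; // assumed default: no complete push-down
    }
}

// An "old" implementation that knows nothing about the new method.
class LegacyScanBuilder implements PushDownAggregates {
    @Override
    public boolean pushAggregation(String aggregation) {
        return true;
    }
}

// A "new" implementation that opts in to complete push-down.
class JdbcLikeScanBuilder implements PushDownAggregates {
    @Override
    public boolean pushAggregation(String aggregation) {
        return true;
    }

    @Override
    public boolean supportCompletePushDown() {
        return true;
    }
}

class DefaultMethodDemo {
    public static void main(String[] args) {
        System.out.println(new LegacyScanBuilder().supportCompletePushDown());
        System.out.println(new JdbcLikeScanBuilder().supportCompletePushDown());
    }
}
```

Without the `default` body, `LegacyScanBuilder` would fail to compile as soon as the interface gained the new abstract method.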





[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997582331


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50859/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995973924


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50763/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995771124


   **[Test build #146289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146289/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).



[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771378975



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val groupOutputLength = resultExpressions.length - aggOutput.length

Review comment:
       Thank you for your reminder.





[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999517302


   **[Test build #146479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146479/testReport)** for PR 34904 at commit [`c939885`](https://github.com/apache/spark/commit/c939885560c0eef3959a378abe87e0ec84de7088).



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999248228


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50935/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304275


   **[Test build #146464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146464/testReport)** for PR 34904 at commit [`448bc7f`](https://github.com/apache/spark/commit/448bc7ff630450f4ca1103c03cf983e14246291d).



[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r772197103



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,56 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val projectExpressions = resultExpressions.map { expr =>
+                    expr.transform {
+                      case agg: AggregateExpression =>
+                        val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+                        val aggAttribute = aggOutput(ordinal)
+                        val child = if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+                          aggAttribute
+                        } else {
+                          Cast(aggAttribute, agg.resultAttribute.dataType)
+                        }
+                        Alias(child, agg.resultAttribute.name)(agg.resultAttribute.exprId)
+                    }
+                  }.asInstanceOf[Seq[NamedExpression]]

Review comment:
       Thanks for the reminder.





[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999223617


   **[Test build #146460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146460/testReport)** for PR 34904 at commit [`ee36dbb`](https://github.com/apache/spark/commit/ee36dbbf25f722530c09c0205cb203642821a340).



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999365897


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50941/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999330036


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50941/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994481806


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50692/
   



[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770341543



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                val complexOperators = resultExpressions.flatMap { expr =>

Review comment:
       I see. We can use a Project to replace it.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995865000


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50761/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994361893


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146211/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994361893


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146211/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995844852


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50761/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995973924


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50763/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996112823


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146291/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995417445


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50724/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994455562


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50692/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994279032


   **[Test build #146211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146211/testReport)** for PR 34904 at commit [`b414ccb`](https://github.com/apache/spark/commit/b414ccbb32da992043ed50565c03cdd69cc6a00e).


---------------------------------------------------------------------

[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770162045



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java
##########
@@ -45,6 +45,14 @@
 @Evolving
 public interface SupportsPushDownAggregates extends ScanBuilder {
 
+  /**
+   * Whether the datasource supports complete aggregation push-down. Spark could avoid partial-agg
+   * and final-agg when the aggregation operation can be pushed down to the datasource completely.
+   *
+   * @return true if the aggregation can be pushed down to datasource completely, false otherwise.
+   */
+  boolean supportCompletePushDown();

Review comment:
       OK
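
For readers following the thread, here is a minimal sketch of how a planner might branch on the new flag. Only the method name `supportCompletePushDown` comes from the diff above; the `AggregatePushDown` interface, the `operatorAboveScan` helper, and the returned operator names are hypothetical stand-ins for illustration, not Spark classes.

```java
public class CompletePushDownSketch {
    /** Hypothetical stand-in for the interface in the diff; not the Spark type. */
    interface AggregatePushDown {
        boolean supportCompletePushDown();
    }

    /** Names the operator kept above the scan (illustrative only). */
    static String operatorAboveScan(AggregatePushDown source) {
        // Complete push-down: the source returns final values, so a projection suffices.
        // Otherwise the partial results still have to be aggregated again.
        return source.supportCompletePushDown() ? "Project" : "Aggregate";
    }

    public static void main(String[] args) {
        AggregatePushDown completeSource = () -> true;   // e.g. a database running the whole GROUP BY
        AggregatePushDown partialSource = () -> false;   // e.g. a source returning partial results
        System.out.println(operatorAboveScan(completeSource));
        System.out.println(operatorAboveScan(partialSource));
    }
}
```

A boolean on the scan builder keeps the decision with the data source, which is the only party that knows whether its results are already final.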




---------------------------------------------------------------------

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995589009


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50741/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995380231


   **[Test build #146250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146250/testReport)** for PR 34904 at commit [`ac187cc`](https://github.com/apache/spark/commit/ac187cca966a1b5f3511d72a5a572dd18e2d0748).


---------------------------------------------------------------------

[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994343433


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50685/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995957758


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50763/
   


---------------------------------------------------------------------

[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997680343


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146384/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996023432


   **[Test build #146289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146289/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997849468


   **[Test build #146398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146398/testReport)** for PR 34904 at commit [`24ce91d`](https://github.com/apache/spark/commit/24ce91d51379e192c35529f90204db137390f570).


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997931128


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50873/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999399735


   **[Test build #146467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146467/testReport)** for PR 34904 at commit [`4575c71`](https://github.com/apache/spark/commit/4575c714019b61c0f2b96941091de56cb8adbd17).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


---------------------------------------------------------------------

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999400000


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146467/
   


---------------------------------------------------------------------

[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304786


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146464/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999323083


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50939/
   


---------------------------------------------------------------------

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995507992


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146250/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994295446


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50685/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995509848


   **[Test build #146266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146266/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997972081


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50874/
   


---------------------------------------------------------------------

[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998037696


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50874/
   


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995875853


   **[Test build #146291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146291/testReport)** for PR 34904 at commit [`9d9cd64`](https://github.com/apache/spark/commit/9d9cd64c7318bdc2bb9ef82f8e3ce41e6c8ff44b).


---------------------------------------------------------------------

[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995771124


   **[Test build #146289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146289/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).


---------------------------------------------------------------------

[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771106583



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val groupOutputLength = resultExpressions.length - aggOutput.length
+                  val aggExpressions = resultExpressions.drop(groupOutputLength).map { expr =>

Review comment:
       To convert the Aggregate into a Project, we need to:
   1. Replace each aggregate function with the corresponding attribute from the scan node that has the aggregate pushed down.
   2. Replace each group-by expression with the corresponding attribute from that scan node (the query can be `GROUP BY a + b`).
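
The two substitution steps above can be sketched on a toy model. This is an illustration under stated assumptions, not the Spark implementation: expressions are plain strings rather than Catalyst expression trees, and `toProjectList` and the column names `group_col_0` and `agg_col_0` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregateToProjectSketch {
    /**
     * Builds a projection list by substituting every pushed-down expression
     * with the scan output column that already holds its value.
     */
    static List<String> toProjectList(List<String> resultExprs,
                                      Map<String, String> pushedToScanColumn) {
        List<String> projectList = new ArrayList<>();
        for (String expr : resultExprs) {
            String column = pushedToScanColumn.get(expr);
            if (column == null) {
                // Anything the source did not compute cannot be projected from the scan.
                throw new IllegalArgumentException("not pushed down: " + expr);
            }
            // Keep the original expression text as the output name.
            projectList.add(column + " AS `" + expr + "`");
        }
        return projectList;
    }

    public static void main(String[] args) {
        // Toy query: SELECT a + b, max(c) FROM t GROUP BY a + b
        Map<String, String> pushed = new LinkedHashMap<>();
        pushed.put("max(c)", "agg_col_0");   // step 1: aggregate function -> scan attribute
        pushed.put("a + b", "group_col_0");  // step 2: group-by expression -> scan attribute
        System.out.println(toProjectList(List.of("a + b", "max(c)"), pushed));
    }
}
```

The `GROUP BY a + b` case is why step 2 maps whole expressions rather than bare column names: the scan returns one column for the computed key, not the original `a` and `b`.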




---------------------------------------------------------------------

[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771104358



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val groupOutputLength = resultExpressions.length - aggOutput.length

Review comment:
       what is this?
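
For context on the removed lines in the hunk above: with partial push-down, the results returned by the data source are aggregated once more, which is why the old code keeps `Min` as `Min` (`min(min(c1))`) but maps `Count` to `Sum`. A minimal sketch of that merge step follows; `mergeMin` and `mergeCount` are hypothetical helpers, not Spark functions.

```java
import java.util.List;

public class PartialMergeSketch {
    /** Final MIN is the minimum of the partial minimums. */
    static long mergeMin(List<Long> partialMins) {
        return partialMins.stream().mapToLong(Long::longValue).min().getAsLong();
    }

    /** Final COUNT is the SUM of the partial counts, not a count of them. */
    static long mergeCount(List<Long> partialCounts) {
        return partialCounts.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // Three partial results for the same group, as a source might return them.
        System.out.println(mergeMin(List.of(3L, 1L, 7L)));
        System.out.println(mergeCount(List.of(2L, 5L, 4L)));
    }
}
```

With complete push-down this second pass is unnecessary, because the source already returns one final row per group.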




---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997544960


   **[Test build #146384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146384/testReport)** for PR 34904 at commit [`2384e38`](https://github.com/apache/spark/commit/2384e38bb7779d83e607c9c60970a7ef4ded09ec).


---------------------------------------------------------------------

[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999328531


   **[Test build #146467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146467/testReport)** for PR 34904 at commit [`4575c71`](https://github.com/apache/spark/commit/4575c714019b61c0f2b96941091de56cb8adbd17).



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997849468


   **[Test build #146398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146398/testReport)** for PR 34904 at commit [`24ce91d`](https://github.com/apache/spark/commit/24ce91d51379e192c35529f90204db137390f570).



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997935367


   **[Test build #146399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146399/testReport)** for PR 34904 at commit [`b4be693`](https://github.com/apache/spark/commit/b4be693650905aee3c038ee79f97ae2a014e65ff).



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995915277


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50763/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995865000


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50761/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994321446


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50685/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995767975


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146266/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995808202


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50761/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996112823


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146291/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998098368


   **[Test build #146398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146398/testReport)** for PR 34904 at commit [`24ce91d`](https://github.com/apache/spark/commit/24ce91d51379e192c35529f90204db137390f570).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998102581


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146398/
   



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773599203



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -189,6 +207,13 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
       }
   }
 
+  private def newAggChild(aggAttribute: AttributeReference, aggDataType: DataType) =

Review comment:
       nit: `addCastIfNeeded`?





[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r772219016



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,56 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val projectExpressions = resultExpressions.map { expr =>
+                    expr.transform {
+                      case agg: AggregateExpression =>
+                        val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+                        val aggAttribute = aggOutput(ordinal)
+                        val child = if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+                          aggAttribute
+                        } else {
+                          Cast(aggAttribute, agg.resultAttribute.dataType)

Review comment:
       Because the JDBC protocol returns decimal(20, 2), but Spark needs decimal(32, 2).
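
       The cast-on-mismatch idea above can be sketched without Spark on the classpath. This is a minimal illustration only: `DataType`, `DecimalType`, `Attr`, and `Cast` below are toy stand-ins that I am assuming for the example, not the Catalyst classes, and the column name `sum(price)` is made up.

```scala
// Toy stand-ins for Catalyst types; hypothetical names, not Spark's API.
sealed trait DataType
final case class DecimalType(precision: Int, scale: Int) extends DataType

sealed trait Expr { def dataType: DataType }
final case class Attr(name: String, dataType: DataType) extends Expr
final case class Cast(child: Expr, dataType: DataType) extends Expr

// Wrap the pushed-down column in a Cast only when the type reported by the
// source differs from the type Spark expects for the aggregate result.
def addCastIfNeeded(attr: Attr, expected: DataType): Expr =
  if (attr.dataType == expected) attr else Cast(attr, expected)

// The source reports decimal(20, 2); the plan expects decimal(32, 2).
val fromJdbc = Attr("sum(price)", DecimalType(20, 2))
val widened = addCastIfNeeded(fromJdbc, DecimalType(32, 2))
val untouched = addCastIfNeeded(fromJdbc, DecimalType(20, 2))
```

       Here `widened` is the column wrapped in a cast to decimal(32, 2), while `untouched` is the original column, since no cast is added when the types already agree.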





[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999271490


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50935/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999517302


   **[Test build #146479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146479/testReport)** for PR 34904 at commit [`c939885`](https://github.com/apache/spark/commit/c939885560c0eef3959a378abe87e0ec84de7088).



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773131123



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -189,6 +204,13 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
       }
   }
 
+  private def newAggOutput(aggAttribute: AttributeReference, agg: AggregateExpression) =
+    if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+      aggAttribute
+    } else {
+      Cast(aggAttribute, agg.resultAttribute.dataType)

Review comment:
       I think complete and partial pushdown are different here.
   
   For complete pushdown, we should cast to the data type of the aggregate function.
   For partial pushdown, Spark will run the aggregate again, so we should cast to the data type of the input of the aggregate function, so that the final data type is still the same as before.
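
   The distinction can be sketched as a choice of cast target. This is a simplified model under stated assumptions: `AggInfo` and `castTarget` are hypothetical names for illustration, not part of `V2ScanRelationPushDown`, and the decimal precisions are example values.

```scala
// Toy type model; hypothetical names, not Spark's API.
sealed trait DataType
final case class DecimalType(precision: Int, scale: Int) extends DataType

// Hypothetical summary of one aggregate call: what it consumes and what it returns.
final case class AggInfo(name: String, inputType: DataType, resultType: DataType)

// Pick the type to cast the data source's output column to.
def castTarget(agg: AggInfo, completePushDown: Boolean): DataType =
  if (completePushDown) agg.resultType // nothing re-aggregates: match the final type
  else agg.inputType                   // Spark aggregates again: match the function's input

// Example values: a sum over decimal(22, 2) whose result type is decimal(32, 2).
val sumDecimal = AggInfo("sum", DecimalType(22, 2), DecimalType(32, 2))
val completeTarget = castTarget(sumDecimal, completePushDown = true)
val partialTarget = castTarget(sumDecimal, completePushDown = false)
```

   With partial pushdown the second aggregate then derives its own result type from the input type, which is why the final type stays the same as before the rewrite.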





[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999729390


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146479/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999725622


   **[Test build #146479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146479/testReport)** for PR 34904 at commit [`c939885`](https://github.com/apache/spark/commit/c939885560c0eef3959a378abe87e0ec84de7088).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994525148


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50696/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994560758


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50696/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994350400


   **[Test build #146218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146218/testReport)** for PR 34904 at commit [`313d51d`](https://github.com/apache/spark/commit/313d51d334702fb166aaa2a440cd9ffbbebe62cb).



[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770163031



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                val complexOperators = resultExpressions.flatMap { expr =>

Review comment:
       It means the aggregate expressions contain complex operators. For example, for `Sum(a') + Sum(b')` only `Sum(a')` and `Sum(b')` are pushed down; the enclosing `Add` is not.
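
       A rough sketch of that check, using a toy expression tree rather than Catalyst: `Col`, `Sum`, `Add`, `collectAggregates`, and `hasComplexOperator` are names I am assuming for illustration, and the real rule may detect this differently.

```scala
// Toy expression tree; hypothetical names, not Spark's Catalyst classes.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Sum(child: Expr) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

// Collect the aggregate calls nested inside a result expression.
// These are the only parts that would be sent to the data source.
def collectAggregates(e: Expr): Seq[Sum] = e match {
  case s: Sum => Seq(s)
  case Add(l, r) => collectAggregates(l) ++ collectAggregates(r)
  case _: Col => Seq.empty
}

// True when a result expression wraps its aggregates in another operator,
// so that operator still has to be evaluated by Spark after the scan.
def hasComplexOperator(e: Expr): Boolean = e match {
  case _: Sum => false
  case other => collectAggregates(other).nonEmpty
}

val expr = Add(Sum(Col("a")), Sum(Col("b")))
val pushed = collectAggregates(expr)
```

       For `expr`, `pushed` holds the two `Sum` calls, and `hasComplexOperator(expr)` is true because the `Add` remains on the Spark side.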





[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994474895


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50696/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995398614


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50724/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995541166


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50741/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996024943


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146289/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997590518


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50859/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997677240


   **[Test build #146384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146384/testReport)** for PR 34904 at commit [`2384e38`](https://github.com/apache/spark/commit/2384e38bb7779d83e607c9c60970a7ef4ded09ec).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999283579







[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r772173898



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,56 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val projectExpressions = resultExpressions.map { expr =>
+                    expr.transform {
+                      case agg: AggregateExpression =>
+                        val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+                        val aggAttribute = aggOutput(ordinal)
+                        val child = if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+                          aggAttribute
+                        } else {
+                          Cast(aggAttribute, agg.resultAttribute.dataType)

Review comment:
       When can we reach this branch? And shall we add the cast in the partial-agg pushdown branch as well?
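
       The branch in question casts the pushed-down aggregate column only when its type differs from the type the query expects. A minimal sketch of that idea follows. It is a toy model, not code from the PR: `DataType`, `Attr` and `Cast` here are simplified stand-ins for the Catalyst classes, and the `newAggChild` signature is an assumption for illustration.

```scala
// Toy model only: these types stand in for Catalyst's DataType/Expression
// and are NOT Spark's real classes.
object CastSketch {
  sealed trait DataType
  case object IntType extends DataType
  case object LongType extends DataType
  case object DecimalType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Attr(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr

  // Reuse the pushed-down column as-is when its type already matches what the
  // query expects; otherwise wrap it in a cast to the expected type.
  def newAggChild(attr: Attr, expected: DataType): Expr =
    if (attr.dataType == expected) attr else Cast(attr, expected)
}
```

       Under this sketch the cast branch is reached only if the source reports a type for the aggregate column that differs from the expected one (for example, a hypothetical source returning a decimal where a long is expected). The same helper could be applied in both the complete and the partial pushdown branches.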





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997933838


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50873/
   



[GitHub] [spark] viirya commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

viirya commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-1000112045


   cc @huaxingao 



[GitHub] [spark] cloud-fan closed pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan closed pull request #34904:
URL: https://github.com/apache/spark/pull/34904


   



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999328531


   **[Test build #146467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146467/testReport)** for PR 34904 at commit [`4575c71`](https://github.com/apache/spark/commit/4575c714019b61c0f2b96941091de56cb8adbd17).



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995589009


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50741/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995762482


   **[Test build #146266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146266/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994403730


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50692/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994487177


   **[Test build #146218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146218/testReport)** for PR 34904 at commit [`313d51d`](https://github.com/apache/spark/commit/313d51d334702fb166aaa2a440cd9ffbbebe62cb).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994560758


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50696/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994729636


   **[Test build #146222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146222/testReport)** for PR 34904 at commit [`b76f5f8`](https://github.com/apache/spark/commit/b76f5f87859a6e2011fe7a45a3804589cd09d16c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r769796739



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                val complexOperators = resultExpressions.flatMap { expr =>

Review comment:
       What does this mean?





[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999588051


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50955/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999729390


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146479/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999589619


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50955/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997544960


   **[Test build #146384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146384/testReport)** for PR 34904 at commit [`2384e38`](https://github.com/apache/spark/commit/2384e38bb7779d83e607c9c60970a7ef4ded09ec).



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773599024



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val projectExpressions = resultExpressions.map { expr =>
+                    // TODO At present, only push down group by attribute is supported.
+                    // In future, more attribute conversion is extended here. e.g. GetStructField
+                    expr.transform {
+                      case agg: AggregateExpression =>
+                        val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+                        val child = newAggChild(aggOutput(ordinal), agg.resultAttribute.dataType)
+                        Alias(child, agg.resultAttribute.name)(agg.resultAttribute.exprId)
+                    }
+                  }.asInstanceOf[Seq[NamedExpression]]
+                  Project(projectExpressions, scanRelation)
+                } else {
+                  val plan = Aggregate(
+                    output.take(groupingExpressions.length), resultExpressions, scanRelation)
+
+                  // scalastyle:off
+                  // Change the optimized logical plan to reflect the pushed down aggregate
+                  // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
+                  // SELECT min(c1), max(c1) FROM t GROUP BY c2;
+                  // The original logical plan is
+                  // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
+                  // +- RelationV2[c1#9, c2#10] ...
+                  //
+                  // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
+                  // we have the following
+                  // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
+                  // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
+                  //
+                  // We want to change it to
+                  // == Optimized Logical Plan ==
+                  // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
+                  // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
+                  // scalastyle:on
+                  plan.transformExpressions {
+                    case agg: AggregateExpression =>
+                      val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+                      val aggAttribute = aggOutput(ordinal)
+                      val aggFunction: aggregate.AggregateFunction =
+                        agg.aggregateFunction match {
+                          case max: aggregate.Max =>
+                            max.copy(child = newAggChild(aggAttribute, max.child.dataType))
+                          case min: aggregate.Min =>
+                            min.copy(child = newAggChild(aggAttribute, min.child.dataType))
+                          case sum: aggregate.Sum =>
+                            sum.copy(child = newAggChild(aggAttribute, sum.child.dataType))
+                          case _: aggregate.Count => aggregate.Sum(aggAttribute)

Review comment:
       For `count`, I think we should cast `aggAttribute` to long type, so that `Sum(aggAttribute)` also returns long.
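
       To illustrate the point about `count`: in the partial pushdown path the final `COUNT` is computed by summing the partial counts the source returns, so the merged value should have the long type that `COUNT` has in Spark SQL. The sketch below is plain Scala with made-up per-partition values, not Spark code. The names `partialCounts`, `mergeCounts` and `mergeMin` are hypothetical, and the `Int` inputs only mimic a source that reports a non-long type.

```scala
object PartialCountSketch {
  // Hypothetical per-partition results a source might return for COUNT(c1).
  // They are Ints here to mimic a source that reports a narrower type.
  val partialCounts: Seq[Int] = Seq(3, 5, 4)

  // Final step of a partial pushdown: COUNT is merged by summing the partial
  // counts. Widening each value to Long first keeps the merged result a Long.
  def mergeCounts(partials: Seq[Int]): Long =
    partials.map(_.toLong).sum

  // MIN (and likewise MAX) is merged by re-applying the same function to the
  // partial results, which is why the plan shows min(min(c1)).
  def mergeMin(partials: Seq[Int]): Int = partials.min
}
```

       Widening before summing is the toy equivalent of casting the aggregate attribute to long before wrapping it in `Sum`.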





[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999365844


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50941/
   



[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771381985



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val groupOutputLength = resultExpressions.length - aggOutput.length
+                  val aggExpressions = resultExpressions.drop(groupOutputLength).map { expr =>

Review comment:
       Thank you for the reminder.
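
The removed code comment in the hunk above describes the partial-pushdown rewrite: the scan returns pre-aggregated rows and Spark still runs a final aggregate on top (min of mins, max of maxes, sum of sums, and sum of partial counts). A minimal, Spark-free sketch of that merge step follows. `Partial` and `FinalAggSketch` are hypothetical names used only for illustration; they are not part of Spark.

```scala
// Hypothetical model of one partially aggregated row returned by a scan.
final case class Partial(group: Int, min: Int, max: Int, count: Long)

object FinalAggSketch {
  // Final aggregation over partial rows: min of mins, max of maxes,
  // and sum of counts (which is why Count is rewritten to Sum in the diff).
  def merge(partials: Seq[Partial]): Map[Int, Partial] =
    partials.groupBy(_.group).map { case (g, ps) =>
      g -> Partial(g, ps.map(_.min).min, ps.map(_.max).max, ps.map(_.count).sum)
    }
}
```

When the source can run the whole aggregate itself, each group arrives as a single row and this merge step is a no-op, which appears to be the case the `supportCompletePushDown` branch is meant to exploit.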





[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997879305


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50873/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999275911


   **[Test build #146460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146460/testReport)** for PR 34904 at commit [`ee36dbb`](https://github.com/apache/spark/commit/ee36dbbf25f722530c09c0205cb203642821a340).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994343433


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50685/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994279032


   **[Test build #146211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146211/testReport)** for PR 34904 at commit [`b414ccb`](https://github.com/apache/spark/commit/b414ccbb32da992043ed50565c03cdd69cc6a00e).



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994738680


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146222/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994414817


   **[Test build #146222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146222/testReport)** for PR 34904 at commit [`b76f5f8`](https://github.com/apache/spark/commit/b76f5f87859a6e2011fe7a45a3804589cd09d16c).



[GitHub] [spark] beliefer commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994854225


   ping @cloud-fan 



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995498674


   **[Test build #146250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146250/testReport)** for PR 34904 at commit [`ac187cc`](https://github.com/apache/spark/commit/ac187cca966a1b5f3511d72a5a572dd18e2d0748).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995380231


   **[Test build #146250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146250/testReport)** for PR 34904 at commit [`ac187cc`](https://github.com/apache/spark/commit/ac187cca966a1b5f3511d72a5a572dd18e2d0748).



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995509848


   **[Test build #146266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146266/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995507992


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146250/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996024943


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146289/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997560590


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50859/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998181418


   **[Test build #146399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146399/testReport)** for PR 34904 at commit [`b4be693`](https://github.com/apache/spark/commit/b4be693650905aee3c038ee79f97ae2a014e65ff).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304771


   **[Test build #146464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146464/testReport)** for PR 34904 at commit [`448bc7f`](https://github.com/apache/spark/commit/448bc7ff630450f4ca1103c03cf983e14246291d).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304275


   **[Test build #146464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146464/testReport)** for PR 34904 at commit [`448bc7f`](https://github.com/apache/spark/commit/448bc7ff630450f4ca1103c03cf983e14246291d).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999356495


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50939/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998034793


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50874/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999223617


   **[Test build #146460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146460/testReport)** for PR 34904 at commit [`ee36dbb`](https://github.com/apache/spark/commit/ee36dbbf25f722530c09c0205cb203642821a340).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997680343


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146384/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998184862


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146399/
   



[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r776155741



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala
##########
@@ -72,6 +72,9 @@ case class JDBCScanBuilder(
 
   private var pushedGroupByCols: Option[Array[String]] = None
 
+  override def supportCompletePushDown: Boolean =
+    jdbcOptions.numPartitions.map(_ == 1).getOrElse(true)

Review comment:
       I will follow up with a PR. @huaxingao Thanks.
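
The rule added in the hunk above can be read in isolation: complete pushdown is reported only when the JDBC scan reads a single partition, or when `numPartitions` is not set at all. The sketch below restates that expression as a standalone function, assuming `numPartitions` is an `Option[Int]`; `CompletePushDownSketch` is a hypothetical name, not part of Spark.

```scala
object CompletePushDownSketch {
  // Same expression as in the diff: true when numPartitions is unset or equals 1.
  // With several partitions, each partition would return its own partial result,
  // so a final aggregate on the Spark side would still be required.
  def supportCompletePushDown(numPartitions: Option[Int]): Boolean =
    numPartitions.map(_ == 1).getOrElse(true)
}
```

The same condition could also be written as `numPartitions.forall(_ == 1)`, since `forall` on an empty `Option` returns `true`.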





[GitHub] [spark] beliefer commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-1000375465


   @cloud-fan Thanks a lot!



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994350400


   **[Test build #146218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146218/testReport)** for PR 34904 at commit [`313d51d`](https://github.com/apache/spark/commit/313d51d334702fb166aaa2a440cd9ffbbebe62cb).



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994487468


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146218/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994481806


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50692/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998037696


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50874/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997590518


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50859/
   



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r772178127



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,56 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val projectExpressions = resultExpressions.map { expr =>
+                    expr.transform {
+                      case agg: AggregateExpression =>
+                        val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+                        val aggAttribute = aggOutput(ordinal)
+                        val child = if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+                          aggAttribute
+                        } else {
+                          Cast(aggAttribute, agg.resultAttribute.dataType)
+                        }
+                        Alias(child, agg.resultAttribute.name)(agg.resultAttribute.exprId)
+                    }
+                  }.asInstanceOf[Seq[NamedExpression]]

Review comment:
       According to the DS v2 API, it's possible to push down `GROUP BY a.b`, so we need to replace `GetStructField(Attr("a"), "b")` with the corresponding grouping column attribute from the scan relation.
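
As a rough illustration of the replacement described above, here is a minimal sketch in plain Scala. The `Expr`, `Attr`, `GetStructField` and `Add` case classes are simplified, hypothetical stand-ins for the Catalyst expression classes, and `replaceGroupExprs` is an illustrative helper name, not something taken from the PR.

```scala
// Simplified stand-ins for Catalyst expressions; these case classes are
// hypothetical and only mimic the shape of the real ones.
sealed trait Expr
case class Attr(name: String) extends Expr
case class GetStructField(child: Expr, field: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object NestedGroupByRewrite {
  // Check the whole expression against the pushed-down grouping expressions
  // first; if it matches, substitute the scan output attribute, otherwise
  // recurse into the children.
  def replaceGroupExprs(expr: Expr, groupToOutput: Map[Expr, Attr]): Expr =
    groupToOutput.get(expr) match {
      case Some(attr) => attr
      case None =>
        expr match {
          case GetStructField(c, f) =>
            GetStructField(replaceGroupExprs(c, groupToOutput), f)
          case Add(l, r) =>
            Add(replaceGroupExprs(l, groupToOutput), replaceGroupExprs(r, groupToOutput))
          case leaf => leaf
        }
    }

  def main(args: Array[String]): Unit = {
    // GROUP BY a.b: the scan is assumed to expose the group column as one attribute.
    val groupExpr = GetStructField(Attr("a"), "b")
    val mapping = Map[Expr, Attr](groupExpr -> Attr("a.b"))
    println(replaceGroupExprs(Add(groupExpr, Attr("c")), mapping))
  }
}
```

Matching on the whole expression before descending is what lets a nested field access be swapped out as a unit, rather than only its leaf attribute `a`.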





[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997933838


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50873/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997935367


   **[Test build #146399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146399/testReport)** for PR 34904 at commit [`b4be693`](https://github.com/apache/spark/commit/b4be693650905aee3c038ee79f97ae2a014e65ff).



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998102581


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146398/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995570766


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50741/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995427125


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50724/
   



[GitHub] [spark] beliefer commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995767383


   retest this please



[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995875853


   **[Test build #146291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146291/testReport)** for PR 34904 at commit [`9d9cd64`](https://github.com/apache/spark/commit/9d9cd64c7318bdc2bb9ef82f8e3ce41e6c8ff44b).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304786


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146464/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999400000


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146467/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999589619


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50955/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994414817


   **[Test build #146222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146222/testReport)** for PR 34904 at commit [`b76f5f8`](https://github.com/apache/spark/commit/b76f5f87859a6e2011fe7a45a3804589cd09d16c).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995427125


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50724/
   



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770241266



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                val complexOperators = resultExpressions.flatMap { expr =>

Review comment:
       If I read the code correctly, you push down nothing for `sum(a) + sum(b)`?





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994487468


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146218/
   



[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770248411



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                val complexOperators = resultExpressions.flatMap { expr =>

Review comment:
       `V2ScanRelationPushDown` pushes `sum(a)` and `sum(b)` down to the data source as separate aggregates, not `sum(a) + sum(b)` as a single expression. Ref:
   https://github.com/apache/spark/blob/ac187cca966a1b5f3511d72a5a572dd18e2d0748/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L91
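
To make the distinction concrete, the following is a minimal sketch in plain Scala of collecting the individual aggregates out of a result expression and leaving the surrounding arithmetic to Spark. The `Expr`, `Col`, `Sum` and `Plus` classes and the `agg_N` column names are hypothetical simplifications for illustration, not the Catalyst classes or the naming used by the rule.

```scala
// Hypothetical, simplified expression model (not the Catalyst classes).
sealed trait Expr
case class Col(name: String) extends Expr
case class Sum(child: Expr) extends Expr
case class Plus(left: Expr, right: Expr) extends Expr

object CollectAggregates {
  // Collect the aggregate calls nested inside a result expression; in this
  // simplified model only these are handed to the data source.
  def collectAggs(expr: Expr): Seq[Expr] = expr match {
    case s: Sum     => Seq(s)
    case Plus(l, r) => collectAggs(l) ++ collectAggs(r)
    case _: Col     => Seq.empty
  }

  // After pushdown, each aggregate is replaced by the scan output column that
  // carries its result; the addition itself is still evaluated on top.
  def rewrite(expr: Expr, pushed: Map[Expr, Col]): Expr = expr match {
    case s: Sum     => pushed.getOrElse(s, s)
    case Plus(l, r) => Plus(rewrite(l, pushed), rewrite(r, pushed))
    case c: Col     => c
  }

  def main(args: Array[String]): Unit = {
    val resultExpr = Plus(Sum(Col("a")), Sum(Col("b")))
    val aggs = collectAggs(resultExpr)
    // Assumed naming scheme for the scan output columns, for illustration only.
    val pushed = aggs.zipWithIndex.map { case (agg, i) => agg -> Col(s"agg_$i") }.toMap
    println(aggs)
    println(rewrite(resultExpr, pushed))
  }
}
```

In this model `sum(a) + sum(b)` yields two pushed aggregates, and the rewritten expression adds the two scan output columns.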





[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996111394


   **[Test build #146291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146291/testReport)** for PR 34904 at commit [`9d9cd64`](https://github.com/apache/spark/commit/9d9cd64c7318bdc2bb9ef82f8e3ce41e6c8ff44b).
    * This patch passes all tests.
    * This patch merges cleanly.



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771105727



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 
                 val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
 
-                val plan = Aggregate(
-                  output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
-                // scalastyle:off
-                // Change the optimized logical plan to reflect the pushed down aggregate
-                // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
-                // SELECT min(c1), max(c1) FROM t GROUP BY c2;
-                // The original logical plan is
-                // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c1#9, c2#10] ...
-                //
-                // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
-                // we have the following
-                // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                //
-                // We want to change it to
-                // == Optimized Logical Plan ==
-                // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
-                // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
-                // scalastyle:on
-                val aggOutput = output.drop(groupAttrs.length)
-                plan.transformExpressions {
-                  case agg: AggregateExpression =>
-                    val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
-                    val aggFunction: aggregate.AggregateFunction =
-                      agg.aggregateFunction match {
-                        case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
-                        case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
-                        case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
-                        case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
-                        case other => other
-                      }
-                    agg.copy(aggregateFunction = aggFunction)
+                if (r.supportCompletePushDown()) {
+                  val groupOutputLength = resultExpressions.length - aggOutput.length

Review comment:
       The number of result expressions has nothing to do with the number of grouping columns. For example, people can write `SELECT a + b, max(c) - max(d) FROM t GROUP BY a, b`.
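
The arithmetic behind this objection can be sketched in plain Scala. `AggQuery` and both helper functions are hypothetical names introduced only for this example, and the assumption that the scan output carries one leading column per grouping expression is read from the removed `output.take(groupingExpressions.length)` line in the diff above.

```scala
object GroupOutputLength {
  // Hypothetical, simplified description of an aggregate query.
  final case class AggQuery(
      groupingExprs: Seq[String],
      aggregateExprs: Seq[String],
      resultExprs: Seq[String])

  // Fragile: assumes one result expression per group column and per aggregate.
  def fromResultExprs(q: AggQuery): Int =
    q.resultExprs.length - q.aggregateExprs.length

  // Safer under the stated assumption: count the grouping expressions directly.
  def fromGrouping(q: AggQuery): Int = q.groupingExprs.length

  def main(args: Array[String]): Unit = {
    // SELECT a + b, max(c) - max(d) FROM t GROUP BY a, b
    val q = AggQuery(
      groupingExprs = Seq("a", "b"),
      aggregateExprs = Seq("max(c)", "max(d)"),
      resultExprs = Seq("a + b", "max(c) - max(d)"))
    println(fromResultExprs(q)) // 0, although there are two group columns
    println(fromGrouping(q))    // 2
  }
}
```

For the query in the comment, subtracting the aggregate count from the result-expression count gives 0 group columns, while the query actually groups by two.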





[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995767975


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146266/
   



[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773129127



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -633,4 +715,30 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     }
     checkAnswer(query, Seq(Row(29000.0)))
   }
+
+  test("scan with aggregate push-down: SUM(CASE WHEN) with group by") {
+    val df =
+      sql("SELECT SUM(CASE WHEN SALARY > 0 THEN 1 ELSE 0 END) FROM h2.test.employee GROUP BY DEPT")
+    checkAggregateRemoved(df, false)
+    df.queryExecution.optimizedPlan.collect {
+      case _: DataSourceV2ScanRelation =>
+        val expected_plan_fragment =
+          "PushedFilters: [], "
+        checkKeywordsExistsInExplain(df, expected_plan_fragment)
+    }
+    checkAnswer(df, Seq(Row(1), Row(2), Row(2)))
+  }
+
+  test("scan with aggregate push-down: SUM(NVL) with group by") {

Review comment:
       This test case looks redundant with the one above, so we can remove it.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998184862


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146399/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999283580







[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999356495


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50939/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999365897


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50941/
   



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999348942


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50939/
   



[GitHub] [spark] cloud-fan commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-1000362478


   thanks, merging to master!



[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999544778


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50955/
   



[GitHub] [spark] huaxingao commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown

Posted by GitBox <gi...@apache.org>.

huaxingao commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r776149870



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala
##########
@@ -72,6 +72,9 @@ case class JDBCScanBuilder(
 
   private var pushedGroupByCols: Option[Array[String]] = None
 
+  override def supportCompletePushDown: Boolean =
+    jdbcOptions.numPartitions.map(_ == 1).getOrElse(true)

Review comment:
       In the case of multiple partitions, if partition columns are the same as group by columns, should `supportCompletePushDown` be set to true as well?
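
       A minimal sketch of the condition being asked about, assuming a simplified stand-in for the JDBC options. `ScanOptions`, `partitionColumn`, and `CompletePushDownSketch` are hypothetical names used only for illustration, not Spark's actual classes, and the multi-partition branch is the proposal under discussion rather than merged behavior. Column names are compared as exact strings here, which ignores case sensitivity settings.

```scala
// Hypothetical, simplified stand-in for the JDBC options (not Spark's JDBCOptions).
final case class ScanOptions(
    numPartitions: Option[Int],
    partitionColumn: Option[String])

object CompletePushDownSketch {
  // Same condition as in the diff: numPartitions is 1 or not set at all.
  // Option.forall(p) is equivalent to map(p).getOrElse(true).
  def singlePartition(opts: ScanOptions): Boolean =
    opts.numPartitions.forall(_ == 1)

  // Proposed extension (an assumption for illustration): with several partitions,
  // every row of a group shares the same partition-column value when that column
  // is one of the GROUP BY columns, so each group falls entirely in one partition
  // and the per-partition aggregate is already final.
  def supportCompletePushDown(opts: ScanOptions, groupByCols: Seq[String]): Boolean =
    singlePartition(opts) ||
      opts.partitionColumn.exists(c => groupByCols.contains(c))
}
```

       With this sketch, `supportCompletePushDown(ScanOptions(Some(4), Some("dept")), Seq("dept"))` evaluates to true, while the same options with `Seq("name")` as the group by columns evaluate to false.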



