Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/15 04:16:25 UTC
[GitHub] [spark] beliefer opened a new pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
beliefer opened a new pull request #34904:
URL: https://github.com/apache/spark/pull/34904
### What changes were proposed in this pull request?
Currently, Spark supports aggregate push-down with partial-agg and final-agg. For some data sources (e.g. JDBC), we can avoid partial-agg and final-agg by running the aggregation completely on the database.
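To make the distinction concrete, here is a minimal sketch in plain Java (not Spark code). It assumes a `MAX` aggregate over integer rows; the class and method names are hypothetical and exist only for illustration. The first method mimics a source that returns one partial result per partition, which the engine must still merge; the second mimics a source that can see all rows and return the finished result.

```java
import java.util.List;

// Illustrative only: hypothetical names, not part of Spark.
final class AggregatePushDownSketch {

    // Partial + final: each partition yields a partial MAX, and the engine
    // still has to run a final MAX over those partial results.
    // Returns Integer.MIN_VALUE when there are no rows at all.
    static int partialThenFinalMax(List<List<Integer>> partitions) {
        int result = Integer.MIN_VALUE;
        for (List<Integer> partition : partitions) {
            int partial = Integer.MIN_VALUE;
            for (int value : partition) {
                partial = Math.max(partial, value); // partial-agg, per partition
            }
            result = Math.max(result, partial);     // final-agg, in the engine
        }
        return result;
    }

    // Complete push-down: the source (e.g. a database) sees every row and
    // returns the finished aggregate, so no merge step remains for the engine.
    static int completeMax(List<Integer> allRows) {
        int result = Integer.MIN_VALUE;
        for (int value : allRows) {
            result = Math.max(result, value);
        }
        return result;
    }
}
```

Both paths produce the same answer; the second simply leaves no aggregation work for the engine.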
### Why are the changes needed?
Improve performance for aggregate pushdown.
### Does this PR introduce _any_ user-facing change?
'No'. This only changes the internal implementation.
### How was this patch tested?
New tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994738680
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146222/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994361361
**[Test build #146211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146211/testReport)** for PR 34904 at commit [`b414ccb`](https://github.com/apache/spark/commit/b414ccbb32da992043ed50565c03cdd69cc6a00e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r769794944
##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java
##########
@@ -45,6 +45,14 @@
@Evolving
public interface SupportsPushDownAggregates extends ScanBuilder {
+ /**
+ * Whether the datasource support complete aggregation push-down. Spark could avoid partial-agg
+ * and final-agg when the aggregation operation can be pushed down to the datasource completely.
+ *
+ * @return true if the aggregation can be pushed down to datasource completely, false otherwise.
+ */
+ boolean supportCompletePushDown();
Review comment:
This is an already-released API. Let's provide a default return value to avoid breaking existing implementations.
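A minimal sketch of what this suggestion could look like, using a Java `default` method. The interface and classes below are hypothetical stand-ins, not the real Spark types, and the choice of `false` as the default is an assumption made here for illustration:

```java
// Hypothetical stand-in for an already-released connector interface.
interface PushDownAggregatesSketch {

    // Method that existing implementations already provide.
    boolean pushAggregation(String aggregation);

    // Newly added method. Because it has a default body, implementations
    // written before it existed still compile and simply inherit `false`.
    default boolean supportCompletePushDown() {
        return false;
    }
}

// Written against the old interface; needs no changes.
final class LegacyScanBuilder implements PushDownAggregatesSketch {
    @Override
    public boolean pushAggregation(String aggregation) {
        return true;
    }
}

// A source that opts in to complete push-down by overriding the default.
final class JdbcLikeScanBuilder implements PushDownAggregatesSketch {
    @Override
    public boolean pushAggregation(String aggregation) {
        return true;
    }

    @Override
    public boolean supportCompletePushDown() {
        return true;
    }
}
```

Without the default body, `LegacyScanBuilder` would fail to compile after the interface change, which is the breakage the comment is warning about.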
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997582331
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50859/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995973924
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50763/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995771124
**[Test build #146289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146289/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771378975
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val groupOutputLength = resultExpressions.length - aggOutput.length
Review comment:
Thank you for your reminder.
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999517302
**[Test build #146479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146479/testReport)** for PR 34904 at commit [`c939885`](https://github.com/apache/spark/commit/c939885560c0eef3959a378abe87e0ec84de7088).
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999248228
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50935/
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304275
**[Test build #146464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146464/testReport)** for PR 34904 at commit [`448bc7f`](https://github.com/apache/spark/commit/448bc7ff630450f4ca1103c03cf983e14246291d).
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r772197103
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,56 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val projectExpressions = resultExpressions.map { expr =>
+ expr.transform {
+ case agg: AggregateExpression =>
+ val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+ val aggAttribute = aggOutput(ordinal)
+ val child = if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+ aggAttribute
+ } else {
+ Cast(aggAttribute, agg.resultAttribute.dataType)
+ }
+ Alias(child, agg.resultAttribute.name)(agg.resultAttribute.exprId)
+ }
+ }.asInstanceOf[Seq[NamedExpression]]
Review comment:
Thanks for the reminder.
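The hunk quoted above reuses the pushed-down aggregate column directly when its data type already matches the expected result type, and wraps it in a `Cast` otherwise. The same cast-only-on-mismatch idea can be sketched in plain Java. Everything here is hypothetical: the class name, the use of `Class<?>` as a stand-in for a data type, and the two supported target types are assumptions for illustration, not Spark's implementation.

```java
import java.util.function.Function;

// Illustrative only: hypothetical names, not part of Spark.
final class CastOnMismatchSketch {

    // Builds the projection for one pushed-down aggregate column:
    // identity when the source type already matches the expected type,
    // otherwise a conversion to the expected type.
    static Function<Number, Number> projectionFor(Class<?> sourceType, Class<?> expectedType) {
        if (sourceType.equals(expectedType)) {
            return value -> value; // no cast needed, reuse the column as-is
        }
        if (expectedType.equals(Long.class)) {
            return value -> Long.valueOf(value.longValue());
        }
        if (expectedType.equals(Double.class)) {
            return value -> Double.valueOf(value.doubleValue());
        }
        throw new IllegalArgumentException("Unsupported target type: " + expectedType);
    }
}
```

For example, a source that reports a count as `Integer` while the engine expects `Long` would go through the conversion branch, and a source that already reports `Long` would be passed through untouched.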
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999223617
**[Test build #146460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146460/testReport)** for PR 34904 at commit [`ee36dbb`](https://github.com/apache/spark/commit/ee36dbbf25f722530c09c0205cb203642821a340).
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999365897
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50941/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999330036
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50941/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994481806
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50692/
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770341543
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ val complexOperators = resultExpressions.flatMap { expr =>
Review comment:
I got it. We can use a Project to replace it.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995865000
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50761/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994361893
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146211/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994361893
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146211/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995844852
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50761/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995973924
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50763/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996112823
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146291/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995417445
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50724/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994455562
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50692/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994279032
**[Test build #146211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146211/testReport)** for PR 34904 at commit [`b414ccb`](https://github.com/apache/spark/commit/b414ccbb32da992043ed50565c03cdd69cc6a00e).
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770162045
##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java
##########
@@ -45,6 +45,14 @@
@Evolving
public interface SupportsPushDownAggregates extends ScanBuilder {
+ /**
+ * Whether the datasource supports complete aggregation push-down. Spark can avoid partial-agg
+ * and final-agg when the aggregation operation can be pushed down to the datasource completely.
+ *
+ * @return true if the aggregation can be pushed down to the datasource completely, false otherwise.
+ */
+ boolean supportCompletePushDown();
Review comment:
OK
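To make the Javadoc in this hunk concrete, here is a minimal sketch of the difference between partial and complete push-down for a `COUNT(*)` query. It is not Spark code: `AggregateSource`, `pushedCounts`, `count`, and `source` are hypothetical names invented for illustration, and only the method name `supportCompletePushDown` comes from the diff. The sketch assumes that with partial push-down the source returns one partial count per partition and Spark merges them by summing.

```java
import java.util.Arrays;
import java.util.List;

final class PartialVersusCompleteSketch {

  /** Hypothetical stand-in for a scan builder; not the Spark interface itself. */
  interface AggregateSource {
    /** Same name as the method added in the diff; the semantics here are an assumption. */
    boolean supportCompletePushDown();

    /** COUNT(*) values returned by the source: one per partition, or a single final value. */
    List<Long> pushedCounts();
  }

  /** Computes the final COUNT(*) from whatever the source pushed down. */
  static long count(AggregateSource source) {
    List<Long> pushed = source.pushedCounts();
    if (source.supportCompletePushDown()) {
      // The source already produced the final answer, so no further aggregation is needed.
      return pushed.get(0);
    }
    // Partial push-down: a final aggregate merges the partial counts by summing them.
    long total = 0L;
    for (long partial : pushed) {
      total += partial;
    }
    return total;
  }

  /** Builds a hypothetical source that returns the given counts. */
  static AggregateSource source(final boolean complete, final Long... counts) {
    return new AggregateSource() {
      @Override
      public boolean supportCompletePushDown() {
        return complete;
      }

      @Override
      public List<Long> pushedCounts() {
        return Arrays.asList(counts);
      }
    };
  }

  public static void main(String[] args) {
    // Three partitions with partial counts, merged by the caller.
    System.out.println(count(source(false, 3L, 4L, 5L)));
    // One final count computed entirely by the source.
    System.out.println(count(source(true, 12L)));
  }
}
```

Both calls in `main` yield the same total; the only difference is where the final merge happens.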
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995589009
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50741/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995380231
**[Test build #146250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146250/testReport)** for PR 34904 at commit [`ac187cc`](https://github.com/apache/spark/commit/ac187cca966a1b5f3511d72a5a572dd18e2d0748).
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994343433
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50685/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995957758
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50763/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997680343
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146384/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996023432
**[Test build #146289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146289/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997849468
**[Test build #146398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146398/testReport)** for PR 34904 at commit [`24ce91d`](https://github.com/apache/spark/commit/24ce91d51379e192c35529f90204db137390f570).
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997931128
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50873/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999399735
**[Test build #146467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146467/testReport)** for PR 34904 at commit [`4575c71`](https://github.com/apache/spark/commit/4575c714019b61c0f2b96941091de56cb8adbd17).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999400000
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146467/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304786
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146464/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999323083
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50939/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995507992
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146250/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994295446
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50685/
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995509848
**[Test build #146266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146266/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997972081
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50874/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998037696
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50874/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995875853
**[Test build #146291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146291/testReport)** for PR 34904 at commit [`9d9cd64`](https://github.com/apache/spark/commit/9d9cd64c7318bdc2bb9ef82f8e3ce41e6c8ff44b).
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995771124
**[Test build #146289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146289/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771106583
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val groupOutputLength = resultExpressions.length - aggOutput.length
+ val aggExpressions = resultExpressions.drop(groupOutputLength).map { expr =>
Review comment:
We need to convert the aggregate into a project, which means we need to:
1. replace aggregate functions with the corresponding attributes from the scan node that has the aggregate pushed down.
2. replace group-by expressions with the corresponding attributes from the scan node that has the aggregate pushed down (the query can be `GROUP BY a + b`).
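The two replacement steps in this comment can be sketched on a toy expression tree. This is only an illustration, not the code in `V2ScanRelationPushDown`: `Expr`, `attr`, `call`, `agg`, and `rewrite` are hypothetical helpers, and the scan output names `group_col_0` and `agg_func_0` are made up. The sketch assumes the scan output lists the grouping columns first and the aggregate columns after them, which matches `output.drop(groupAttrs.length)` in the removed code. Expressions are compared by their printed form purely for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AggregateToProjectSketch {

  /** Hypothetical, minimal expression node; not Spark's Expression class. */
  static final class Expr {
    final String name;          // attribute name, literal, operator, or function name
    final boolean isAggregate;  // true for aggregate calls such as min(c)
    final List<Expr> children;

    Expr(String name, boolean isAggregate, Expr... children) {
      this.name = name;
      this.isAggregate = isAggregate;
      this.children = Arrays.asList(children);
    }

    @Override
    public String toString() {
      if (children.isEmpty()) {
        return name;
      }
      List<String> parts = new ArrayList<>();
      for (Expr child : children) {
        parts.add(child.toString());
      }
      return name + "(" + String.join(", ", parts) + ")";
    }
  }

  static Expr attr(String name) { return new Expr(name, false); }
  static Expr call(String fn, Expr... args) { return new Expr(fn, false, args); }
  static Expr agg(String fn, Expr... args) { return new Expr(fn, true, args); }

  /** Rewrites one result expression so that it only refers to scan output attributes. */
  static Expr rewrite(Expr e, List<Expr> groupBy, List<Expr> aggs, List<String> scanOutput) {
    // Step 2 of the comment: a group-by expression becomes the matching scan attribute.
    for (int i = 0; i < groupBy.size(); i++) {
      if (groupBy.get(i).toString().equals(e.toString())) {
        return attr(scanOutput.get(i));
      }
    }
    // Step 1 of the comment: an aggregate function becomes the matching scan attribute.
    if (e.isAggregate) {
      for (int i = 0; i < aggs.size(); i++) {
        if (aggs.get(i).toString().equals(e.toString())) {
          return attr(scanOutput.get(groupBy.size() + i));
        }
      }
      throw new IllegalArgumentException("aggregate was not pushed down: " + e);
    }
    // Anything else is kept, with its children rewritten recursively.
    Expr[] newChildren = new Expr[e.children.size()];
    for (int i = 0; i < newChildren.length; i++) {
      newChildren[i] = rewrite(e.children.get(i), groupBy, aggs, scanOutput);
    }
    return new Expr(e.name, false, newChildren);
  }

  /** SELECT a + b, min(c) + 1 FROM t GROUP BY a + b, rewritten into a project list. */
  static List<String> demo() {
    List<Expr> groupBy = Arrays.asList(call("+", attr("a"), attr("b")));
    List<Expr> aggs = Arrays.asList(agg("min", attr("c")));
    List<String> scanOutput = Arrays.asList("group_col_0", "agg_func_0");
    List<Expr> resultExprs = Arrays.asList(
        call("+", attr("a"), attr("b")),
        call("+", agg("min", attr("c")), attr("1")));
    List<String> project = new ArrayList<>();
    for (Expr e : resultExprs) {
      project.add(rewrite(e, groupBy, aggs, scanOutput).toString());
    }
    return project;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [group_col_0, +(agg_func_0, 1)]
  }
}
```

Checking the whole expression against the group-by list before recursing is what lets `GROUP BY a + b` work: `a` and `b` are no longer available from the scan once the aggregate is pushed, so the sum has to be replaced as a unit.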
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771104358
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val groupOutputLength = resultExpressions.length - aggOutput.length
Review comment:
what is this?
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997544960
**[Test build #146384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146384/testReport)** for PR 34904 at commit [`2384e38`](https://github.com/apache/spark/commit/2384e38bb7779d83e607c9c60970a7ef4ded09ec).
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999328531
**[Test build #146467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146467/testReport)** for PR 34904 at commit [`4575c71`](https://github.com/apache/spark/commit/4575c714019b61c0f2b96941091de56cb8adbd17).
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997849468
**[Test build #146398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146398/testReport)** for PR 34904 at commit [`24ce91d`](https://github.com/apache/spark/commit/24ce91d51379e192c35529f90204db137390f570).
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997935367
**[Test build #146399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146399/testReport)** for PR 34904 at commit [`b4be693`](https://github.com/apache/spark/commit/b4be693650905aee3c038ee79f97ae2a014e65ff).
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995915277
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50763/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995865000
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50761/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994321446
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50685/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995767975
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146266/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995808202
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50761/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996112823
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146291/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998098368
**[Test build #146398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146398/testReport)** for PR 34904 at commit [`24ce91d`](https://github.com/apache/spark/commit/24ce91d51379e192c35529f90204db137390f570).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998102581
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146398/
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773599203
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -189,6 +207,13 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
}
}
+ private def newAggChild(aggAttribute: AttributeReference, aggDataType: DataType) =
Review comment:
nit: `addCastIfNeeded`?
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r772219016
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,56 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val projectExpressions = resultExpressions.map { expr =>
+ expr.transform {
+ case agg: AggregateExpression =>
+ val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+ val aggAttribute = aggOutput(ordinal)
+ val child = if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+ aggAttribute
+ } else {
+ Cast(aggAttribute, agg.resultAttribute.dataType)
Review comment:
Because the JDBC protocol returns decimal(20, 2), but Spark needs decimal(32, 2).
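To illustrate where a mismatch like this can come from, here is a minimal sketch. It relies on the fact that Spark's `Sum` over a `decimal(p, s)` input produces `decimal(min(p + 10, 38), s)`. The input column type `decimal(22, 2)` is an assumption chosen so that the expected result is `decimal(32, 2)`; the thread does not state the actual column type. `Dec`, `sumResultType`, and `needsCast` are hypothetical names, not Spark classes.

```scala
// Simplified stand-in for a decimal type; this is NOT Spark's DecimalType.
final case class Dec(precision: Int, scale: Int)

object DecimalSumSketch {
  // Spark caps decimal precision at 38.
  val MaxPrecision = 38

  // Result type of Sum over decimal(p, s): precision grows by 10, bounded by 38.
  def sumResultType(input: Dec): Dec =
    Dec(math.min(input.precision + 10, MaxPrecision), input.scale)

  // A cast is needed only when the type returned by the source
  // differs from the type the rest of the plan expects.
  def needsCast(returned: Dec, expected: Dec): Boolean =
    returned != expected
}
```

With an assumed `decimal(22, 2)` input, `sumResultType` gives `decimal(32, 2)`, so a pushed-down column that comes back as `decimal(20, 2)` would need the cast.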
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999271490
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50935/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999517302
**[Test build #146479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146479/testReport)** for PR 34904 at commit [`c939885`](https://github.com/apache/spark/commit/c939885560c0eef3959a378abe87e0ec84de7088).
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773131123
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -189,6 +204,13 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
}
}
+ private def newAggOutput(aggAttribute: AttributeReference, agg: AggregateExpression) =
+ if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+ aggAttribute
+ } else {
+ Cast(aggAttribute, agg.resultAttribute.dataType)
Review comment:
I think complete and partial pushdown are different here.
For complete pushdown, we should cast to the data type of the aggregate function.
For partial pushdown, Spark will run the aggregate again, so we should cast to the data type of the input of the aggregate function, so that the final data type is still the same as before.
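The distinction above could be sketched roughly as follows. This is a self-contained toy model, not the Spark code in the PR: `DataType`, `Expr`, `Attr`, and `Cast` are simplified stand-ins for the Catalyst classes, and `forCompletePushDown` / `forPartialPushDown` are hypothetical names. The helper name `addCastIfNeeded` follows the naming suggested elsewhere in this review.

```scala
// Toy stand-ins for Catalyst types; assumptions for illustration only.
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType

sealed trait Expr { def dataType: DataType }
final case class Attr(name: String, dataType: DataType) extends Expr
final case class Cast(child: Expr, dataType: DataType) extends Expr

object CastTargetSketch {
  // Wrap the pushed-down output column in a Cast only when the types differ.
  def addCastIfNeeded(attr: Attr, target: DataType): Expr =
    if (attr.dataType == target) attr else Cast(attr, target)

  // Complete pushdown: Spark does not aggregate again, so the column must
  // already have the result type of the aggregate function.
  def forCompletePushDown(attr: Attr, aggResultType: DataType): Expr =
    addCastIfNeeded(attr, aggResultType)

  // Partial pushdown: Spark aggregates again on top of the column, so the
  // column is cast to the input type of the aggregate function.
  def forPartialPushDown(attr: Attr, aggInputType: DataType): Expr =
    addCastIfNeeded(attr, aggInputType)
}
```

The only difference between the two cases is which type is passed as the cast target; the cast itself is added the same way.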
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999729390
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146479/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999725622
**[Test build #146479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146479/testReport)** for PR 34904 at commit [`c939885`](https://github.com/apache/spark/commit/c939885560c0eef3959a378abe87e0ec84de7088).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994525148
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50696/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994560758
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50696/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994350400
**[Test build #146218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146218/testReport)** for PR 34904 at commit [`313d51d`](https://github.com/apache/spark/commit/313d51d334702fb166aaa2a440cd9ffbbebe62cb).
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770163031
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ val complexOperators = resultExpressions.flatMap { expr =>
Review comment:
It means the aggregate expressions contain complex operators. For example, for `Sum(a') + Sum(b')`, only `Sum(a')` and `Sum(b')` are pushed down; the `Add` is not.
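A minimal sketch of that idea, using a toy expression tree rather than Catalyst: the aggregate calls are collected out of the surrounding expression, and the `Add` that combines them stays behind to be evaluated by Spark. `Col`, `Sum`, `Add`, and `collectAggregates` here are hypothetical, simplified definitions, not the classes or helper used in the PR.

```scala
// Toy expression tree; simplified stand-ins for Catalyst expressions.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Sum(child: Expr) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object CollectAggSketch {
  // Collect the aggregate calls that are candidates for pushdown.
  // Operators above them (here: Add) are not collected.
  def collectAggregates(expr: Expr): Seq[Sum] = expr match {
    case s: Sum    => Seq(s)
    case Add(l, r) => collectAggregates(l) ++ collectAggregates(r)
    case _: Col    => Seq.empty
  }
}
```

For `Add(Sum(Col("a")), Sum(Col("b")))` this returns the two `Sum` nodes only.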
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994474895
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50696/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995398614
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50724/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995541166
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50741/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996024943
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146289/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997590518
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50859/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997677240
**[Test build #146384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146384/testReport)** for PR 34904 at commit [`2384e38`](https://github.com/apache/spark/commit/2384e38bb7779d83e607c9c60970a7ef4ded09ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999283579
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r772173898
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,56 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val projectExpressions = resultExpressions.map { expr =>
+ expr.transform {
+ case agg: AggregateExpression =>
+ val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+ val aggAttribute = aggOutput(ordinal)
+ val child = if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+ aggAttribute
+ } else {
+ Cast(aggAttribute, agg.resultAttribute.dataType)
Review comment:
When can we reach this branch? And shall we add a cast in the partial agg pushdown branch as well?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997933838
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50873/
[GitHub] [spark] viirya commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-1000112045
cc @huaxingao
[GitHub] [spark] cloud-fan closed pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #34904:
URL: https://github.com/apache/spark/pull/34904
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999328531
**[Test build #146467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146467/testReport)** for PR 34904 at commit [`4575c71`](https://github.com/apache/spark/commit/4575c714019b61c0f2b96941091de56cb8adbd17).
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995589009
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50741/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995762482
**[Test build #146266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146266/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994403730
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50692/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994487177
**[Test build #146218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146218/testReport)** for PR 34904 at commit [`313d51d`](https://github.com/apache/spark/commit/313d51d334702fb166aaa2a440cd9ffbbebe62cb).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994560758
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50696/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994729636
**[Test build #146222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146222/testReport)** for PR 34904 at commit [`b76f5f8`](https://github.com/apache/spark/commit/b76f5f87859a6e2011fe7a45a3804589cd09d16c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r769796739
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ val complexOperators = resultExpressions.flatMap { expr =>
Review comment:
What does this mean?
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999588051
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50955/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999729390
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146479/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999589619
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50955/
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997544960
**[Test build #146384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146384/testReport)** for PR 34904 at commit [`2384e38`](https://github.com/apache/spark/commit/2384e38bb7779d83e607c9c60970a7ef4ded09ec).
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773599024
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val projectExpressions = resultExpressions.map { expr =>
+ // TODO At present, only push down group by attribute is supported.
+ // In future, more attribute conversion is extended here. e.g. GetStructField
+ expr.transform {
+ case agg: AggregateExpression =>
+ val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+ val child = newAggChild(aggOutput(ordinal), agg.resultAttribute.dataType)
+ Alias(child, agg.resultAttribute.name)(agg.resultAttribute.exprId)
+ }
+ }.asInstanceOf[Seq[NamedExpression]]
+ Project(projectExpressions, scanRelation)
+ } else {
+ val plan = Aggregate(
+ output.take(groupingExpressions.length), resultExpressions, scanRelation)
+
+ // scalastyle:off
+ // Change the optimized logical plan to reflect the pushed down aggregate
+ // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
+ // SELECT min(c1), max(c1) FROM t GROUP BY c2;
+ // The original logical plan is
+ // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
+ // +- RelationV2[c1#9, c2#10] ...
+ //
+ // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
+ // we have the following
+ // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
+ // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
+ //
+ // We want to change it to
+ // == Optimized Logical Plan ==
+ // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
+ // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
+ // scalastyle:on
+ plan.transformExpressions {
+ case agg: AggregateExpression =>
+ val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+ val aggAttribute = aggOutput(ordinal)
+ val aggFunction: aggregate.AggregateFunction =
+ agg.aggregateFunction match {
+ case max: aggregate.Max =>
+ max.copy(child = newAggChild(aggAttribute, max.child.dataType))
+ case min: aggregate.Min =>
+ min.copy(child = newAggChild(aggAttribute, min.child.dataType))
+ case sum: aggregate.Sum =>
+ sum.copy(child = newAggChild(aggAttribute, sum.child.dataType))
+ case _: aggregate.Count => aggregate.Sum(aggAttribute)
Review comment:
For `count`, I think we should cast the aggAttr to long type, to make sure `Sum(aggAttribute)` also returns long.
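The concern can be sketched with a simplified model of how a sum's result type follows its input type. Everything below is an assumption made for illustration: the toy `DataType` values, `sumResultType`, and `finalCountType` are hypothetical names, and the mapping is a reduced approximation of Spark's `Sum`, not its actual implementation.

```scala
object SumTypeSketch {
  // Toy stand-ins for data types (hypothetical, illustration only).
  sealed trait DataType
  case object IntType extends DataType
  case object LongType extends DataType
  case object DecimalType extends DataType
  case object DoubleType extends DataType

  // Simplified model: integral inputs sum to long, while decimal and
  // double inputs keep their own family as the result type.
  def sumResultType(input: DataType): DataType = input match {
    case IntType | LongType => LongType
    case DecimalType        => DecimalType
    case DoubleType         => DoubleType
  }

  // Result type of summing the pushed-down count column, with or
  // without casting that column to long first.
  def finalCountType(pushed: DataType, castToLong: Boolean): DataType =
    sumResultType(if (castToLong) LongType else pushed)
}
```

Under this model, a source that reports counts as a decimal would make the final sum a decimal unless the column is cast to long first, which is the type mismatch the cast is meant to prevent.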
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999365844
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50941/
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771381985
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val groupOutputLength = resultExpressions.length - aggOutput.length
+ val aggExpressions = resultExpressions.drop(groupOutputLength).map { expr =>
Review comment:
Thank you for the reminder.
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997879305
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50873/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999275911
**[Test build #146460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146460/testReport)** for PR 34904 at commit [`ee36dbb`](https://github.com/apache/spark/commit/ee36dbbf25f722530c09c0205cb203642821a340).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994343433
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50685/
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994279032
**[Test build #146211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146211/testReport)** for PR 34904 at commit [`b414ccb`](https://github.com/apache/spark/commit/b414ccbb32da992043ed50565c03cdd69cc6a00e).
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994738680
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146222/
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994414817
**[Test build #146222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146222/testReport)** for PR 34904 at commit [`b76f5f8`](https://github.com/apache/spark/commit/b76f5f87859a6e2011fe7a45a3804589cd09d16c).
[GitHub] [spark] beliefer commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994854225
ping @cloud-fan
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995498674
**[Test build #146250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146250/testReport)** for PR 34904 at commit [`ac187cc`](https://github.com/apache/spark/commit/ac187cca966a1b5f3511d72a5a572dd18e2d0748).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995380231
**[Test build #146250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146250/testReport)** for PR 34904 at commit [`ac187cc`](https://github.com/apache/spark/commit/ac187cca966a1b5f3511d72a5a572dd18e2d0748).
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995509848
**[Test build #146266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146266/testReport)** for PR 34904 at commit [`f1d523f`](https://github.com/apache/spark/commit/f1d523ffee09de613f93427805eaf51f68d42d0d).
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995507992
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146250/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996024943
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146289/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997560590
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50859/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998181418
**[Test build #146399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146399/testReport)** for PR 34904 at commit [`b4be693`](https://github.com/apache/spark/commit/b4be693650905aee3c038ee79f97ae2a014e65ff).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304771
**[Test build #146464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146464/testReport)** for PR 34904 at commit [`448bc7f`](https://github.com/apache/spark/commit/448bc7ff630450f4ca1103c03cf983e14246291d).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304275
**[Test build #146464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146464/testReport)** for PR 34904 at commit [`448bc7f`](https://github.com/apache/spark/commit/448bc7ff630450f4ca1103c03cf983e14246291d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999356495
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50939/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998034793
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50874/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999223617
**[Test build #146460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146460/testReport)** for PR 34904 at commit [`ee36dbb`](https://github.com/apache/spark/commit/ee36dbbf25f722530c09c0205cb203642821a340).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997680343
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146384/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998184862
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146399/
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r776155741
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala
##########
@@ -72,6 +72,9 @@ case class JDBCScanBuilder(
private var pushedGroupByCols: Option[Array[String]] = None
+ override def supportCompletePushDown: Boolean =
+ jdbcOptions.numPartitions.map(_ == 1).getOrElse(true)
Review comment:
I will follow up with a PR. @huaxingao Thanks.
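The `supportCompletePushDown` check in the diff above returns true only when `numPartitions` is unset or equal to 1. One plausible reading is that with several JDBC partitions each partition would return its own aggregate rows, so Spark would still need a final aggregate to merge them. A minimal standalone sketch of the same predicate, assuming a hypothetical `PushDownCheck` object and a plain `Option[Int]` in place of Spark's `JDBCOptions`:

```scala
// Hypothetical stand-in for illustration; not part of Spark's API.
object PushDownCheck {
  // Equivalent to numPartitions.map(_ == 1).getOrElse(true):
  // true when the option is absent or when it is exactly 1.
  def supportsCompletePushDown(numPartitions: Option[Int]): Boolean =
    numPartitions.forall(_ == 1)
}
```

Under this sketch, an unset option and `Some(1)` both allow complete pushdown, while any larger partition count falls back to the partial-plus-final aggregate path.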
[GitHub] [spark] beliefer commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-1000375465
@cloud-fan Thanks a lot!
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994350400
**[Test build #146218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146218/testReport)** for PR 34904 at commit [`313d51d`](https://github.com/apache/spark/commit/313d51d334702fb166aaa2a440cd9ffbbebe62cb).
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994487468
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146218/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994481806
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50692/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998037696
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50874/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997590518
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50859/
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r772178127
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,56 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val projectExpressions = resultExpressions.map { expr =>
+ expr.transform {
+ case agg: AggregateExpression =>
+ val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
+ val aggAttribute = aggOutput(ordinal)
+ val child = if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+ aggAttribute
+ } else {
+ Cast(aggAttribute, agg.resultAttribute.dataType)
+ }
+ Alias(child, agg.resultAttribute.name)(agg.resultAttribute.exprId)
+ }
+ }.asInstanceOf[Seq[NamedExpression]]
Review comment:
According to the DS v2 API, it's possible to push down `GROUP BY a.b`, and we need to replace `GetStructField(Attr("a"), "b")` with the group col attribute from the scan relation.
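The replacement described in this comment can be sketched on a toy expression tree. The case classes below are illustrative stand-ins, not Spark's Catalyst classes, and `replaceGroupCols` and the output attribute name `a.b` are hypothetical; the sketch only shows the idea of swapping a pushed-down grouping expression for the attribute that the scan relation outputs.

```scala
object GroupColRewrite {
  // Toy expression tree (assumption: not Catalyst's Expression hierarchy).
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class GetStructField(child: Expr, field: String) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr

  // Replace each occurrence of a pushed-down grouping expression with the
  // scan-output attribute mapped to it; recurse into children otherwise.
  def replaceGroupCols(e: Expr, mapping: Map[Expr, Attr]): Expr =
    mapping.get(e) match {
      case Some(attr) => attr
      case None => e match {
        case GetStructField(c, f) => GetStructField(replaceGroupCols(c, mapping), f)
        case Add(l, r) => Add(replaceGroupCols(l, mapping), replaceGroupCols(r, mapping))
        case a: Attr => a
      }
    }

  // GROUP BY a.b: the nested field access is the grouping expression.
  val groupExpr: Expr = GetStructField(Attr("a"), "b")
  val mapping: Map[Expr, Attr] = Map(groupExpr -> Attr("a.b"))

  // A result expression that mentions the grouping expression.
  val rewritten: Expr = replaceGroupCols(Add(groupExpr, Attr("c")), mapping)
}
```

Matching the whole expression before recursing is what lets `a.b` be replaced as a unit, rather than leaving a field access on top of a base attribute that the scan no longer outputs.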
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997933838
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50873/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-997935367
**[Test build #146399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146399/testReport)** for PR 34904 at commit [`b4be693`](https://github.com/apache/spark/commit/b4be693650905aee3c038ee79f97ae2a014e65ff).
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998102581
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146398/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995570766
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50741/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995427125
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50724/
[GitHub] [spark] beliefer commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995767383
retest this please
[GitHub] [spark] SparkQA removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995875853
**[Test build #146291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146291/testReport)** for PR 34904 at commit [`9d9cd64`](https://github.com/apache/spark/commit/9d9cd64c7318bdc2bb9ef82f8e3ce41e6c8ff44b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999304786
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146464/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999400000
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146467/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999589619
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50955/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994414817
**[Test build #146222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146222/testReport)** for PR 34904 at commit [`b76f5f8`](https://github.com/apache/spark/commit/b76f5f87859a6e2011fe7a45a3804589cd09d16c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995427125
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50724/
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770241266
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ val complexOperators = resultExpressions.flatMap { expr =>
Review comment:
If I read the code correctly, you push down nothing for `sum(a) + sum(b)`?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-994487468
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146218/
[GitHub] [spark] beliefer commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r770248411
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -146,40 +147,57 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ val complexOperators = resultExpressions.flatMap { expr =>
Review comment:
V2ScanRelationPushDown pushes down `sum(a)` and `sum(b)` to the data source separately, not the combined expression `sum(a) + sum(b)`. Ref:
https://github.com/apache/spark/blob/ac187cca966a1b5f3511d72a5a572dd18e2d0748/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L91
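To illustrate the point above, here is a minimal sketch of collecting the aggregate calls nested inside a result expression such as `sum(a) + sum(b)`. The `Expr`, `Col`, `Sum`, and `Add` types and the `collectAggregates` helper are hypothetical stand-ins for illustration only; they are not Spark's Catalyst classes and this is not the code in `V2ScanRelationPushDown`.

```scala
// Toy expression tree. These types are illustrative assumptions,
// NOT Spark's Catalyst expression classes.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Sum(child: Expr) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object CollectAggregates {
  // Walk the tree and return every aggregate call found inside it,
  // in left-to-right order.
  def collectAggregates(e: Expr): Seq[Expr] = e match {
    case s: Sum    => Seq(s)
    case Add(l, r) => collectAggregates(l) ++ collectAggregates(r)
    case _: Col    => Seq.empty
  }

  def main(args: Array[String]): Unit = {
    // sum(a) + sum(b)
    val resultExpr = Add(Sum(Col("a")), Sum(Col("b")))
    // Each Sum is a separate push-down candidate; in this sketch the
    // enclosing Add is left to be evaluated on the Spark side.
    println(collectAggregates(resultExpr))
  }
}
```

Under these assumptions, the two `Sum` nodes are found individually, which matches the description that `sum(a)` and `sum(b)` are pushed down one by one while the addition is not.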
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-996111394
**[Test build #146291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146291/testReport)** for PR 34904 at commit [`9d9cd64`](https://github.com/apache/spark/commit/9d9cd64c7318bdc2bb9ef82f8e3ce41e6c8ff44b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* ` (defaultdict(<class 'list'>, `
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r771105727
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -147,40 +148,58 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
val scanRelation = DataSourceV2ScanRelation(sHolder.relation, wrappedScan, output)
- val plan = Aggregate(
- output.take(groupingExpressions.length), resultExpressions, scanRelation)
-
- // scalastyle:off
- // Change the optimized logical plan to reflect the pushed down aggregate
- // e.g. TABLE t (c1 INT, c2 INT, c3 INT)
- // SELECT min(c1), max(c1) FROM t GROUP BY c2;
- // The original logical plan is
- // Aggregate [c2#10],[min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c1#9, c2#10] ...
- //
- // After change the V2ScanRelation output to [c2#10, min(c1)#21, max(c1)#22]
- // we have the following
- // !Aggregate [c2#10], [min(c1#9) AS min(c1)#17, max(c1#9) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- //
- // We want to change it to
- // == Optimized Logical Plan ==
- // Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18]
- // +- RelationV2[c2#10, min(c1)#21, max(c1)#22] ...
- // scalastyle:on
- val aggOutput = output.drop(groupAttrs.length)
- plan.transformExpressions {
- case agg: AggregateExpression =>
- val ordinal = aggExprToOutputOrdinal(agg.canonicalized)
- val aggFunction: aggregate.AggregateFunction =
- agg.aggregateFunction match {
- case max: aggregate.Max => max.copy(child = aggOutput(ordinal))
- case min: aggregate.Min => min.copy(child = aggOutput(ordinal))
- case sum: aggregate.Sum => sum.copy(child = aggOutput(ordinal))
- case _: aggregate.Count => aggregate.Sum(aggOutput(ordinal))
- case other => other
- }
- agg.copy(aggregateFunction = aggFunction)
+ if (r.supportCompletePushDown()) {
+ val groupOutputLength = resultExpressions.length - aggOutput.length
Review comment:
The number of result expressions has nothing to do with the number of group columns; people can write `SELECT a + b, max(c) - max(d) FROM t GROUP BY a, b`.
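A small sketch of the arithmetic behind this comment, using the query from the comment itself. `QueryShape` is a hypothetical helper for illustration, not a Spark class, and it assumes that the length of `aggOutput` equals the number of pushed-down aggregate calls (two here: `max(c)` and `max(d)`).

```scala
object ResultVsGrouping {
  // Hypothetical summary of a query's shape; not a Spark class.
  final case class QueryShape(
      resultExprs: Seq[String],
      groupCols: Seq[String],
      aggCalls: Seq[String]) {
    // The computation questioned in the review:
    // resultExpressions.length - aggOutput.length
    def guessedGroupCount: Int = resultExprs.length - aggCalls.length
  }

  // SELECT a + b, max(c) - max(d) FROM t GROUP BY a, b
  val example: QueryShape = QueryShape(
    resultExprs = Seq("a + b", "max(c) - max(d)"),
    groupCols = Seq("a", "b"),
    aggCalls = Seq("max(c)", "max(d)"))

  def main(args: Array[String]): Unit = {
    println(s"guessed group columns = ${example.guessedGroupCount}")
    println(s"actual group columns  = ${example.groupCols.length}")
  }
}
```

With two result expressions and two aggregate calls the subtraction gives 0, while the query actually groups by two columns, so the group-column count cannot be derived from the result expressions this way.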
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-995767975
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146266/
[GitHub] [spark] cloud-fan commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773129127
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -633,4 +715,30 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
}
checkAnswer(query, Seq(Row(29000.0)))
}
+
+ test("scan with aggregate push-down: SUM(CASE WHEN) with group by") {
+ val df =
+ sql("SELECT SUM(CASE WHEN SALARY > 0 THEN 1 ELSE 0 END) FROM h2.test.employee GROUP BY DEPT")
+ checkAggregateRemoved(df, false)
+ df.queryExecution.optimizedPlan.collect {
+ case _: DataSourceV2ScanRelation =>
+ val expected_plan_fragment =
+ "PushedFilters: [], "
+ checkKeywordsExistsInExplain(df, expected_plan_fragment)
+ }
+ checkAnswer(df, Seq(Row(1), Row(2), Row(2)))
+ }
+
+ test("scan with aggregate push-down: SUM(NVL) with group by") {
Review comment:
This test case looks redundant with the one above, so we can remove it.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-998184862
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146399/
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999283580
[GitHub] [spark] AmplabJenkins commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999356495
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50939/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999365897
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50941/
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999348942
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50939/
[GitHub] [spark] cloud-fan commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-1000362478
thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34904:
URL: https://github.com/apache/spark/pull/34904#issuecomment-999544778
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50955/
[GitHub] [spark] huaxingao commented on a change in pull request #34904: [SPARK-37644][SQL] Support datasource v2 complete aggregate pushdown
Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r776149870
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala
##########
@@ -72,6 +72,9 @@ case class JDBCScanBuilder(
private var pushedGroupByCols: Option[Array[String]] = None
+ override def supportCompletePushDown: Boolean =
+ jdbcOptions.numPartitions.map(_ == 1).getOrElse(true)
Review comment:
In the case of multiple partitions, if partition columns are the same as group by columns, should `supportCompletePushDown` be set to true as well?
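For reference, a minimal standalone sketch of how the check in the diff behaves. The `CompletePushDownCheck` object is hypothetical and takes `numPartitions` as a plain `Option[Int]` rather than reading it from `jdbcOptions`; only the boolean expression itself is copied from the diff.

```scala
object CompletePushDownCheck {
  // Same expression as in the diff: true when numPartitions is unset
  // or exactly 1, false for any other value.
  def supportCompletePushDown(numPartitions: Option[Int]): Boolean =
    numPartitions.map(_ == 1).getOrElse(true)

  def main(args: Array[String]): Unit = {
    println(supportCompletePushDown(None))    // option not set
    println(supportCompletePushDown(Some(1))) // single partition
    println(supportCompletePushDown(Some(4))) // multiple partitions
  }
}
```

As written, any multi-partition read disables complete push-down. A likely rationale (an assumption, not stated in the diff) is that each partition would return its own per-group aggregates, which Spark would still have to merge. The question above concerns the case where the partition column matches the group-by columns, so that each group would fall entirely inside one partition.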