Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/28 15:11:32 UTC
[GitHub] [spark] ulysses-you opened a new pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
ulysses-you opened a new pull request #32084:
URL: https://github.com/apache/spark/pull/32084
### What changes were proposed in this pull request?
- Split the plan into several groups, where every child of a `Union` starts a new group
- Coalesce partitions within each group independently
### Why are the changes needed?
#### First Issue
The rule `CoalesceShufflePartitions` can only coalesce partitions if
* every leaf node is a `ShuffleQueryStage`
* all shuffles have the same number of partitions
A `Union` can break these assumptions. Consider the following plan:
```
Union
   HashAggregate
      ShuffleQueryStage
   FileScan
```
`CoalesceShufflePartitions` cannot optimize it, so the number of result partitions is `shuffle partition + FileScan partition`, which can be quite large.
It would be better to support partial optimization under `Union`.
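The difference between the all-or-nothing check and a per-group check can be pictured on a toy plan tree. The sketch below is only an illustration: `Plan`, `ShuffleStage`, `shrink`, `coalesceWholePlan`, and `coalescePerGroup` are hypothetical names, not Spark classes or methods, and "coalescing" is reduced to setting a fixed partition count.

```scala
object PartialCoalesceSketch {
  // Toy plan model (hypothetical names; not Spark's SparkPlan hierarchy).
  sealed trait Plan
  case class ShuffleStage(numPartitions: Int) extends Plan
  case class FileScan(numPartitions: Int) extends Plan
  case class HashAggregate(child: Plan) extends Plan
  case class Union(children: Seq[Plan]) extends Plan

  // Collects the leaf nodes of the toy tree.
  def leaves(plan: Plan): Seq[Plan] = plan match {
    case HashAggregate(child) => leaves(child)
    case Union(children) => children.flatMap(leaves)
    case leaf => Seq(leaf)
  }

  // The precondition described above: every leaf must be a shuffle stage.
  def allLeavesAreShuffleStages(plan: Plan): Boolean =
    leaves(plan).forall(_.isInstanceOf[ShuffleStage])

  // Stand-in for coalescing: shrink every shuffle stage to `target` partitions.
  def shrink(plan: Plan, target: Int): Plan = plan match {
    case ShuffleStage(_) => ShuffleStage(target)
    case HashAggregate(child) => HashAggregate(shrink(child, target))
    case Union(children) => Union(children.map(shrink(_, target)))
    case scan: FileScan => scan
  }

  // All-or-nothing: coalesce only if the whole plan passes the check.
  def coalesceWholePlan(plan: Plan, target: Int): Plan =
    if (allLeavesAreShuffleStages(plan)) shrink(plan, target) else plan

  // Partial: every child of a union is checked and coalesced as its own group.
  def coalescePerGroup(plan: Plan, target: Int): Plan = plan match {
    case Union(children) => Union(children.map(coalescePerGroup(_, target)))
    case group => coalesceWholePlan(group, target)
  }

  // A union emits the partitions of all its children, one after another.
  def outputPartitions(plan: Plan): Int = plan match {
    case ShuffleStage(n) => n
    case FileScan(n) => n
    case HashAggregate(child) => outputPartitions(child)
    case Union(children) => children.map(outputPartitions).sum
  }

  // The plan from the example: an aggregate over a shuffle, unioned with a scan.
  val plan: Plan = Union(Seq(HashAggregate(ShuffleStage(10)), FileScan(4)))
}
```

With 10 shuffle partitions and a 4-partition scan, the whole-plan variant leaves the plan untouched (10 + 4 = 14 output partitions), while the per-group variant shrinks only the aggregate side (1 + 4 = 5 when the target is 1).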
#### Second Issue
The coalesce-partition formula uses the **sum value** as the total input size, which does not work well with `Union`; see
```
// ShufflePartitionsUtil.coalescePartitions
val totalPostShuffleInputSize = mapOutputStatistics.flatMap(_.map(_.bytesByPartitionId.sum)).sum
```
So for such case:
```
Union
   HashAggregate
      ShuffleQueryStage
   HashAggregate
      ShuffleQueryStage
```
The `CoalesceShufflePartitions` rule will return an unexpected partition number.
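A small numeric model shows why adding up sizes across unrelated union children skews the result. This is a simplified greedy packer, not `ShufflePartitionsUtil.coalescePartitions` itself: `coalescedCount` and `sumByPartitionId` are hypothetical helpers, and the assumption is that the sizes of partition `i` are added across all shuffles that the rule treats as one group.

```scala
object UnionSizeSketch {
  // Greedy packing of adjacent partitions: start a new coalesced partition
  // whenever adding the next one would exceed `targetSize`.
  def coalescedCount(sizes: Seq[Long], targetSize: Long): Int = {
    var count = 0
    var current = 0L
    var open = false
    for (size <- sizes) {
      if (open && current + size > targetSize) {
        count += 1
        current = 0L
      }
      current += size
      open = true
    }
    if (open) count + 1 else count
  }

  // Sizes seen when several shuffles are treated as one group:
  // partition i of every shuffle is added together.
  def sumByPartitionId(shuffles: Seq[Seq[Long]]): Seq[Long] =
    shuffles.transpose.map(_.sum)

  // Two independent union children, each with 10 shuffle partitions of 10 bytes.
  val shuffleA: Seq[Long] = Seq.fill(10)(10L)
  val shuffleB: Seq[Long] = Seq.fill(10)(10L)
  val targetSize = 40L

  // One group: both children share the same coalesced layout.
  val jointPerChild: Int = coalescedCount(sumByPartitionId(Seq(shuffleA, shuffleB)), targetSize)
  // Separate groups: each child is packed against the target on its own.
  val independentPerChild: Int = coalescedCount(shuffleA, targetSize)
}
```

In this model the joint calculation yields 5 partitions per child (10 after the union), each holding only half the target size, whereas packing each child on its own yields 3 partitions per child (6 after the union).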
### Does this PR introduce _any_ user-facing change?
Probably yes; the number of result partitions might change.
### How was this patch tested?
Added tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855802452
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43929/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929769619
**[Test build #143695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143695/testReport)** for PR 32084 at commit [`e2b25b4`](https://github.com/apache/spark/commit/e2b25b4f35b507665029162efc4e2808fecd14e3).
[GitHub] [spark] ulysses-you closed pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you closed pull request #32084:
URL: https://github.com/apache/spark/pull/32084
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610652043
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,51 @@ class AdaptiveQueryExecSuite
       checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
     }
   }
+
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row], shuffleReaderNumber: Int, partitionNumber: Int): Unit = {
+      df.collect()
+      assert(
+        collect(df.queryExecution.executedPlan) {
+          case s: CustomShuffleReaderExec => s
+        }.size === shuffleReaderNumber
+      )
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+      SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+      SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+      SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+      val df1 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 2).toDF()
+      val df2 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 4).toDF()
+
+      // positive test
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2),
+        1,
+        1 + 4)
+
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2).unionAll(df1),
+        1,
+        1 + 4 + 2)
+
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2).unionAll(df1.groupBy("key").count()),
+        2,
+        1 + 4 + 1)
Review comment:
+1
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819199418
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41897/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858518626
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44163/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322078
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139617/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920921348
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47855/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818408284
**[Test build #137273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137273/testReport)** for PR 32084 at commit [`a966759`](https://github.com/apache/spark/commit/a966759afa0cf7e71440a344ae833fab7835a6a0).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818408284
**[Test build #137273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137273/testReport)** for PR 32084 at commit [`a966759`](https://github.com/apache/spark/commit/a966759afa0cf7e71440a344ae833fab7835a6a0).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816596294
**[Test build #137115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137115/testReport)** for PR 32084 at commit [`be8d584`](https://github.com/apache/spark/commit/be8d584270c2b93e8f5539f98ed8b67560587673).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322341
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44144/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854762987
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139335/
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816438833
Thank you @maropu for the review. I have addressed the comments, made the code more readable, and added more tests.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322051
**[Test build #139617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717712983
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala
##########
@@ -412,12 +412,10 @@ class CoalesceShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterAl
     val finalPlan = resultDf.queryExecution.executedPlan
       .asInstanceOf[AdaptiveSparkPlanExec].executedPlan
-    // As the pre-shuffle partition number are different, we will skip reducing
-    // the shuffle partition numbers.
Review comment:
let's update the comment
```
// Shuffle partition coalescing of the join is performed independent of the non-grouping
// aggregate on the other side of the union.
```
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918861253
**[Test build #143247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143247/testReport)** for PR 32084 at commit [`eeb4047`](https://github.com/apache/spark/commit/eeb4047a59773a6dba64a30b2e184badf69548b1).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921022688
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47859/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855756409
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43929/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818499547
**[Test build #137272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137272/testReport)** for PR 32084 at commit [`a001522`](https://github.com/apache/spark/commit/a00152234a67bd0efafef8083cf750fe83f01bdf).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class CoalesceShufflePartitions(session: SparkSession)`
* `trait UnionAwareOptimizerRule `
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872348799
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45032/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855714028
**[Test build #139407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139407/testReport)** for PR 32084 at commit [`797ea59`](https://github.com/apache/spark/commit/797ea5944cd405612cf3972f72ebf963cea694fc).
[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r613123229
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {
Review comment:
Thank you for the input. Actually, the rule (child) does not consider the children of the Union, as it just tries to optimize the plan.
For nested Unions, there are two cases: the first is fine if we skip the check, but the second can be optimized through every Union, which causes repetition.
```
Union
   HashAggregate
      ShuffleQueryStage
   Union
      HashAggregate
         ShuffleQueryStage
      FileScan
```
```
Union
   HashAggregate
      ShuffleQueryStage
   Union
      HashAggregate
         ShuffleQueryStage
      HashAggregate
         ShuffleQueryStage
```
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610655569
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
-    if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
-      || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
-      // If not all leaf nodes are query stages, it's not safe to reduce the number of
-      // shuffle partitions, because we may break the assumption that all children of a spark plan
-      // have same number of output partitions.
-      return plan
+
+    if (canCoalescePartitions(plan)) {
Review comment:
Can we make this rule more general? Ideally we need to split the query plan of the given query stage into several groups. The shuffles within one group must have the same number of partitions.
This rule is overly simplified right now and assumes the entire query stage is one group. It's definitely wrong with Union and we should fix it. To be conservative we can try to put everything in one group except for Union. We should make the code extensible so that we can add more "points" that can split groups.
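One way to picture the extensible grouping suggested here is a traversal that takes the split points as a predicate. The sketch below is an assumption-laden illustration on a toy tree: `Node`, `collectGroups`, `isSplitPoint`, and `isConsistent` are hypothetical names, not part of Spark, and a group is represented only by the partition counts of the shuffles it contains.

```scala
object GroupSplitSketch {
  // Generic toy node; `shufflePartitions` is set only for shuffle stages.
  case class Node(name: String, shufflePartitions: Option[Int], children: Seq[Node])

  def shuffle(n: Int): Node = Node("ShuffleQueryStage", Some(n), Nil)
  def op(name: String, children: Node*): Node = Node(name, None, children)

  // Returns one entry per group; each entry lists the partition counts of the
  // shuffles in that group. Groups without any shuffle are dropped.
  def collectGroups(plan: Node, isSplitPoint: Node => Boolean): Seq[Seq[Int]] = {
    // Returns (shuffles of the group containing `node`, groups completed below `node`).
    def visit(node: Node): (Seq[Int], Seq[Seq[Int]]) = {
      if (isSplitPoint(node)) {
        // Every child of a split point closes its own group.
        val closed = node.children.flatMap { child =>
          val (own, below) = visit(child)
          below :+ own
        }
        (Nil, closed)
      } else {
        val results = node.children.map(visit)
        (node.shufflePartitions.toSeq ++ results.flatMap(_._1), results.flatMap(_._2))
      }
    }
    val (own, below) = visit(plan)
    (below :+ own).filter(_.nonEmpty)
  }

  // Invariant from the comment above: shuffles in one group share a partition count.
  def isConsistent(group: Seq[Int]): Boolean = group.distinct.size <= 1

  // A join over two 10-partition shuffles, unioned with an aggregate over a 5-partition shuffle.
  val plan: Node = op("Union",
    op("SortMergeJoin", shuffle(10), shuffle(10)),
    op("HashAggregate", shuffle(5)))

  val unionOnly: Node => Boolean = _.name == "Union"
  val noSplit: Node => Boolean = _ => false
}
```

Treating only `Union` as a split point gives two consistent groups, `Seq(10, 10)` and `Seq(5)`; with no split points the whole plan is one group, `Seq(10, 10, 5)`, which violates the invariant. Adding another split point would only mean widening the predicate.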
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816439939
**[Test build #137115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137115/testReport)** for PR 32084 at commit [`be8d584`](https://github.com/apache/spark/commit/be8d584270c2b93e8f5539f98ed8b67560587673).
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819170943
retest this please
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921015893
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47859/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920926709
**[Test build #143351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143351/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921421018
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47888/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921544045
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143381/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921398098
**[Test build #143381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143381/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322078
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139617/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816469642
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41694/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929794226
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48211/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929820764
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48210/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929333047
**[Test build #143679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143679/testReport)** for PR 32084 at commit [`f362c9f`](https://github.com/apache/spark/commit/f362c9fb387dcad38adec2c047bb256009d26744).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929821939
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48211/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929821989
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48211/
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-930027529
thank you @cloud-fan
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927233448
**[Test build #143632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143632/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927276805
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143632/
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717712983
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala
##########
@@ -412,12 +412,10 @@ class CoalesceShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterAl
val finalPlan = resultDf.queryExecution.executedPlan
.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
- // As the pre-shuffle partition number are different, we will skip reducing
- // the shuffle partition numbers.
Review comment:
let's update the comment
```
// Shuffle partition coalescing of the join is performed independent of the non-grouping
// aggregate on the other side of the union.
```
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ unionNumber: Int,
+ shuffleReaderNumber: Int,
+ partitionNumber: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == unionNumber)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === shuffleReaderNumber)
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+ }
+ // advisory partition size 1048576 has no special meaning, just a big enough value
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+ combineUnionConfig) {
+ withTempView("t1", "t2") {
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+ .toDF().createOrReplaceTempView("t1")
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+ .toDF().createOrReplaceTempView("t2")
+
+ // positive test that could be coalesced
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ """.stripMargin),
+ if (combineUnionEnabled) 1 else 1,
Review comment:
```suggestion
1
```
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ unionNumber: Int,
+ shuffleReaderNumber: Int,
+ partitionNumber: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == unionNumber)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === shuffleReaderNumber)
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+ }
+ // advisory partition size 1048576 has no special meaning, just a big enough value
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+ combineUnionConfig) {
+ withTempView("t1", "t2") {
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+ .toDF().createOrReplaceTempView("t1")
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+ .toDF().createOrReplaceTempView("t2")
+
+ // positive test that could be coalesced
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ """.stripMargin),
+ if (combineUnionEnabled) 1 else 1,
Review comment:
```suggestion
unionNumber = 1
```
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ unionNumber: Int,
+ shuffleReaderNumber: Int,
+ partitionNumber: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == unionNumber)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === shuffleReaderNumber)
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
Review comment:
does this really matter for the "coalesce through union" feature? I think we can just test the default case, which means this rule is enabled.
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ unionNumber: Int,
+ shuffleReaderNumber: Int,
+ partitionNumber: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == unionNumber)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === shuffleReaderNumber)
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+ }
+ // advisory partition size 1048576 has no special meaning, just a big enough value
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+ combineUnionConfig) {
+ withTempView("t1", "t2") {
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+ .toDF().createOrReplaceTempView("t1")
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+ .toDF().createOrReplaceTempView("t2")
+
+ // positive test that could be coalesced
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ """.stripMargin),
+ if (combineUnionEnabled) 1 else 1,
+ 1,
+ 1 + 4)
+
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ |UNION ALL
+ |SELECT * FROM t1
+ """.stripMargin),
+ if (combineUnionEnabled) 1 else 2,
+ 1,
+ 1 + 4 + 2)
+
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ |UNION ALL
+ |SELECT * FROM t1
+ |UNION ALL
+ |SELECT key, count(*) FROM t2 GROUP BY key
Review comment:
It's not very useful to test 3 unions, as it's similar to the 2 cases above.
Let's test a sort-merge join (SMJ) UNION an aggregate instead.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855780960
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43929/
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610655569
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
if (!conf.coalesceShufflePartitionsEnabled) {
return plan
}
- if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
- || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
- // If not all leaf nodes are query stages, it's not safe to reduce the number of
- // shuffle partitions, because we may break the assumption that all children of a spark plan
- // have same number of output partitions.
- return plan
+
+ if (canCoalescePartitions(plan)) {
Review comment:
Can we make this rule more general? Ideally we need to split the query plan of the given query stage into several groups. The shuffles within one group must have the same number of partitions.
This rule is overly simplified right now and assumes the entire query stage is one group. It's definitely wrong with Union and we should fix it. To be conservative we can try to put everything in one group except for Union. We should make the code extensible so that we can add more "breaking points" that can split groups in the future.
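A minimal sketch of the grouping idea, assuming a toy plan tree rather than Spark's own classes. `Plan`, `ShuffleStage`, `Scan`, `Node`, `Union` and `CoalesceGroups` are all hypothetical names used only for illustration, and the sketch leaves out the existing check that every leaf of a group is a query stage.

```scala
// Toy plan nodes: hypothetical stand-ins, not Spark's SparkPlan classes.
sealed trait Plan { def children: Seq[Plan] }
case class ShuffleStage(id: Int, numPartitions: Int) extends Plan {
  val children: Seq[Plan] = Nil
}
case class Scan(name: String) extends Plan {
  val children: Seq[Plan] = Nil
}
case class Node(name: String, children: Seq[Plan]) extends Plan
case class Union(children: Seq[Plan]) extends Plan

object CoalesceGroups {
  // Operators that split a plan into independent groups.
  // Add more cases here to introduce further breaking points.
  def isBreakingPoint(plan: Plan): Boolean = plan match {
    case _: Union => true
    case _ => false
  }

  // Returns (stages of the enclosing group, groups found below breaking points).
  private def visit(plan: Plan): (Seq[ShuffleStage], Seq[Seq[ShuffleStage]]) = plan match {
    case stage: ShuffleStage => (Seq(stage), Nil)
    case bp if isBreakingPoint(bp) =>
      // Each child of a breaking point starts a group of its own.
      (Nil, bp.children.flatMap(collectGroups))
    case other =>
      val results = other.children.map(visit)
      (results.flatMap(_._1), results.flatMap(_._2))
  }

  // One entry per group; the shuffles inside an entry must share a partition count.
  def collectGroups(plan: Plan): Seq[Seq[ShuffleStage]] = {
    val (own, nested) = visit(plan)
    (if (own.nonEmpty) Seq(own) else Nil) ++ nested
  }

  def main(args: Array[String]): Unit = {
    val plan = Union(Seq(
      Node("SortMergeJoin", Seq(ShuffleStage(1, 10), ShuffleStage(2, 10))),
      Node("HashAggregate", Seq(ShuffleStage(3, 5)))))
    collectGroups(plan).foreach { group =>
      println(group.map(_.id).mkString("group: ", ", ", ""))
    }
  }
}
```

For the plan in `main`, the join side and the aggregate side end up in two separate groups, so each could be coalesced to its own partition count.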
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854620111
[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610677829
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
if (!conf.coalesceShufflePartitionsEnabled) {
return plan
}
- if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
- || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
- // If not all leaf nodes are query stages, it's not safe to reduce the number of
- // shuffle partitions, because we may break the assumption that all children of a spark plan
- // have same number of output partitions.
- return plan
+
+ if (canCoalescePartitions(plan)) {
+ coalescePartitions(plan)
+ } else {
+ plan.transformUp {
+ case u: UnionExec =>
+ u.withNewChildren(u.children.map { child =>
+ if (canCoalescePartitions(child) &&
+ child.find(_.isInstanceOf[UnionExec]).isEmpty) {
Review comment:
We should only add the coalesce if its children don't contain a `Union`, to avoid adding a duplicate `CustomShuffleReaderExec`.
Without `CombineUnions`, the plan can be
````
Union
  Union
    HashAggregate
      ShuffleQueryStage
    FileScan
  FileScan
````
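A rough sketch of why the nested-`Union` check matters, using a toy tree with a bottom-up `transformUp`. All names here (`Plan`, `Stage`, `Reader`, `NestedUnionDemo`, ...) are hypothetical stand-ins, `coalesce` only mimics the step that wraps a shuffle stage in a reader, and the `canCoalescePartitions` check is omitted.

```scala
// Toy plan tree: hypothetical stand-ins for Spark's physical operators.
sealed trait Plan {
  def children: Seq[Plan]
  def withChildren(newChildren: Seq[Plan]): Plan
  def exists(f: Plan => Boolean): Boolean = f(this) || children.exists(_.exists(f))
  def count(f: Plan => Boolean): Int = (if (f(this)) 1 else 0) + children.map(_.count(f)).sum
  // Rewrite the children first, then the node itself (bottom-up).
  def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
    val rewritten = withChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(rewritten, identity[Plan])
  }
}
case class Stage(id: Int) extends Plan {
  val children: Seq[Plan] = Nil
  def withChildren(newChildren: Seq[Plan]): Plan = this
}
case class Scan(name: String) extends Plan {
  val children: Seq[Plan] = Nil
  def withChildren(newChildren: Seq[Plan]): Plan = this
}
case class Reader(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def withChildren(newChildren: Seq[Plan]): Plan = Reader(newChildren.head)
}
case class Aggregate(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def withChildren(newChildren: Seq[Plan]): Plan = Aggregate(newChildren.head)
}
case class Union(children: Seq[Plan]) extends Plan {
  def withChildren(newChildren: Seq[Plan]): Plan = Union(newChildren)
}

object NestedUnionDemo {
  // Stand-in for the coalescing step: wrap every shuffle stage in a reader.
  def coalesce(plan: Plan): Plan = plan.transformUp { case s: Stage => Reader(s) }

  // Coalesce each Union child; optionally skip children that contain another Union,
  // because the bottom-up pass has already handled that inner Union.
  def rewrite(plan: Plan, skipNestedUnion: Boolean): Plan = plan.transformUp {
    case u: Union =>
      Union(u.children.map { child =>
        val hasUnion = child.exists(_.isInstanceOf[Union])
        if (skipNestedUnion && hasUnion) child else coalesce(child)
      })
  }

  // Same shape as the plan above: Union(Union(HashAggregate(stage), scan), scan).
  val plan: Plan = Union(Seq(Union(Seq(Aggregate(Stage(0)), Scan("t2"))), Scan("t1")))

  def readers(p: Plan): Int = p.count(_.isInstanceOf[Reader])

  def main(args: Array[String]): Unit = {
    println(s"with check: ${readers(rewrite(plan, skipNestedUnion = true))} reader(s)")
    println(s"without check: ${readers(rewrite(plan, skipNestedUnion = false))} reader(s)")
  }
}
```

With the check, the stage under the inner `Union` is wrapped once; without it, the outer `Union` revisits the already rewritten subtree and wraps the same stage a second time.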
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929772354
**[Test build #143696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143696/testReport)** for PR 32084 at commit [`b2e3848`](https://github.com/apache/spark/commit/b2e3848d29a99ff415edfc4cb128b1ea6fd685cf).
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r718092179
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,89 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ numUnion: Int,
+ numShuffleReader: Int,
+ numPartition: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == numUnion)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === numShuffleReader)
+ assert(df.rdd.partitions.length === numPartition)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
Review comment:
this will set a config whose key is an empty string. I think it's safer to do `SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> ""`
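As a small sketch of that suggestion, the pair can always carry the real key and vary only the value. `ExcludedRulesConf` and `combineUnionConfig` are hypothetical names, and the key string is written out (it is the key behind `SQLConf.OPTIMIZER_EXCLUDED_RULES`) only so the snippet needs no Spark import.

```scala
object ExcludedRulesConf {
  // Key behind SQLConf.OPTIMIZER_EXCLUDED_RULES, spelled out to keep the sketch standalone.
  val ExcludedRulesKey = "spark.sql.optimizer.excludedRules"
  val CombineUnionsRule = "org.apache.spark.sql.catalyst.optimizer.CombineUnions"

  // Always use the real key; an empty value means no rule is excluded.
  def combineUnionConfig(combineUnionEnabled: Boolean): (String, String) =
    ExcludedRulesKey -> (if (combineUnionEnabled) "" else CombineUnionsRule)

  def main(args: Array[String]): Unit = {
    Seq(true, false).foreach(enabled => println(combineUnionConfig(enabled)))
  }
}
```

Either pair can then be passed to `withSQLConf` without ever setting a config whose key is the empty string.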
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929906030
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929891323
**[Test build #143696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143696/testReport)** for PR 32084 at commit [`b2e3848`](https://github.com/apache/spark/commit/b2e3848d29a99ff415edfc4cb128b1ea6fd685cf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854576858
**[Test build #139335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139335/testReport)** for PR 32084 at commit [`525d4d4`](https://github.com/apache/spark/commit/525d4d4c7a120967409baee65d97bf5e27a2fdbe).
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880321182
**[Test build #141043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141043/testReport)** for PR 32084 at commit [`9743c0c`](https://github.com/apache/spark/commit/9743c0cf672129f9b30177279f803f9c8c8752d3).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918867646
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47750/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927233448
**[Test build #143632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143632/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921417781
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47888/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921542931
**[Test build #143381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143381/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819204254
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41897/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818483062
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41851/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818705208
**[Test build #137284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137284/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818512659
[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610699855
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,51 @@ class AdaptiveQueryExecSuite
checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
}
}
+
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row], shuffleReaderNumber: Int, partitionNumber: Int): Unit = {
+ df.collect()
+ assert(
+ collect(df.queryExecution.executedPlan) {
+ case s: CustomShuffleReaderExec => s
+ }.size === shuffleReaderNumber
+ )
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+ val df1 = spark.sparkContext.parallelize(
+ (1 to 10).map(i => TestData(i, i.toString)), 2).toDF()
+ val df2 = spark.sparkContext.parallelize(
+ (1 to 10).map(i => TestData(i, i.toString)), 4).toDF()
+
+ // positive test
+ checkResultPartition(
+ df1.groupBy("key").count().unionAll(df2),
+ 1,
+ 1 + 4)
+
+ checkResultPartition(
+ df1.groupBy("key").count().unionAll(df2).unionAll(df1),
+ 1,
+ 1 + 4 + 2)
+
+ checkResultPartition(
+ df1.groupBy("key").count().unionAll(df2).unionAll(df1.groupBy("key").count()),
+ 2,
+ 1 + 4 + 1)
Review comment:
I considered it. OK, I will add it later.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872320654
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45032/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816439939
**[Test build #137115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137115/testReport)** for PR 32084 at commit [`be8d584`](https://github.com/apache/spark/commit/be8d584270c2b93e8f5539f98ed8b67560587673).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818707157
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137284/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818561308
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41863/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818407365
**[Test build #137272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137272/testReport)** for PR 32084 at commit [`a001522`](https://github.com/apache/spark/commit/a00152234a67bd0efafef8083cf750fe83f01bdf).
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610655569
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
if (!conf.coalesceShufflePartitionsEnabled) {
return plan
}
- if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
- || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
- // If not all leaf nodes are query stages, it's not safe to reduce the number of
- // shuffle partitions, because we may break the assumption that all children of a spark plan
- // have same number of output partitions.
- return plan
+
+ if (canCoalescePartitions(plan)) {
Review comment:
Can we make this rule more general? Ideally, we should split the query plan of the given query stage into several groups, where the shuffles within one group must have the same number of partitions.
The rule is currently oversimplified: it assumes the entire query stage is one group. That is definitely wrong with Union, and we should fix it. To be conservative, we can put everything in one group except for Union. We should make the code extensible so that we can add more "points" that split groups in the future.
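A minimal sketch of the grouping idea, using toy plan nodes rather than Spark's physical operators. `Plan`, `ShuffleStage`, `Union`, `Node`, and `collectGroups` are all hypothetical names for illustration, and this is not the implementation in the PR:

```scala
object CoalesceGroupsSketch {
  // Toy stand-ins for physical plan nodes; none of these are Spark classes.
  sealed trait Plan { def children: Seq[Plan] }
  case class ShuffleStage(id: Int) extends Plan { val children: Seq[Plan] = Nil }
  case class Union(children: Seq[Plan]) extends Plan
  case class Node(name: String, children: Seq[Plan]) extends Plan

  // All leaf nodes below (or equal to) the given plan.
  def leaves(plan: Plan): Seq[Plan] =
    if (plan.children.isEmpty) Seq(plan) else plan.children.flatMap(leaves)

  // Returns groups of shuffle stages; stages in one group must be coalesced
  // to the same number of partitions, while different groups are independent.
  def collectGroups(plan: Plan): Seq[Seq[ShuffleStage]] = plan match {
    case s: ShuffleStage => Seq(Seq(s))
    // Union is the only split point here: every child starts its own group.
    case Union(children) => children.flatMap(collectGroups)
    // A single-child node does not constrain partitioning, so look through it.
    case Node(_, Seq(child)) => collectGroups(child)
    // Conservative default: one group, and only if every leaf is a shuffle stage.
    case other =>
      val ls = leaves(other)
      if (ls.forall(_.isInstanceOf[ShuffleStage])) Seq(ls.collect { case s: ShuffleStage => s })
      else Seq.empty
  }

  def main(args: Array[String]): Unit = {
    val plan = Union(Seq(
      Node("aggregate", Seq(ShuffleStage(1))),
      Node("scan t2", Nil),
      Node("join", Seq(ShuffleStage(2), ShuffleStage(3)))))
    // The aggregate branch and the join branch end up in separate groups;
    // the plain scan contributes no group.
    println(collectGroups(plan).map(_.map(_.id)))
  }
}
```

Adding another split point in this sketch would mean adding one more `case` next to `Union`, which is the kind of extensibility the comment asks for.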
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819300616
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137317/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858518626
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44163/
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717714919
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ unionNumber: Int,
+ shuffleReaderNumber: Int,
+ partitionNumber: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == unionNumber)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === shuffleReaderNumber)
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+ }
+ // advisory partition size 1048576 has no special meaning, just a big enough value
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+ combineUnionConfig) {
+ withTempView("t1", "t2") {
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+ .toDF().createOrReplaceTempView("t1")
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+ .toDF().createOrReplaceTempView("t2")
+
+ // positive test that could be coalesced
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ """.stripMargin),
+ if (combineUnionEnabled) 1 else 1,
Review comment:
```suggestion
unionNumber = 1
```
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920877736
**[Test build #143349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143349/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921197715
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143352/
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927229959
retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920877823
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143349/
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920920346
retest this please
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r718092390
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,89 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ numUnion: Int,
+ numShuffleReader: Int,
+ numPartition: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == numUnion)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === numShuffleReader)
+ assert(df.rdd.partitions.length === numPartition)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+ }
+ // advisory partition size 1048576 has no special meaning, just a big enough value
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+ combineUnionConfig) {
+ withTempView("t1", "t2") {
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+ .toDF().createOrReplaceTempView("t1")
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+ .toDF().createOrReplaceTempView("t2")
+
+ // positive test that could be coalesced
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ """.stripMargin),
+ 1,
Review comment:
Can we pass the arguments by name to make the test more readable? For example, `numUnion = 1`.
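For illustration, a minimal sketch of what named arguments look like at a call site. `describe` is a hypothetical stand-in for `checkResultPartition` (it only formats its arguments, so no Spark session is needed), and the literal values are illustrative rather than taken from the final test.

```scala
object NamedArgsSketch {
  // Hypothetical stand-in for the test helper: same parameter names,
  // but it only renders the arguments as a string.
  def describe(numUnion: Int, numShuffleReader: Int, numPartition: Int): String =
    s"unions=$numUnion, readers=$numShuffleReader, partitions=$numPartition"

  def main(args: Array[String]): Unit = {
    // Positional: the reader has to look up what each literal means.
    val positional = describe(1, 1, 1 + 4)
    // Named: each literal is labelled where it is passed.
    val named = describe(numUnion = 1, numShuffleReader = 1, numPartition = 1 + 4)
    assert(positional == named)
    println(named)
  }
}
```

Both calls are equivalent; the named form just documents the meaning of each number in place.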
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929333047
**[Test build #143679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143679/testReport)** for PR 32084 at commit [`f362c9f`](https://github.com/apache/spark/commit/f362c9fb387dcad38adec2c047bb256009d26744).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929428597
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48193/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929415714
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48193/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872404616
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140519/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880321182
**[Test build #141043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141043/testReport)** for PR 32084 at commit [`9743c0c`](https://github.com/apache/spark/commit/9743c0cf672129f9b30177279f803f9c8c8752d3).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880323090
**[Test build #141043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141043/testReport)** for PR 32084 at commit [`9743c0c`](https://github.com/apache/spark/commit/9743c0cf672129f9b30177279f803f9c8c8752d3).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `public final class SparkOutOfMemoryError extends OutOfMemoryError implements SparkThrowable `
* ` throw new IllegalArgumentException(s\"Cannot find error class '$errorClass'\"))`
* `class SparkArithmeticException(errorClass: String, messageParameters: Array[String])`
* ` class RemoteBlockDownloadFileManager(`
* `case class UnresolvedFieldPosition(position: ColumnPosition) extends FieldPosition `
* `case class ExpressionEquals(e: Expression) `
* `case class ExpressionStats(expr: Expression)(var useCount: Int = 1) `
* `case class Average(`
* `case class Sum(`
* `case class SubExprEliminationState(eval: ExprCode, children: Seq[SubExprEliminationState])`
* `case class LocalTimestamp(timeZoneId: Option[String] = None) extends LeafExpression`
* `case class GetTimestamp(`
* `case class ParseToTimestampLTZ(`
* `case class ParseToTimestamp(`
* `case class MakeTimestampNTZ(`
* `case class MakeTimestampLTZ(`
* `case class DomainJoin(`
* ` .doc(\"The custom cost evaluator class to be used for adaptive execution. If not being set,\" +`
* ` static class IntegerUpdater implements ParquetVectorUpdater `
* `class MergingSortWithSessionWindowStateIterator(`
* `trait HDFSBackedStateStoreMap `
* `class NoPrefixHDFSBackedStateStoreMap extends HDFSBackedStateStoreMap `
* `class PrefixScannableHDFSBackedStateStoreMap(`
* ` class HDFSBackedReadStateStore(val version: Long, map: HDFSBackedStateStoreMap)`
* ` class HDFSBackedStateStore(val version: Long, mapToUpdate: HDFSBackedStateStoreMap)`
* `case class RocksDBMetrics(`
* `case class RocksDBNativeHistogram(`
* `case class RocksDBFileManagerMetrics(`
* `sealed trait RocksDBStateEncoder `
* `class PrefixKeyScanStateEncoder(`
* `class NoPrefixKeyStateEncoder(keySchema: StructType, valueSchema: StructType)`
* ` class RocksDBStateStore(lastVersion: Long) extends StateStore `
* `sealed trait StreamingSessionWindowStateManager extends Serializable `
* `class StreamingSessionWindowStateManagerImplV1(`
* `class StreamingSessionWindowHelper(sessionExpression: Attribute, inputSchema: Seq[Attribute]) `
* `trait WatermarkSupport extends SparkPlan `
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322328
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44144/
[GitHub] [spark] cloud-fan closed pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #32084:
URL: https://github.com/apache/spark/pull/32084
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815437826
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929820764
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921558465
There are currently two issues with Union in the `CoalesceShufflePartitions` rule, and I have updated the description to make them clearer.
cc @cloud-fan @maryannxue @JkSelf @viirya if you have time to take a look.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854601157
**[Test build #139336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139336/testReport)** for PR 32084 at commit [`7adda56`](https://github.com/apache/spark/commit/7adda56a5ce8336caaa9d72bfa28961669c22c66).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class CurrentUser() extends LeafExpression with Unevaluable `
* `case class ReplaceCurrentLike(catalogManager: CatalogManager) extends Rule[LogicalPlan] `
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854714719
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43858/
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612198616
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {
Review comment:
We can always add abstraction later when we need to reuse the code in other places. For now let's focus on the coalesce shuffle partitions rule first.
[GitHub] [spark] maropu commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610642147
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,51 @@ class AdaptiveQueryExecSuite
checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
}
}
+
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row], shuffleReaderNumber: Int, partitionNumber: Int): Unit = {
+ df.collect()
+ assert(
+ collect(df.queryExecution.executedPlan) {
+ case s: CustomShuffleReaderExec => s
+ }.size === shuffleReaderNumber
+ )
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+ val df1 = spark.sparkContext.parallelize(
+ (1 to 10).map(i => TestData(i, i.toString)), 2).toDF()
+ val df2 = spark.sparkContext.parallelize(
+ (1 to 10).map(i => TestData(i, i.toString)), 4).toDF()
+
+ // positive test
+ checkResultPartition(
+ df1.groupBy("key").count().unionAll(df2),
+ 1,
+ 1 + 4)
+
+ checkResultPartition(
+ df1.groupBy("key").count().unionAll(df2).unionAll(df1),
+ 1,
+ 1 + 4 + 2)
+
+ checkResultPartition(
+ df1.groupBy("key").count().unionAll(df2).unionAll(df1.groupBy("key").count()),
+ 2,
+ 1 + 4 + 1)
Review comment:
IIUC these physical plans have a single union exec because of `CombineUnions`? Could you add tests for physical plans having multiple union execs?
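To illustrate the point about `CombineUnions`, here is a toy sketch of how flattening adjacent unions changes the number of union nodes in a plan. The `Plan`, `Scan`, and `Union` types and the `flatten` function are hypothetical stand-ins, not Catalyst classes or the actual optimizer rule; they only model the idea that nested unions collapse into one.

```scala
object CombineUnionsSketch {
  // Toy plan nodes (assumptions for illustration, not Spark classes).
  sealed trait Plan
  case class Scan(name: String) extends Plan
  case class Union(children: Seq[Plan]) extends Plan

  // Merge any Union that is a direct child of another Union into its parent.
  def flatten(plan: Plan): Plan = plan match {
    case Union(children) =>
      Union(children.map(flatten).flatMap {
        case Union(grandChildren) => grandChildren
        case other => Seq(other)
      })
    case other => other
  }

  // Count the union nodes in a plan.
  def countUnions(plan: Plan): Int = plan match {
    case Union(children) => 1 + children.map(countUnions).sum
    case _ => 0
  }

  def main(args: Array[String]): Unit = {
    // Shape of df1.unionAll(df2).unionAll(df1) before any flattening.
    val nested = Union(Seq(Union(Seq(Scan("t1"), Scan("t2"))), Scan("t1")))
    println(s"before: ${countUnions(nested)}, after: ${countUnions(flatten(nested))}")
  }
}
```

A test that wants to see several union nodes in the physical plan therefore has to keep this kind of flattening from happening, for example by excluding the rule through `SQLConf.OPTIMIZER_EXCLUDED_RULES`.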
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
if (!conf.coalesceShufflePartitionsEnabled) {
return plan
}
- if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
- || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
- // If not all leaf nodes are query stages, it's not safe to reduce the number of
- // shuffle partitions, because we may break the assumption that all children of a spark plan
- // have same number of output partitions.
- return plan
+
+ if (canCoalescePartitions(plan)) {
+ coalescePartitions(plan)
+ } else {
+ plan.transformUp {
+ case u: UnionExec =>
+ u.withNewChildren(u.children.map { child =>
+ if (canCoalescePartitions(child) &&
+ child.find(_.isInstanceOf[UnionExec]).isEmpty) {
Review comment:
Do we still need the check `child.find(_.isInstanceOf[UnionExec]).isEmpty`? It seems `canCoalescePartitions(child)` always returns false if `child.find(_.isInstanceOf[UnionExec]).isEmpty` is false.
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
if (!conf.coalesceShufflePartitionsEnabled) {
return plan
}
- if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
- || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
- // If not all leaf nodes are query stages, it's not safe to reduce the number of
- // shuffle partitions, because we may break the assumption that all children of a spark plan
- // have same number of output partitions.
- return plan
+
+ if (canCoalescePartitions(plan)) {
+ coalescePartitions(plan)
+ } else {
+ plan.transformUp {
+ case u: UnionExec =>
Review comment:
Could you leave some comments about what this pattern is for?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854714750
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43858/
[GitHub] [spark] JkSelf commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
JkSelf commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612982244
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {
Review comment:
We can simplify the logic of this rule so that it only ensures the children of the Union can be optimized. We don't need to check whether a child of the Union itself contains a Union, because calling rule(child) handles the children of a nested Union recursively.
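The recursion described in this comment can be sketched with a toy plan tree. Everything below is an assumption for illustration: `Plan`, `Leaf`, `Union`, and `rule` are hypothetical names, and setting `coalesced = true` merely stands in for whatever optimization the real rule applies to a union-free group.

```scala
object UnionRecursionSketch {
  // Toy plan nodes (not Spark's SparkPlan or UnionExec).
  sealed trait Plan
  case class Leaf(name: String, coalesced: Boolean = false) extends Plan
  case class Union(children: Seq[Plan]) extends Plan

  // A union only delegates to its children. A nested union is reached
  // by the recursive call, so no explicit "contains a union" check is needed.
  def rule(plan: Plan): Plan = plan match {
    case Union(children) => Union(children.map(rule))
    case leaf: Leaf => leaf.copy(coalesced = true)
  }

  def main(args: Array[String]): Unit = {
    val nested = Union(Seq(Leaf("a"), Union(Seq(Leaf("b"), Leaf("c")))))
    println(rule(nested))
  }
}
```

Each leaf group ends up optimized independently, whether it sits directly under the outer union or under the nested one.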
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816466169
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854761608
**[Test build #139335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139335/testReport)** for PR 32084 at commit [`525d4d4`](https://github.com/apache/spark/commit/525d4d4c7a120967409baee65d97bf5e27a2fdbe).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854585920
**[Test build #139336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139336/testReport)** for PR 32084 at commit [`7adda56`](https://github.com/apache/spark/commit/7adda56a5ce8336caaa9d72bfa28961669c22c66).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858316750
**[Test build #139617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818477634
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137273/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858509527
**[Test build #139636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139636/testReport)** for PR 32084 at commit [`377505f`](https://github.com/apache/spark/commit/377505fb8dd137f5a3c41a7314e54857f472e1bb).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920921386
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47855/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818707157
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137284/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818483023
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819178667
**[Test build #137317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137317/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).
[GitHub] [spark] cloud-fan commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929959057
thanks, merging to master!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921544045
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143381/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918867620
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47750/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918866346
**[Test build #143247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143247/testReport)** for PR 32084 at commit [`eeb4047`](https://github.com/apache/spark/commit/eeb4047a59773a6dba64a30b2e184badf69548b1).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880339094
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45558/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854714750
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43858/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858684427
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139636/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855883427
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139407/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819200088
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41897/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929333047
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854620111
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717714919
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+          SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+          SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+          SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+          SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+          combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 1,
Review comment:
```suggestion
1
```
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929793299
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48210/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818477634
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137273/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819297063
**[Test build #137317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137317/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815553450
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137048/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855881776
**[Test build #139407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139407/testReport)** for PR 32084 at commit [`797ea59`](https://github.com/apache/spark/commit/797ea5944cd405612cf3972f72ebf963cea694fc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920970634
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47857/
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815581312
cc @maropu @cloud-fan @JkSelf do you have any thoughts on this?
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920913765
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47855/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920983006
**[Test build #143352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143352/testReport)** for PR 32084 at commit [`62fb383`](https://github.com/apache/spark/commit/62fb3835204abb40a278016421669298fa687365).
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920983006
**[Test build #143352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143352/testReport)** for PR 32084 at commit [`62fb383`](https://github.com/apache/spark/commit/62fb3835204abb40a278016421669298fa687365).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920877823
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143349/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322341
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44144/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819300616
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137317/
[GitHub] [spark] maropu commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r609693645
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -93,6 +106,15 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     }
   }
+  private def shouldApplyChildren(plan: SparkPlan): Boolean = {
+    plan.find(p => shouldApplyChildrenFunc(p)).isDefined
+  }
+
+  private def shouldApplyChildrenFunc(plan: SparkPlan): Boolean = plan match {
+    case _: UnionExec => true
Review comment:
Are there any other plan nodes to which we can apply the same optimization? If not, could you inline it?
```
  private def shouldApplyChildren(plan: SparkPlan): Boolean = {
    plan.find(_.isInstanceOf[UnionExec]).isDefined
  }
```
Then, `shouldApplyChildren` -> `hasUnion`?
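For readers following the thread, the inlined check can be illustrated with a small self-contained sketch. It does not use Spark: `Node`, `Leaf`, `Unary`, and `UnionNode` are hypothetical stand-ins for `SparkPlan` and `UnionExec`, and the `find` method here only imitates a pre-order tree search.

```scala
object HasUnionSketch {
  // Hypothetical toy plan tree; these are NOT Spark classes.
  sealed trait Node {
    def children: Seq[Node]

    // Pre-order search: returns the first node that satisfies the predicate.
    def find(p: Node => Boolean): Option[Node] =
      if (p(this)) Some(this)
      else children.foldLeft(Option.empty[Node])((acc, c) => acc.orElse(c.find(p)))
  }

  case class Leaf(name: String) extends Node { val children: Seq[Node] = Nil }
  case class Unary(child: Node) extends Node { val children: Seq[Node] = Seq(child) }
  case class UnionNode(children: Seq[Node]) extends Node

  // The inlined form of the check: true if any node in the tree is a union.
  def hasUnion(plan: Node): Boolean =
    plan.find(_.isInstanceOf[UnionNode]).isDefined
}
```

With this shape, a single predicate replaces the separate `shouldApplyChildrenFunc` helper, which is only worth keeping if more node types than the union need the same treatment.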
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,6 +35,19 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
+
+    if (shouldApplyChildren(plan)) {
+      plan.transformUp {
+        case p if shouldApplyChildrenFunc(p) &&
+            !p.children.exists(child => shouldApplyChildren(child)) =>
+          p.withNewChildren(p.children.map(child => applyInternal(child)))
+      }
+    } else {
+      applyInternal(plan)
+    }
+  }
+
+  private def applyInternal(plan: SparkPlan): SparkPlan = {
Review comment:
nit: `applyInternal` -> `coalescePartitions`?
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,6 +35,19 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
+
+    if (shouldApplyChildren(plan)) {
+      plan.transformUp {
+        case p if shouldApplyChildrenFunc(p) &&
+            !p.children.exists(child => shouldApplyChildren(child)) =>
+          p.withNewChildren(p.children.map(child => applyInternal(child)))
+      }
+    } else {
+      applyInternal(plan)
+    }
Review comment:
This section looks hard to read. Could we write it like this?
```
private def canCoalescePartitions(plan: SparkPlan): Boolean = {
  plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec]) &&
    plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isEmpty
}
...
if (canCoalescePartitions(plan)) {
  // simple case
  return applyInternal(plan)
}
// Handle more cases to coalesce partitions?
...
```
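To make the idea under discussion concrete (each child of a union becomes its own group, and each group is coalesced independently), here is a minimal sketch on a toy plan tree. It is not the PR's code and does not use Spark: every type and function below is hypothetical. `coalesceGroup` is only a placeholder that picks one partition count per group from the total shuffle size and an assumed advisory size, and the recursion treats every union child as a group, which is a simplification of the `transformUp` logic in the diff.

```scala
object UnionGroupSketch {
  // Hypothetical toy plan nodes; these are NOT Spark classes.
  sealed trait Plan { def children: Seq[Plan] }
  // A finished shuffle stage with its total output size in bytes.
  case class ShuffleStage(name: String, bytes: Long) extends Plan {
    val children: Seq[Plan] = Nil
  }
  // A shuffle stage wrapped in a reader that uses the coalesced partition count.
  case class CoalescedRead(stage: ShuffleStage, numPartitions: Int) extends Plan {
    val children: Seq[Plan] = Nil
  }
  case class Project(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
  case class Union(children: Seq[Plan]) extends Plan

  def hasUnion(p: Plan): Boolean =
    p.isInstanceOf[Union] || p.children.exists(hasUnion)

  // All shuffle stages that belong to one group.
  private def stages(p: Plan): Seq[ShuffleStage] = p match {
    case s: ShuffleStage => Seq(s)
    case other => other.children.flatMap(stages)
  }

  // Placeholder for the real coalescing algorithm: one partition count per group,
  // computed as ceil(total bytes / advisory bytes) with a minimum of 1.
  def coalesceGroup(group: Plan, advisoryBytes: Long): Plan = {
    val total = stages(group).map(_.bytes).sum
    val n = math.max(1L, (total + advisoryBytes - 1) / advisoryBytes).toInt
    def rewrite(p: Plan): Plan = p match {
      case s: ShuffleStage => CoalescedRead(s, n)
      case Project(c) => Project(rewrite(c))
      case Union(cs) => Union(cs.map(rewrite))
      case r: CoalescedRead => r
    }
    rewrite(group)
  }

  // Without a union the whole plan is one group (the simple case);
  // otherwise each union child is handled as a separate group.
  def coalesce(plan: Plan, advisoryBytes: Long): Plan = plan match {
    case _ if !hasUnion(plan) => coalesceGroup(plan, advisoryBytes)
    case Union(cs) => Union(cs.map(coalesce(_, advisoryBytes)))
    case Project(c) => Project(coalesce(c, advisoryBytes))
    case other => other
  }
}
```

The point of grouping is that the two sides of a union do not need the same number of partitions, so a small side can collapse to a single partition even when the other side stays larger.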
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,26 @@ class AdaptiveQueryExecSuite
       checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
     }
   }
+
+  test("SPARK-34980: Support coalesce partition through union") {
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+      SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+      SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+      SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+      val df1 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 2).toDF()
+      val df2 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 4).toDF()
+
+      val df = df1.groupBy("key").count().unionAll(df2)
Review comment:
Could you add more tests for more cases, e.g., multiple unions?
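As background for what the test configuration exercises, the sketch below shows the basic arithmetic of coalescing against an advisory size: adjacent shuffle partitions are packed together until the next one would push the group over the target. `CoalesceSketch` and `startIndices` are hypothetical names for this illustration; Spark's real logic also honours settings such as the minimum partition number, which this simplified version ignores.

```scala
import scala.collection.mutable.ArrayBuffer

object CoalesceSketch {
  // Returns the start index of each coalesced partition.
  // Adjacent partitions are merged greedily while their combined size
  // stays within targetSize (simplified; no minimum partition count).
  def startIndices(sizes: Seq[Long], targetSize: Long): Seq[Int] = {
    if (sizes.isEmpty) return Seq.empty
    val starts = ArrayBuffer(0)
    var current = 0L
    for ((size, i) <- sizes.zipWithIndex) {
      if (i > 0 && current + size > targetSize) {
        // Adding this partition would exceed the target: start a new group here.
        starts += i
        current = size
      } else {
        current += size
      }
    }
    starts.toSeq
  }
}
```

With ten small partitions of 100 bytes each and a 1048576-byte advisory size, everything fits into a single coalesced partition, so `startIndices(Seq.fill(10)(100L), 1048576L)` returns `Seq(0)`.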
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815544107
**[Test build #137048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137048/testReport)** for PR 32084 at commit [`e4bae2b`](https://github.com/apache/spark/commit/e4bae2bf77fa088d41edd02e9155a108fab36c34).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858682569
**[Test build #139636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139636/testReport)** for PR 32084 at commit [`377505f`](https://github.com/apache/spark/commit/377505fb8dd137f5a3c41a7314e54857f472e1bb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855802452
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43929/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818561308
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41863/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921022688
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47859/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816611339
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137115/
[GitHub] [spark] HyukjinKwon commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815714184
cc @maryannxue too
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858518613
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44163/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929769619
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929428597
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929813144
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48210/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929820764
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929820764
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819178667
**[Test build #137317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137317/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816469642
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41694/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816611339
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137115/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818515251
**[Test build #137284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137284/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880323116
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141043/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880323116
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141043/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818483062
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41851/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921420990
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47888/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921421018
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47888/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858684427
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139636/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858509527
**[Test build #139636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139636/testReport)** for PR 32084 at commit [`377505f`](https://github.com/apache/spark/commit/377505fb8dd137f5a3c41a7314e54857f472e1bb).
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-930027529
thank you @cloud-fan
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929769619
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921022644
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47859/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921094826
**[Test build #143351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143351/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920960736
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47857/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918866398
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143247/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815451835
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815451835
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41626/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872403234
**[Test build #140519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140519/testReport)** for PR 32084 at commit [`fb562fd`](https://github.com/apache/spark/commit/fb562fd5570c05b2830a3884d67210abb40b6bcf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `public class MergedBlockMetaRequest extends AbstractMessage implements RequestMessage `
* `public class MergedBlockMetaSuccess extends AbstractResponseMessage `
* `public abstract class AbstractFetchShuffleBlocks extends BlockTransferMessage `
* `public class FetchShuffleBlockChunks extends AbstractFetchShuffleBlocks `
* `public class FetchShuffleBlocks extends AbstractFetchShuffleBlocks `
* ` throw new IllegalArgumentException(s"Cannot find error class '$errorClass'"))`
* `trait SparkError extends Throwable `
* `class SparkException(`
* `class SparkArithmeticException(`
* `final case class FileNameSpec(prefix: String, suffix: String)`
* `case class ShuffleBlockInfo(shuffleId: Int, mapId: Long) `
* `case class ShuffleBlockChunkId(`
* `case class ShuffleMergedDataBlockId(appId: String, shuffleId: Int, reduceId: Int) extends BlockId `
* `case class ShuffleMergedIndexBlockId(`
* `case class ShuffleMergedMetaBlockId(`
* ` case class FetchRequest(`
* ` class AvroSchemaHelper(`
* `class DecimalOps(FractionalOps):`
* `class IntegralExtensionOps(IntegralOps):`
* `class FractionalExtensionOps(FractionalOps):`
* `class StringExtensionOps(StringOps):`
* ` new_class = type(\"NameType\", (NameTypeHolder,), `
* `class GroupBy(Generic[FrameLike], metaclass=ABCMeta):`
* `class DataFrameGroupBy(GroupBy[DataFrame]):`
* `class SeriesGroupBy(GroupBy[Series]):`
* ` new_class = type(\"NameType\", (NameTypeHolder,), `
* `class SparkIndexOpsMethods(Generic[IndexOpsLike], metaclass=ABCMeta):`
* `class SparkSeriesMethods(SparkIndexOpsMethods[\"ps.Series\"]):`
* `class SparkIndexMethods(SparkIndexOpsMethods[\"ps.Index\"]):`
* `class RollingAndExpanding(Generic[FrameLike], metaclass=ABCMeta):`
* `class RollingLike(RollingAndExpanding[FrameLike]):`
* `class Rolling(RollingLike[FrameLike]):`
* `class RollingGroupby(RollingLike[FrameLike]):`
* `class ExpandingLike(RollingAndExpanding[FrameLike]):`
* `class Expanding(ExpandingLike[FrameLike]):`
* `class ExpandingGroupby(ExpandingLike[FrameLike]):`
* `class KubernetesLocalDiskShuffleDataIO(sparkConf: SparkConf) extends ShuffleDataIO `
* `class KubernetesLocalDiskShuffleExecutorComponents(sparkConf: SparkConf)`
* `case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends UnaryExpression`
* `sealed trait FieldName extends LeafExpression with Unevaluable `
* `case class UnresolvedFieldName(name: Seq[String]) extends FieldName `
* `sealed trait FieldPosition extends LeafExpression with Unevaluable `
* `case class UnresolvedFieldPosition(`
* `case class ResolvedFieldName(path: Seq[String], field: StructField) extends FieldName `
* `case class ResolvedFieldPosition(position: ColumnPosition) extends FieldPosition`
* `case class Cast(`
* `class ExpressionContainmentOrdering extends Ordering[Expression] `
* `case class SubExprEliminationState(`
* `case class ArraysZip(children: Seq[Expression], names: Seq[Expression])`
* `case class GetTimestampNTZ(`
* `case class ParseToTimestampNTZ(`
* `case class MakeDTInterval(`
* `case class MakeYMInterval(years: Expression, months: Expression)`
* `case class RebalancePartitions(`
* `trait AlterTableCommand extends UnaryCommand `
* `case class AlterTableDropColumns(`
* `case class AlterTableRenameColumn(`
* `case class AlterTableAlterColumn(`
* ` new AnalysisException(s\"UDF class $className doesn't implement any UDF interface\")`
* ` new AnalysisException(s\"UDF class with $n type arguments is not supported.\")`
* ` new AnalysisException(s\"Can not instantiate class $className, please make sure\" +`
* ` new AnalysisException(s\"Can not load class $className, please make sure it is on the classpath\")`
* ` new SparkException(s\"Cannot find catalog plugin class for catalog '$name': $pluginClassName\")`
* ` new SparkException(\"Cannot instantiate abstract catalog plugin class for \" +`
* ` new SparkException(s\"Can not load in UserDefinedType $`
* `case class DayTimeIntervalType(startField: Byte, endField: Byte) extends AtomicType `
* `case class YearMonthIntervalType(startField: Byte, endField: Byte) extends AtomicType `
* `final class ParquetReadState `
* `public class ParquetVectorUpdaterFactory `
* `case class CommandResult(`
* `case class MergingSessionsExec(`
* `class MergingSessionsIterator(`
* `case class ShowCreateTableExec(`
* `class RocksDB(`
* `class ByteArrayPair(var key: Array[Byte] = null, var value: Array[Byte] = null) `
* `case class RocksDBConf(`
* `case class AcquiredThreadInfo() `
* `case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric `
* `case class StateStoreCustomSizeMetric(name: String, desc: String) extends StateStoreCustomMetric `
* `case class StateStoreCustomTimingMetric(name: String, desc: String) extends StateStoreCustomMetric `
* `trait StatefulOperatorCustomMetric `
* `case class StatefulOperatorCustomSumMetric(name: String, desc: String)`
* `trait TestGroupState[S] extends GroupState[S] `
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929592217
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143679/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927234818
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48144/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818491403
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880350367
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45558/
[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855695488
I have refactored the PR. Do you have time to take a look? cc @maropu @cloud-fan
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880364957
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45558/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818561267
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872348799
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45032/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818407365
**[Test build #137272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137272/testReport)** for PR 32084 at commit [`a001522`](https://github.com/apache/spark/commit/a00152234a67bd0efafef8083cf750fe83f01bdf).
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854585920
**[Test build #139336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139336/testReport)** for PR 32084 at commit [`7adda56`](https://github.com/apache/spark/commit/7adda56a5ce8336caaa9d72bfa28961669c22c66).
[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612223808
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {
Review comment:
Back to the code: I'm not sure this approach is general enough for broader requirements. The idea here, at least, is:
1. Optimize the Union's children, treating each child as an atomic plan that can be optimized on its own.
2. Optimize the whole plan if step 1 does not apply.
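The two steps above can be sketched on a toy plan tree. This is only an illustration under stated assumptions: `Plan`, `Shuffle`, `Scan`, `Node`, `Union`, and `UnionAwareSketch` are hypothetical stand-ins, not Spark's `SparkPlan` API or the code in this PR, and the "all shuffles in a group have the same partition count" check is a simplified compatibility rule chosen for the sketch.

```scala
// Hypothetical toy plan tree; none of these types exist in Spark.
sealed trait Plan
final case class Shuffle(numPartitions: Int) extends Plan // shuffle stage that may be coalesced
final case class Scan(numPartitions: Int) extends Plan    // leaf without a shuffle
final case class Node(children: Seq[Plan]) extends Plan   // any other operator, e.g. a join
final case class Union(children: Seq[Plan]) extends Plan

object UnionAwareSketch {
  // Shuffles reachable without crossing a Union belong to one group.
  private def shufflesInGroup(plan: Plan): Seq[Shuffle] = plan match {
    case s: Shuffle         => Seq(s)
    case Node(cs)           => cs.flatMap(shufflesInGroup)
    case _: Union | _: Scan => Nil
  }

  // Shrink every shuffle in the group to at most `target` partitions.
  // A Union nested below another operator is left untouched in this sketch.
  private def rewrite(plan: Plan, target: Int): Plan = plan match {
    case Shuffle(n) => Shuffle(math.min(n, target))
    case Node(cs)   => Node(cs.map(rewrite(_, target)))
    case other      => other
  }

  def coalesce(plan: Plan, target: Int): Plan = plan match {
    // Step 1: each child of a Union is optimized as its own group.
    case Union(children) => Union(children.map(coalesce(_, target)))
    // Step 2: otherwise the whole (sub)plan is a single group.
    case group =>
      val counts = shufflesInGroup(group).map(_.numPartitions).distinct
      if (counts.size == 1) rewrite(group, target) else group
  }
}
```

For example, `UnionAwareSketch.coalesce(Union(Seq(Node(Seq(Shuffle(10))), Scan(4))), 1)` shrinks only the first child's shuffle and leaves the scan alone, which is the shape the `1 + 4` partition expectation in the PR's test relies on.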
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815414403
**[Test build #137048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137048/testReport)** for PR 32084 at commit [`e4bae2b`](https://github.com/apache/spark/commit/e4bae2bf77fa088d41edd02e9155a108fab36c34).
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855714028
**[Test build #139407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139407/testReport)** for PR 32084 at commit [`797ea59`](https://github.com/apache/spark/commit/797ea5944cd405612cf3972f72ebf963cea694fc).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927241163
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48144/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929428597
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48193/
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r718092179
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,89 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ numUnion: Int,
+ numShuffleReader: Int,
+ numPartition: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == numUnion)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === numShuffleReader)
+ assert(df.rdd.partitions.length === numPartition)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
Review comment:
This will set a config whose key is an empty string. I think it's safer to use `SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> ""`.
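A minimal sketch of that suggestion, written without a Spark dependency so it runs on its own. `ExcludedRulesConf` and `pairFor` are hypothetical names, and the string `spark.sql.optimizer.excludedRules` is assumed to be the key behind `SQLConf.OPTIMIZER_EXCLUDED_RULES`; real test code would reference the constant instead.

```scala
object ExcludedRulesConf {
  // Assumed to equal SQLConf.OPTIMIZER_EXCLUDED_RULES.key.
  val ExcludedRulesKey = "spark.sql.optimizer.excludedRules"
  val CombineUnions = "org.apache.spark.sql.catalyst.optimizer.CombineUnions"

  // Both branches use a real key; the "enabled" branch simply excludes no rule.
  def pairFor(combineUnionEnabled: Boolean): (String, String) =
    if (combineUnionEnabled) ExcludedRulesKey -> ""
    else ExcludedRulesKey -> CombineUnions
}
```

Either pair can be passed to `withSQLConf` alongside the other settings, and no entry with an empty key is ever written.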
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,89 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ numUnion: Int,
+ numShuffleReader: Int,
+ numPartition: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == numUnion)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === numShuffleReader)
+ assert(df.rdd.partitions.length === numPartition)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+ }
+ // advisory partition size 1048576 has no special meaning, just a big enough value
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+ combineUnionConfig) {
+ withTempView("t1", "t2") {
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+ .toDF().createOrReplaceTempView("t1")
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+ .toDF().createOrReplaceTempView("t2")
+
+ // positive test that could be coalesced
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ """.stripMargin),
+ 1,
Review comment:
Can we name the arguments to make the test more readable? For example, `numUnion = 1`.
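To illustrate the readability point, here is a small standalone sketch. `NamedArgsSketch.describe` is a hypothetical function that only mirrors the parameter names of `checkResultPartition`; the values `1`, `1`, and `1 + 4` are assumed from the earlier revision of the test quoted elsewhere in this thread.

```scala
object NamedArgsSketch {
  // Hypothetical stand-in with the same parameter names as the test helper.
  def describe(numUnion: Int, numShuffleReader: Int, numPartition: Int): String =
    s"unions=$numUnion, shuffleReaders=$numShuffleReader, partitions=$numPartition"

  // Positional call: the reader has to look up what each number means.
  val positional: String = describe(1, 1, 1 + 4)

  // Named call: same result, but each expectation is self-describing.
  val named: String = describe(numUnion = 1, numShuffleReader = 1, numPartition = 1 + 4)
}
```

Both calls produce the same value; the named form just documents the intent at the call site.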
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920970634
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47857/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921190594
**[Test build #143352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143352/testReport)** for PR 32084 at commit [`62fb383`](https://github.com/apache/spark/commit/62fb3835204abb40a278016421669298fa687365).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929428597
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48193/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921197715
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143352/
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717744836
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ unionNumber: Int,
+ shuffleReaderNumber: Int,
+ partitionNumber: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == unionNumber)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === shuffleReaderNumber)
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+ }
+ // advisory partition size 1048576 has no special meaning, just a big enough value
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+ SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+ combineUnionConfig) {
+ withTempView("t1", "t2") {
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+ .toDF().createOrReplaceTempView("t1")
+ spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+ .toDF().createOrReplaceTempView("t2")
+
+ // positive test that could be coalesced
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ """.stripMargin),
+ if (combineUnionEnabled) 1 else 1,
+ 1,
+ 1 + 4)
+
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ |UNION ALL
+ |SELECT * FROM t1
+ """.stripMargin),
+ if (combineUnionEnabled) 1 else 2,
+ 1,
+ 1 + 4 + 2)
+
+ checkResultPartition(
+ sql("""
+ |SELECT key, count(*) FROM t1 GROUP BY key
+ |UNION ALL
+ |SELECT * FROM t2
+ |UNION ALL
+ |SELECT * FROM t1
+ |UNION ALL
+ |SELECT key, count(*) FROM t2 GROUP BY key
Review comment:
Testing three unions isn't very useful, as it's similar to the two cases above.
Let's test SMJ UNION AGG (a sort-merge join unioned with an aggregate) instead.
[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717716881
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
}
}
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row],
+ unionNumber: Int,
+ shuffleReaderNumber: Int,
+ partitionNumber: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == unionNumber)
+ assert(collect(df.queryExecution.executedPlan) {
+ case r: AQEShuffleReadExec => r
+ }.size === shuffleReaderNumber)
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
Review comment:
Does this really matter for the "coalesce through union" feature? I think we can just test the default case, in which this rule is enabled.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920873803
**[Test build #143349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143349/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920921386
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47855/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921107051
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143351/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872404616
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140519/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818475159
**[Test build #137273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137273/testReport)** for PR 32084 at commit [`a966759`](https://github.com/apache/spark/commit/a966759afa0cf7e71440a344ae833fab7835a6a0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872214551
**[Test build #140519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140519/testReport)** for PR 32084 at commit [`fb562fd`](https://github.com/apache/spark/commit/fb562fd5570c05b2830a3884d67210abb40b6bcf).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872292667
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45032/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929371451
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48193/
[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612105250
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {
Review comment:
> Can we make this rule more general? Ideally we need to split the query plan of the given query stage into several groups. The shuffles within one group must have the same number of partitions.
>
> This rule is overly simplified right now and assumes the entire query stage is one group. It's definitely wrong with Union and we should fix it. To be conservative we can try to put everything in one group except for Union. We should make the code extensible so that we can add more "breaking points" that can split groups in the future
@cloud-fan Agreed. I have tried adding a new trait to make this clear. I believe other query stage optimizer rules, e.g. `OptimizeSkewedJoin`, also need this. Do you have any thoughts?
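
The grouping idea quoted above can be illustrated with a minimal sketch. It uses toy plan types (`Plan`, `Shuffle`, `Join`, `Union`) and a hypothetical `CoalesceGroups.collectGroups` helper, none of which are Spark's actual classes or the code in this PR. It assumes only what the discussion states: each child of a union starts its own group, and everything else stays in one group.

```scala
// Toy stand-ins for physical plan nodes. These are NOT Spark's classes.
sealed trait Plan { def children: Seq[Plan] }

// A leaf that represents a shuffle with a fixed number of partitions.
case class Shuffle(id: String, numPartitions: Int) extends Plan {
  val children: Seq[Plan] = Nil
}

// A binary operator whose inputs must agree on the partition count.
case class Join(left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}

// A union places no such requirement on its children.
case class Union(children: Seq[Plan]) extends Plan

object CoalesceGroups {
  // Gather every shuffle below `plan` into one flat list.
  def collectShuffles(plan: Plan): Seq[Shuffle] = plan match {
    case s: Shuffle => Seq(s)
    case p => p.children.flatMap(collectShuffles)
  }

  // Split the plan into groups. A Union at the top is a "breaking point":
  // each child is grouped independently. Any other node is treated
  // conservatively as a single group containing all shuffles below it.
  def collectGroups(plan: Plan): Seq[Seq[Shuffle]] = plan match {
    case Union(children) => children.flatMap(collectGroups)
    case other =>
      val shuffles = collectShuffles(other)
      if (shuffles.isEmpty) Nil else Seq(shuffles)
  }
}
```

In this sketch, a union over a join of shuffles `a` and `b` plus a lone shuffle `c` yields two groups, so `c` could be coalesced to a different partition count than `a` and `b`. More breaking points could be added as extra `case` branches in `collectGroups`.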
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880364957
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45558/
[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612105702
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,86 @@ class AdaptiveQueryExecSuite
checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
}
}
+
+ test("SPARK-34980: Support coalesce partition through union") {
+ def checkResultPartition(
+ df: Dataset[Row], unionNumber: Int, shuffleReaderNumber: Int, partitionNumber: Int): Unit = {
+ df.collect()
+ assert(collect(df.queryExecution.executedPlan) {
+ case u: UnionExec => u
+ }.size == unionNumber)
+ assert(
+ collect(df.queryExecution.executedPlan) {
+ case s: CustomShuffleReaderExec => s
+ }.size === shuffleReaderNumber)
+ assert(df.rdd.partitions.length === partitionNumber)
+ }
+
+ Seq(true, false).foreach { combineUnionEnabled =>
+ val combineUnionConfig = if (combineUnionEnabled) {
+ "" -> ""
+ } else {
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+ }
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
Review comment:
@maropu Added a test that runs without the `CombineUnions` rule. With the rule excluded, the plan can contain nested `Union` nodes.
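
To show why excluding `CombineUnions` changes the plan shape, here is a small sketch of the flattening that a union-combining rule performs. The types `Node`, `Relation` and `UnionNode` and the helper `FlattenUnions.flatten` are hypothetical and are not Catalyst's implementation. When the rule is excluded, a plan keeps the nested form that is the input here.

```scala
// Toy logical plan nodes. These are NOT Catalyst's classes.
sealed trait Node
case class Relation(name: String) extends Node
case class UnionNode(children: Seq[Node]) extends Node

object FlattenUnions {
  // Collapse unions nested directly under a union into a single union.
  // Children are flattened first, so inlining one level is sufficient.
  def flatten(node: Node): Node = node match {
    case UnionNode(children) =>
      UnionNode(children.map(flatten).flatMap {
        case UnionNode(grandChildren) => grandChildren
        case other => Seq(other)
      })
    case other => other
  }
}
```

A test that disables the rule therefore exercises the nested shape, while the default configuration exercises the flat one.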
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815414403
**[Test build #137048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137048/testReport)** for PR 32084 at commit [`e4bae2b`](https://github.com/apache/spark/commit/e4bae2bf77fa088d41edd02e9155a108fab36c34).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854618208
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43856/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854638306
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43858/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854762987
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139335/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818512659
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854576858
**[Test build #139335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139335/testReport)** for PR 32084 at commit [`525d4d4`](https://github.com/apache/spark/commit/525d4d4c7a120967409baee65d97bf5e27a2fdbe).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819204254
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41897/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858316750
**[Test build #139617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918861253
**[Test build #143247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143247/testReport)** for PR 32084 at commit [`eeb4047`](https://github.com/apache/spark/commit/eeb4047a59773a6dba64a30b2e184badf69548b1).
[GitHub] [spark] cloud-fan commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929959057
Thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920926709
**[Test build #143351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143351/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920967375
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47857/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918866398
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143247/
[GitHub] [spark] ulysses-you closed pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
ulysses-you closed pull request #32084:
URL: https://github.com/apache/spark/pull/32084
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918867646
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47750/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927240943
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48144/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921107051
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143351/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927272014
**[Test build #143632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143632/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927241163
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48144/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927276805
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143632/
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921398098
**[Test build #143381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143381/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929891061
**[Test build #143695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143695/testReport)** for PR 32084 at commit [`e2b25b4`](https://github.com/apache/spark/commit/e2b25b4f35b507665029162efc4e2808fecd14e3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929906030
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855883427
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139407/
[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818515251
**[Test build #137284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137284/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).