Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/28 15:11:32 UTC

[GitHub] [spark] ulysses-you opened a new pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

ulysses-you opened a new pull request #32084:
URL: https://github.com/apache/spark/pull/32084


   
   ### What changes were proposed in this pull request?
   - Split the plan into several groups, where every child of a `Union` starts a new group.
   - Coalesce partitions for each group independently.
   
   ### Why are the changes needed?
   
   #### First Issue
   The rule `CoalesceShufflePartitions` can only coalesce partitions if:
   * every leaf node is a `ShuffleQueryStage`, and
   * all shuffles have the same number of partitions.
   
   `Union` can break these assumptions. Consider the following plan:
   ```
   Union
      HashAggregate
         ShuffleQueryStage
      FileScan
   ```
   `CoalesceShufflePartitions` cannot optimize it, so the number of result partitions is `shuffle partitions + FileScan partitions`, which can be quite large.
   
   It would be better to support partial optimization through `Union`.
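A minimal sketch of the two preconditions listed above, assuming a simplified, hypothetical leaf model (`ShuffleStageLeaf`, `FileScanLeaf`) instead of Spark's real `QueryStageExec` classes. It shows why the whole plan is rejected while the `HashAggregate` branch on its own would qualify. The partition counts (10 and 4) are made-up values.

```scala
// Hypothetical leaf model for illustration only; these are not Spark classes.
sealed trait Leaf
case class ShuffleStageLeaf(numPartitions: Int) extends Leaf
case class FileScanLeaf(numPartitions: Int) extends Leaf

object CoalesceCheck {
  // True only if every leaf is a shuffle stage and all shuffles agree on the
  // number of partitions, mirroring the two conditions listed above.
  def canCoalesce(leaves: Seq[Leaf]): Boolean = {
    val shufflePartitions = leaves.collect { case ShuffleStageLeaf(n) => n }
    shufflePartitions.nonEmpty &&
      shufflePartitions.size == leaves.size &&
      shufflePartitions.distinct.size == 1
  }
}

object CoalesceCheckExample {
  // Leaves of the Union plan above: one shuffle stage and one file scan.
  val unionLeaves: Seq[Leaf] = Seq(ShuffleStageLeaf(10), FileScanLeaf(4))

  // Checking the whole plan fails because of the FileScan leaf.
  val wholePlanOk: Boolean = CoalesceCheck.canCoalesce(unionLeaves) // false

  // Checking only the HashAggregate branch succeeds.
  val aggBranchOk: Boolean = CoalesceCheck.canCoalesce(Seq(ShuffleStageLeaf(10))) // true
}
```

Checking each `Union` child separately lets the shuffle branch be coalesced while the `FileScan` branch is simply left as-is.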
   
   #### Second Issue
   The partition-coalescing formula uses the **sum** over all shuffles as the total input size, which does not suit `Union`. See:
   ```
   // ShufflePartitionsUtil.coalescePartitions
   val totalPostShuffleInputSize = mapOutputStatistics.flatMap(_.map(_.bytesByPartitionId.sum)).sum
   ```
   
   So for such case:
   ```
   Union
      HashAggregate
         ShuffleQueryStage
      HashAggregate
         ShuffleQueryStage
   ```
   Because the sizes of two independent shuffles are summed together, the `CoalesceShufflePartitions` rule returns an unexpected number of partitions.
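To illustrate the second issue, here is a simplified greedy model of coalescing. It is an assumption for illustration, not a copy of `ShufflePartitionsUtil`: shuffles in one group share partition boundaries, so the bytes of partition `i` are summed across all of them before packing up to a target size. The per-partition sizes and the 100-byte target are made-up numbers.

```scala
object CoalesceSketch {
  // Simplified model (an assumption, not Spark's exact algorithm): walk the
  // partition indices in order and start a new coalesced partition whenever
  // adding the next index would exceed targetSize. Shuffles in the same group
  // must share boundaries, so the sizes at index i are summed across them.
  def coalescedCount(shuffles: Seq[Array[Long]], targetSize: Long): Int = {
    val numPartitions = shuffles.head.length
    require(shuffles.forall(_.length == numPartitions),
      "all shuffles in a group must have the same number of partitions")
    val combined = (0 until numPartitions).map(i => shuffles.map(s => s(i)).sum)
    var count = 1
    var current = 0L
    combined.foreach { size =>
      if (current > 0 && current + size > targetSize) {
        count += 1 // close the current partition and start a new one
        current = size
      } else {
        current += size
      }
    }
    count
  }
}

object CoalesceSketchExample {
  val left = Array(50L, 50L, 50L, 50L)  // hypothetical sizes of one Union child's shuffle
  val right = Array(50L, 50L, 50L, 50L) // hypothetical sizes of the other child's shuffle

  // One group for the whole Union: both children are cut at the same boundaries,
  // and each child contributes its own copy of those partitions to the Union output.
  val oneGroup: Int = 2 * CoalesceSketch.coalescedCount(Seq(left, right), 100L) // 8

  // One group per Union child: each shuffle is packed on its own.
  val perChild: Int = CoalesceSketch.coalescedCount(Seq(left), 100L) +
    CoalesceSketch.coalescedCount(Seq(right), 100L) // 4
}
```

With one group per child, each output partition reaches the 100-byte target; with a single group, the `Union` produces twice as many half-full partitions.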
   
   ### Does this PR introduce _any_ user-facing change?
   Probably yes: the number of result partitions might change.
   
   ### How was this patch tested?
   Added tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855802452


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43929/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929769619


   **[Test build #143695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143695/testReport)** for PR 32084 at commit [`e2b25b4`](https://github.com/apache/spark/commit/e2b25b4f35b507665029162efc4e2808fecd14e3).




[GitHub] [spark] ulysses-you closed pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you closed pull request #32084:
URL: https://github.com/apache/spark/pull/32084


   




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610652043



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,51 @@ class AdaptiveQueryExecSuite
       checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
     }
   }
+
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+      df: Dataset[Row], shuffleReaderNumber: Int, partitionNumber: Int): Unit = {
+      df.collect()
+      assert(
+        collect(df.queryExecution.executedPlan) {
+          case s: CustomShuffleReaderExec => s
+        }.size === shuffleReaderNumber
+      )
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+      SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+      SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+      SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+      val df1 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 2).toDF()
+      val df2 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 4).toDF()
+
+      // positive test
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2),
+        1,
+        1 + 4)
+
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2).unionAll(df1),
+        1,
+        1 + 4 + 2)
+
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2).unionAll(df1.groupBy("key").count()),
+        2,
+        1 + 4 + 1)

Review comment:
       +1

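The expected partition counts in the test above (`1 + 4`, `1 + 4 + 2`, `1 + 4 + 1`) follow from the fact that a `Union`'s output partition count is the sum of its children's counts. A small sketch of that arithmetic, with the assumption (implied by the test but not stated in it) that each `groupBy("key").count()` branch coalesces its 10 shuffle partitions into 1:

```scala
object ExpectedPartitions {
  // Partition counts from the test setup: df1 is parallelized into 2 slices and
  // df2 into 4. Each groupBy("key").count() branch is assumed to coalesce its 10
  // shuffle partitions into 1, since the test data is far below the 1 MiB advisory size.
  val df1Partitions = 2
  val df2Partitions = 4
  val coalescedAggPartitions = 1

  // A Union's output partition count is the sum of its children's counts.
  def unionPartitions(children: Int*): Int = children.sum

  val case1: Int = unionPartitions(coalescedAggPartitions, df2Partitions) // 5
  val case2: Int = unionPartitions(coalescedAggPartitions, df2Partitions, df1Partitions) // 7
  val case3: Int = unionPartitions(coalescedAggPartitions, df2Partitions, coalescedAggPartitions) // 6
}
```

The expected number of `CustomShuffleReaderExec` nodes (1, 1, 2) matches the number of aggregate branches, since only those branches read a shuffle.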





[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819199418


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41897/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858518626


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44163/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139617/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920921348


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47855/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818408284


   **[Test build #137273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137273/testReport)** for PR 32084 at commit [`a966759`](https://github.com/apache/spark/commit/a966759afa0cf7e71440a344ae833fab7835a6a0).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818408284


   **[Test build #137273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137273/testReport)** for PR 32084 at commit [`a966759`](https://github.com/apache/spark/commit/a966759afa0cf7e71440a344ae833fab7835a6a0).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816596294


   **[Test build #137115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137115/testReport)** for PR 32084 at commit [`be8d584`](https://github.com/apache/spark/commit/be8d584270c2b93e8f5539f98ed8b67560587673).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322341


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44144/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854762987


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139335/
   




[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816438833


   Thank you @maropu for the review. I have addressed the comments, which made the code more readable, and added more tests.




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322051


   **[Test build #139617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717712983



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala
##########
@@ -412,12 +412,10 @@ class CoalesceShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterAl
 
       val finalPlan = resultDf.queryExecution.executedPlan
         .asInstanceOf[AdaptiveSparkPlanExec].executedPlan
-      // As the pre-shuffle partition number are different, we will skip reducing
-      // the shuffle partition numbers.

Review comment:
       let's update the comment
   ```
         // Shuffle partition coalescing of the join is performed independent of the non-grouping
         // aggregate on the other side of the union.
   ```






[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918861253


   **[Test build #143247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143247/testReport)** for PR 32084 at commit [`eeb4047`](https://github.com/apache/spark/commit/eeb4047a59773a6dba64a30b2e184badf69548b1).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921022688


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47859/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855756409


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43929/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818499547


   **[Test build #137272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137272/testReport)** for PR 32084 at commit [`a001522`](https://github.com/apache/spark/commit/a00152234a67bd0efafef8083cf750fe83f01bdf).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class CoalesceShufflePartitions(session: SparkSession)`
     * `trait UnionAwareOptimizerRule `




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872348799


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45032/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855714028


   **[Test build #139407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139407/testReport)** for PR 32084 at commit [`797ea59`](https://github.com/apache/spark/commit/797ea5944cd405612cf3972f72ebf963cea694fc).




[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r613123229



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {

Review comment:
       Thank you for the input. Actually, the rule (child) does not consider the children of the Union; it just tries to optimize the plan.
   
   For nested Unions, there are two cases. The first is fine if we skip the check, but in the second, the plan can be optimized through every Union, which causes repetition.
   ```
   Union
     HashAggregate
       ShuffleQueryStage
     Union
       HashAggregate
         ShuffleQueryStage
       FileScan
   ```
   
   ```
   Union
     HashAggregate
       ShuffleQueryStage
     Union
       HashAggregate
         ShuffleQueryStage
       HashAggregate
         ShuffleQueryStage
   ```
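A minimal sketch of the repetition in the second case, using a hypothetical plan model (`Stage`, `Agg`, `Scan`, `UnionNode`) rather than Spark's `UnionExec`. A naive approach that treats every Union's direct children as groups reaches the inner shuffles twice: once through the outer Union (as part of the inner Union group) and once through the inner Union itself. Flattening nested Unions first visits each shuffle exactly once. Flattening is shown as one possible way to avoid the repetition, not necessarily what this PR does.

```scala
// Hypothetical plan model for illustration only; these are not Spark classes.
sealed trait Node
case class Stage(id: Int) extends Node
case class Agg(child: Node) extends Node
case class Scan(id: Int) extends Node
case class UnionNode(children: Seq[Node]) extends Node

object NestedUnion {
  // Shuffle stage ids reachable from a node.
  def stages(node: Node): Seq[Int] = node match {
    case Stage(id) => Seq(id)
    case Agg(child) => stages(child)
    case Scan(_) => Nil
    case UnionNode(children) => children.flatMap(stages)
  }

  // Flatten nested Unions: the children of an inner Union become groups of the
  // outermost Union instead of being handled again at each Union level.
  def flattenUnionChildren(node: Node): Seq[Node] = node match {
    case UnionNode(children) => children.flatMap(flattenUnionChildren)
    case other => Seq(other)
  }

  // Naive alternative: visit every Union and treat each direct child as a group.
  // A group below a nested Union is reached once per enclosing Union.
  def naiveGroupVisits(node: Node): Seq[Node] = node match {
    case UnionNode(children) => children ++ children.flatMap(naiveGroupVisits)
    case Agg(child) => naiveGroupVisits(child)
    case _ => Nil
  }
}

object NestedUnionExample {
  // The second plan above: Union(Agg(S1), Union(Agg(S2), Agg(S3))).
  val plan: Node = UnionNode(Seq(Agg(Stage(1)), UnionNode(Seq(Agg(Stage(2)), Agg(Stage(3))))))

  // Each shuffle stage is covered once: 1, 2, 3
  val flattened: Seq[Int] = NestedUnion.flattenUnionChildren(plan).flatMap(NestedUnion.stages)

  // Stages 2 and 3 are covered twice: 1, 2, 3, 2, 3
  val naive: Seq[Int] = NestedUnion.naiveGroupVisits(plan).flatMap(NestedUnion.stages)
}
```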






[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610655569



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
-    if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
-        || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
-      // If not all leaf nodes are query stages, it's not safe to reduce the number of
-      // shuffle partitions, because we may break the assumption that all children of a spark plan
-      // have same number of output partitions.
-      return plan
+
+    if (canCoalescePartitions(plan)) {

Review comment:
       Can we make this rule more general? Ideally, we need to split the query plan of the given query stage into several groups. The shuffles within one group must have the same number of partitions.
   
   This rule is overly simplified right now and assumes the entire query stage is one group. It's definitely wrong with Union and we should fix it. To be conservative we can try to put everything in one group except for Union. We should make the code extensible so that we can add more "points" that can split groups.
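The condition removed in this hunk treats the whole query stage as a single group. Coalescing is skipped unless every leaf is a query stage and no shuffle reader is present yet. Below is a simplified model of that check. The `Node` hierarchy and the `OldRule` object are hypothetical, not Spark classes. The model shows why a single `FileScan` branch under a `Union` disables coalescing for every other branch.

```scala
// Hypothetical stand-ins for plan nodes; not Spark's SparkPlan classes.
sealed trait Node { def children: Seq[Node] }
final case class QueryStage(id: Int) extends Node { def children: Seq[Node] = Nil }
final case class Scan(table: String) extends Node { def children: Seq[Node] = Nil }
final case class ShuffleReader(child: Node) extends Node { def children: Seq[Node] = Seq(child) }
final case class Op(name: String, children: Seq[Node]) extends Node

object OldRule {
  def leaves(n: Node): Seq[Node] =
    if (n.children.isEmpty) Seq(n) else n.children.flatMap(leaves)

  def exists(n: Node)(p: Node => Boolean): Boolean =
    p(n) || n.children.exists(c => exists(c)(p))

  // Whole-stage check: all leaves are query stages and no reader was added yet.
  def canCoalesce(plan: Node): Boolean =
    leaves(plan).forall(_.isInstanceOf[QueryStage]) &&
      !exists(plan)(_.isInstanceOf[ShuffleReader])
}
```

For `Op("Union", Seq(Op("HashAggregate", Seq(QueryStage(1))), Scan("t2")))`, `OldRule.canCoalesce` returns `false`, so the aggregate's shuffle is left uncoalesced.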






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816439939


   **[Test build #137115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137115/testReport)** for PR 32084 at commit [`be8d584`](https://github.com/apache/spark/commit/be8d584270c2b93e8f5539f98ed8b67560587673).




[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819170943


   retest this please




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921015893


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47859/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920926709


   **[Test build #143351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143351/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921421018


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47888/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921544045


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143381/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921398098


   **[Test build #143381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143381/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139617/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816469642


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41694/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929794226


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48211/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929820764


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48210/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929333047


   **[Test build #143679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143679/testReport)** for PR 32084 at commit [`f362c9f`](https://github.com/apache/spark/commit/f362c9fb387dcad38adec2c047bb256009d26744).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929821939


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48211/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929821989


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48211/
   




[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-930027529


   thank you @cloud-fan 




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927233448


   **[Test build #143632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143632/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927276805


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143632/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717712983



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala
##########
@@ -412,12 +412,10 @@ class CoalesceShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterAl
 
       val finalPlan = resultDf.queryExecution.executedPlan
         .asInstanceOf[AdaptiveSparkPlanExec].executedPlan
-      // As the pre-shuffle partition number are different, we will skip reducing
-      // the shuffle partition numbers.

Review comment:
       let's update the comment
   ```
         // Shuffle partition coalescing of the join is performed independent of the non-grouping
         // aggregate on the other side of the union.
   ```

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+        SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+        SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+        SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+        SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+        combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 1,

Review comment:
       ```suggestion
               1
   ```

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+        SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+        SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+        SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+        SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+        combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 1,

Review comment:
       ```suggestion
               unionNumber = 1
   ```

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"

Review comment:
       does this really matter for the "coalesce through union" feature? I think we can just test the default case, which means this rule is enabled.
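Some context on the question above: `CombineUnions` flattens directly nested unions. That is why the test expects one `UnionExec` with the rule enabled and two without it for the three-way query. Below is a rough sketch of the flattening on a toy logical plan. The names `LPlan`, `UnionNode`, `Aggregate`, `Relation` and `CombineUnionsSketch` are hypothetical, and the real optimizer rule handles more plan shapes than this.

```scala
// Hypothetical toy logical plan; not Spark's LogicalPlan classes.
sealed trait LPlan
final case class Relation(name: String) extends LPlan
final case class Aggregate(child: LPlan) extends LPlan
final case class UnionNode(children: Seq[LPlan]) extends LPlan

object CombineUnionsSketch {
  // Flattens directly nested unions into a single n-ary union, bottom-up.
  def combine(plan: LPlan): LPlan = plan match {
    case UnionNode(children) =>
      UnionNode(children.map(combine).flatMap {
        case UnionNode(grandChildren) => grandChildren
        case other => Seq(other)
      })
    case Aggregate(child) => Aggregate(combine(child))
    case r: Relation => r
  }

  // Counts union nodes, mirroring what the test's `unionNumber` checks.
  def countUnions(plan: LPlan): Int = plan match {
    case UnionNode(cs) => 1 + cs.map(countUnions).sum
    case Aggregate(c) => countUnions(c)
    case _: Relation => 0
  }
}
```

A left-nested `UnionNode(Seq(UnionNode(Seq(Aggregate(Relation("t1")), Relation("t2"))), Relation("t1")))` has two unions before `combine` and one after it.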

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+        SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+        SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+        SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+        SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+        combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 1,
+            1,
+            1 + 4)
+
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+                |UNION ALL
+                |SELECT * FROM t1
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 2,
+            1,
+            1 + 4 + 2)
+
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+                |UNION ALL
+                |SELECT * FROM t1
+                |UNION ALL
+                |SELECT key, count(*) FROM t2 GROUP BY key

Review comment:
       it's not very useful to test 3 unions, as it's similar to the 2 cases above.
   
   Let's test SMJ UNION AGG instead, i.e. a sort-merge join unioned with an aggregate.
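The expected partition counts in the cases above (`1 + 4` and `1 + 4 + 2`) follow from `UNION ALL` concatenating its children's partitions in these tests. The aggregate branch is coalesced to a single partition, while `t2` and `t1` keep their 4 and 2 input partitions. Below is a small sketch of that arithmetic. `Branch`, `CoalescedShuffle`, `DirectScan` and `UnionPartitions` are hypothetical names.

```scala
// Hypothetical model of one branch of a UNION ALL.
sealed trait Branch
// A branch whose shuffle(s) AQE coalesced into `coalesced` partitions.
final case class CoalescedShuffle(coalesced: Int) extends Branch
// A branch read without a shuffle, so it keeps its input partition count.
final case class DirectScan(inputPartitions: Int) extends Branch

object UnionPartitions {
  // Assumes UNION ALL simply concatenates its children's partitions,
  // as the expected values in this test suggest, so the total is their sum.
  def count(branches: Seq[Branch]): Int = branches.map {
    case CoalescedShuffle(n) => n
    case DirectScan(n)       => n
  }.sum
}
```

For the suggested SMJ UNION AGG case, assume both shuffles of the join form one group and each group coalesces to one partition. Under that assumption, `UnionPartitions.count(Seq(CoalescedShuffle(1), CoalescedShuffle(1)))` gives 2. The real number depends on the data sizes and the advisory partition size.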






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855780960


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43929/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610655569



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
-    if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
-        || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
-      // If not all leaf nodes are query stages, it's not safe to reduce the number of
-      // shuffle partitions, because we may break the assumption that all children of a spark plan
-      // have same number of output partitions.
-      return plan
+
+    if (canCoalescePartitions(plan)) {

Review comment:
       Can we make this rule more general? Ideally we need to split the query plan of the given query stage into several groups. The shuffles within one group must have the same number of partitions.
   
   This rule is overly simplified right now and assumes the entire query stage is one group. It's definitely wrong with Union and we should fix it. To be conservative we can try to put everything in one group except for Union. We should make the code extensible so that we can add more "breaking points" that can split groups in the future.
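The shuffles inside one group must share a partition count because the rule chooses one set of partition boundaries for the whole group. Reducer partition `j` of every shuffle then lands in the same coalesced partition, so operators such as joins still see co-partitioned inputs. Below is a simplified sketch of that idea. It assumes greedy packing of the combined per-partition sizes up to a target size. `GroupCoalesce` is a hypothetical name, and Spark's actual implementation also accounts for settings such as a minimum partition count.

```scala
import scala.collection.mutable.ArrayBuffer

object GroupCoalesce {
  /** Returns the start index of each coalesced partition, shared by all shuffles in a group.
    * `sizesPerShuffle(i)(j)` is the size in bytes of reducer partition `j` of shuffle `i`. */
  def coalesce(sizesPerShuffle: Seq[Seq[Long]], targetSize: Long): Seq[Int] = {
    require(sizesPerShuffle.nonEmpty, "a group needs at least one shuffle")
    val numPartitions = sizesPerShuffle.head.length
    require(sizesPerShuffle.forall(_.length == numPartitions),
      "all shuffles in one group must have the same number of partitions")
    if (numPartitions == 0) return Seq.empty
    // Combined size of reducer partition j across every shuffle in the group.
    val combined = (0 until numPartitions).map(j => sizesPerShuffle.map(_(j)).sum)
    val starts = ArrayBuffer(0)
    var current = 0L
    for (j <- 0 until numPartitions) {
      // Start a new coalesced partition once adding partition j would exceed the target.
      if (j > 0 && current + combined(j) > targetSize) {
        starts += j
        current = 0L
      }
      current += combined(j)
    }
    starts.toSeq
  }
}
```

With two shuffles of four partitions each, sized `10` and `5` bytes per partition, and a target of `40`, the group is coalesced into two partitions starting at indices `0` and `2`. A group whose shuffles have different partition counts is rejected.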






[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854620111








[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610677829



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
-    if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
-        || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
-      // If not all leaf nodes are query stages, it's not safe to reduce the number of
-      // shuffle partitions, because we may break the assumption that all children of a spark plan
-      // have same number of output partitions.
-      return plan
+
+    if (canCoalescePartitions(plan)) {
+      coalescePartitions(plan)
+    } else {
+      plan.transformUp {
+        case u: UnionExec =>
+          u.withNewChildren(u.children.map { child =>
+            if (canCoalescePartitions(child) &&
+              child.find(_.isInstanceOf[UnionExec]).isEmpty) {

Review comment:
       We should only add the coalesce if its children don't contain a `Union`, to avoid adding a duplicate `CustomShuffleReaderExec`.
   
   Without `CombineUnions`, the plan can be:
   ```
   Union
     Union
       HashAggregate
         ShuffleQueryStage
       FileScan
     FileScan
   ```
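To illustrate why the `child.find(_.isInstanceOf[UnionExec]).isEmpty` guard prevents duplicate readers, here is a minimal plain-Scala sketch that models the nested plan above with toy node classes. `Plan`, `Union`, `HashAggregate`, `ShuffleStage`, `FileScan` and `Reader` are hypothetical stand-ins, not Spark's `SparkPlan` or `CustomShuffleReaderExec` classes, and the sketch deliberately omits any other checks `canCoalescePartitions` may perform. Because the rule runs bottom-up, the inner `Union` wraps its shuffle first; without the guard, the outer `Union` wraps the same shuffle a second time.

```scala
object UnionCoalesceSketch {
  // Toy plan nodes (hypothetical stand-ins for Spark's physical operators).
  sealed trait Plan {
    def children: Seq[Plan]
    def withNewChildren(cs: Seq[Plan]): Plan

    // Pre-order search that also checks this node itself.
    def find(p: Plan => Boolean): Option[Plan] =
      if (p(this)) Some(this)
      else children.iterator.map(_.find(p)).collectFirst { case Some(found) => found }

    // Bottom-up rewrite: children first, then this node.
    def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
      val rewritten = withNewChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, identity[Plan])
    }
  }
  final case class Union(children: Seq[Plan]) extends Plan {
    def withNewChildren(cs: Seq[Plan]): Plan = Union(cs)
  }
  final case class HashAggregate(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withNewChildren(cs: Seq[Plan]): Plan = HashAggregate(cs.head)
  }
  // Stands in for a coalescing shuffle reader.
  final case class Reader(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withNewChildren(cs: Seq[Plan]): Plan = Reader(cs.head)
  }
  case object ShuffleStage extends Plan {
    def children: Seq[Plan] = Nil
    def withNewChildren(cs: Seq[Plan]): Plan = this
  }
  case object FileScan extends Plan {
    def children: Seq[Plan] = Nil
    def withNewChildren(cs: Seq[Plan]): Plan = this
  }

  // "Coalesce" a subtree by wrapping each shuffle stage in a Reader.
  def coalesce(p: Plan): Plan = p.transformUp { case ShuffleStage => Reader(ShuffleStage) }

  // Per-Union-child coalescing, optionally skipping children that contain a Union.
  def rule(plan: Plan, guardNestedUnion: Boolean): Plan = plan.transformUp {
    case u: Union =>
      u.withNewChildren(u.children.map { child =>
        val containsUnion = child.find(_.isInstanceOf[Union]).isDefined
        if (guardNestedUnion && containsUnion) child else coalesce(child)
      })
  }

  def countReaders(p: Plan): Int =
    (if (p.isInstanceOf[Reader]) 1 else 0) + p.children.map(countReaders).sum

  // Union(Union(HashAggregate(ShuffleStage), FileScan), FileScan), as in the plan above.
  val nested: Plan = Union(Seq(Union(Seq(HashAggregate(ShuffleStage), FileScan)), FileScan))

  def main(args: Array[String]): Unit = {
    println(countReaders(rule(nested, guardNestedUnion = true)))  // 1 reader
    println(countReaders(rule(nested, guardNestedUnion = false))) // 2 readers (duplicate)
  }
}
```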






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929772354


   **[Test build #143696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143696/testReport)** for PR 32084 at commit [`b2e3848`](https://github.com/apache/spark/commit/b2e3848d29a99ff415edfc4cb128b1ea6fd685cf).




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r718092179



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,89 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        numUnion: Int,
+        numShuffleReader: Int,
+        numPartition: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == numUnion)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === numShuffleReader)
+      assert(df.rdd.partitions.length === numPartition)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""

Review comment:
       This will set a config whose key is an empty string. I think it's safer to use `SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> ""`.
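A small plain-Scala sketch of the suggested shape: both branches set the same real key, `spark.sql.optimizer.excludedRules` (the key behind `SQLConf.OPTIMIZER_EXCLUDED_RULES`), and an empty value simply excludes no rules. The config is modeled here as a plain `Map[String, String]`, and the comma-splitting in `excludedRules` is a simplified stand-in for how the rule list is read, not Spark's own code.

```scala
object ExcludedRulesSketch {
  // Key string of SQLConf.OPTIMIZER_EXCLUDED_RULES, hard-coded so the sketch has no Spark dependency.
  val ExcludedRulesKey = "spark.sql.optimizer.excludedRules"
  val CombineUnions = "org.apache.spark.sql.catalyst.optimizer.CombineUnions"

  // Both branches target the same well-defined key; "" means "exclude nothing".
  def combineUnionConfig(combineUnionEnabled: Boolean): (String, String) =
    if (combineUnionEnabled) ExcludedRulesKey -> ""
    else ExcludedRulesKey -> CombineUnions

  // Simplified reading of a comma-separated rule list from a toy config map.
  def excludedRules(conf: Map[String, String]): Set[String] =
    conf.getOrElse(ExcludedRulesKey, "").split(",").map(_.trim).filter(_.nonEmpty).toSet

  def main(args: Array[String]): Unit = {
    println(excludedRules(Map(combineUnionConfig(true))))  // no rules excluded
    println(excludedRules(Map(combineUnionConfig(false)))) // only CombineUnions excluded
  }
}
```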






[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929906030








[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929891323


   **[Test build #143696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143696/testReport)** for PR 32084 at commit [`b2e3848`](https://github.com/apache/spark/commit/b2e3848d29a99ff415edfc4cb128b1ea6fd685cf).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854576858


   **[Test build #139335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139335/testReport)** for PR 32084 at commit [`525d4d4`](https://github.com/apache/spark/commit/525d4d4c7a120967409baee65d97bf5e27a2fdbe).




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880321182


   **[Test build #141043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141043/testReport)** for PR 32084 at commit [`9743c0c`](https://github.com/apache/spark/commit/9743c0cf672129f9b30177279f803f9c8c8752d3).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918867646


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47750/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927233448


   **[Test build #143632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143632/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921417781


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47888/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921542931


   **[Test build #143381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143381/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819204254


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41897/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818483062


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41851/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818705208


   **[Test build #137284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137284/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818512659








[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610699855



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,51 @@ class AdaptiveQueryExecSuite
       checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
     }
   }
+
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+      df: Dataset[Row], shuffleReaderNumber: Int, partitionNumber: Int): Unit = {
+      df.collect()
+      assert(
+        collect(df.queryExecution.executedPlan) {
+          case s: CustomShuffleReaderExec => s
+        }.size === shuffleReaderNumber
+      )
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+      SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+      SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+      SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+      val df1 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 2).toDF()
+      val df2 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 4).toDF()
+
+      // positive test
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2),
+        1,
+        1 + 4)
+
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2).unionAll(df1),
+        1,
+        1 + 4 + 2)
+
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2).unionAll(df1.groupBy("key").count()),
+        2,
+        1 + 4 + 1)

Review comment:
       I considered it. OK, I will add it later.
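The expected partition counts in the quoted test (`1 + 4`, `1 + 4 + 2`, `1 + 4 + 1`) can be reproduced with simple arithmetic, sketched below in plain Scala. It assumes that a union exposes the concatenation of its children's partitions, that each `groupBy("key").count()` shuffle is coalesced to a single partition (tiny data, 1 MiB advisory size, minimum partition number 1), and that the un-shuffled inputs keep the 2 and 4 partitions passed to `parallelize`. The names are illustrative only.

```scala
object ExpectedUnionPartitions {
  val df1Partitions = 2          // parallelize(..., 2)
  val df2Partitions = 4          // parallelize(..., 4)
  val coalescedAggPartitions = 1 // assumption: the aggregate's shuffle coalesces to 1 partition

  // Assumption: a union exposes the concatenation of its children's partitions.
  def unionPartitions(childPartitions: Int*): Int = childPartitions.sum

  def main(args: Array[String]): Unit = {
    println(unionPartitions(coalescedAggPartitions, df2Partitions))                         // 5
    println(unionPartitions(coalescedAggPartitions, df2Partitions, df1Partitions))          // 7
    println(unionPartitions(coalescedAggPartitions, df2Partitions, coalescedAggPartitions)) // 6
  }
}
```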






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872320654


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45032/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816439939


   **[Test build #137115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137115/testReport)** for PR 32084 at commit [`be8d584`](https://github.com/apache/spark/commit/be8d584270c2b93e8f5539f98ed8b67560587673).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818707157


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137284/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818561308


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41863/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818407365


   **[Test build #137272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137272/testReport)** for PR 32084 at commit [`a001522`](https://github.com/apache/spark/commit/a00152234a67bd0efafef8083cf750fe83f01bdf).




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610655569



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
-    if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
-        || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
-      // If not all leaf nodes are query stages, it's not safe to reduce the number of
-      // shuffle partitions, because we may break the assumption that all children of a spark plan
-      // have same number of output partitions.
-      return plan
+
+    if (canCoalescePartitions(plan)) {

Review comment:
       Can we make this rule more general? Ideally we need to split the query plan of the given query stage into several groups. The shuffles within one group must have the same number of partitions.
   
   This rule is overly simplified right now and assumes the entire query stage is one group. It's definitely wrong with Union and we should fix it. To be conservative we can try to put everything in one group except for Union. We should make the code extensible so that we can add more "points" that can split groups in the future.
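A minimal plain-Scala sketch of the grouping idea, using toy plan nodes rather than Spark's `SparkPlan` classes: shuffle stages are collected into groups, and every child of a group-boundary node starts a new group. `isGroupBoundary` is a hypothetical extension point that currently treats only `Union` as a boundary; other nodes could be added to it later. Shuffles in the same group would then be coalesced to one common partition count, while different groups could be coalesced independently.

```scala
object ShuffleGroupSketch {
  // Toy plan nodes (hypothetical, not Spark's physical operators).
  sealed trait Plan { def children: Seq[Plan] }
  final case class Union(children: Seq[Plan]) extends Plan
  final case class Join(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
  }
  final case class Aggregate(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  final case class ShuffleStage(id: Int) extends Plan {
    def children: Seq[Plan] = Nil
  }
  final case class Scan(table: String) extends Plan {
    def children: Seq[Plan] = Nil
  }

  // Hypothetical extension point: nodes whose children each start a new group.
  def isGroupBoundary(p: Plan): Boolean = p.isInstanceOf[Union]

  // Shuffle stage ids, grouped so that each group must share one partition count.
  def shuffleGroups(plan: Plan): Seq[Seq[Int]] = {
    // visit returns (shuffles in the still-open group, groups already closed below p).
    def visit(p: Plan): (Seq[Int], Seq[Seq[Int]]) = p match {
      case ShuffleStage(id) => (Seq(id), Nil)
      case _ if isGroupBoundary(p) =>
        // Close one group per child of the boundary node.
        val closed = p.children.flatMap { child =>
          val (open, done) = visit(child)
          if (open.nonEmpty) done :+ open else done
        }
        (Nil, closed)
      case _ =>
        // Non-boundary nodes merge their children's shuffles into one group.
        val results = p.children.map(visit)
        (results.flatMap(_._1), results.flatMap(_._2))
    }
    val (open, done) = visit(plan)
    if (open.nonEmpty) done :+ open else done
  }

  def main(args: Array[String]): Unit = {
    val plan = Union(Seq(Aggregate(ShuffleStage(1)), Join(ShuffleStage(2), ShuffleStage(3)), Scan("t")))
    println(shuffleGroups(plan)) // two groups: [1] and [2, 3]
  }
}
```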






[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819300616


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137317/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858518626


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44163/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717714919



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+        SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+        SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+        SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+        SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+        combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 1,

Review comment:
       ```suggestion
               unionNumber = 1,
   ```






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920877736


   **[Test build #143349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143349/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921197715


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143352/
   




[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927229959


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920877823


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143349/
   




[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920920346


   retest this please




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r718092390



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,89 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        numUnion: Int,
+        numShuffleReader: Int,
+        numPartition: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == numUnion)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === numShuffleReader)
+      assert(df.rdd.partitions.length === numPartition)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+          SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+          SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+          SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+          SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+          combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            1,

Review comment:
       Can we add the parameter name to make the test more readable? For example, `numUnion = 1`.
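To illustrate the suggestion, here is a minimal sketch of Scala named arguments. It assumes a stripped-down, hypothetical `checkResultPartition` that takes only the three expected counts and returns them (the helper in the test also takes a `Dataset[Row]` and asserts against the executed plan), so it runs without a `SparkSession`; the values are illustrative.

```scala
object NamedArgsExample {
  // Hypothetical stand-in for the test helper: it only records the expected
  // counts, so this sketch runs without Spark.
  final case class Expected(numUnion: Int, numShuffleReader: Int, numPartition: Int)

  def checkResultPartition(
      numUnion: Int,
      numShuffleReader: Int,
      numPartition: Int): Expected =
    Expected(numUnion, numShuffleReader, numPartition)

  // Positional call: the reader must remember which `1` means what.
  val positional: Expected = checkResultPartition(1, 1, 1 + 4)

  // Named call, as the review suggests: each argument documents itself.
  val named: Expected = checkResultPartition(
    numUnion = 1,
    numShuffleReader = 1,
    numPartition = 1 + 4)

  // Named arguments may also be supplied in any order.
  val reordered: Expected = checkResultPartition(
    numPartition = 1 + 4,
    numUnion = 1,
    numShuffleReader = 1)
}
```

Scala allows positional arguments before named ones, so the real helper could keep the `Dataset` positional and name only the counts, e.g. `checkResultPartition(df, numUnion = 1, numShuffleReader = 1, numPartition = 5)`.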






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929333047


   **[Test build #143679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143679/testReport)** for PR 32084 at commit [`f362c9f`](https://github.com/apache/spark/commit/f362c9fb387dcad38adec2c047bb256009d26744).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929428597


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48193/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929415714


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48193/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872404616


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140519/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880321182


   **[Test build #141043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141043/testReport)** for PR 32084 at commit [`9743c0c`](https://github.com/apache/spark/commit/9743c0cf672129f9b30177279f803f9c8c8752d3).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880323090


   **[Test build #141043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141043/testReport)** for PR 32084 at commit [`9743c0c`](https://github.com/apache/spark/commit/9743c0cf672129f9b30177279f803f9c8c8752d3).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `public final class SparkOutOfMemoryError extends OutOfMemoryError implements SparkThrowable `
     * `      throw new IllegalArgumentException(s"Cannot find error class '$errorClass'"))`
     * `class SparkArithmeticException(errorClass: String, messageParameters: Array[String])`
     * `  class RemoteBlockDownloadFileManager(`
     * `case class UnresolvedFieldPosition(position: ColumnPosition) extends FieldPosition `
     * `case class ExpressionEquals(e: Expression) `
     * `case class ExpressionStats(expr: Expression)(var useCount: Int = 1) `
     * `case class Average(`
     * `case class Sum(`
     * `case class SubExprEliminationState(eval: ExprCode, children: Seq[SubExprEliminationState])`
     * `case class LocalTimestamp(timeZoneId: Option[String] = None) extends LeafExpression`
     * `case class GetTimestamp(`
     * `case class ParseToTimestampLTZ(`
     * `case class ParseToTimestamp(`
     * `case class MakeTimestampNTZ(`
     * `case class MakeTimestampLTZ(`
     * `case class DomainJoin(`
     * `      .doc("The custom cost evaluator class to be used for adaptive execution. If not being set," +`
     * `  static class IntegerUpdater implements ParquetVectorUpdater `
     * `class MergingSortWithSessionWindowStateIterator(`
     * `trait HDFSBackedStateStoreMap `
     * `class NoPrefixHDFSBackedStateStoreMap extends HDFSBackedStateStoreMap `
     * `class PrefixScannableHDFSBackedStateStoreMap(`
     * `  class HDFSBackedReadStateStore(val version: Long, map: HDFSBackedStateStoreMap)`
     * `  class HDFSBackedStateStore(val version: Long, mapToUpdate: HDFSBackedStateStoreMap)`
     * `case class RocksDBMetrics(`
     * `case class RocksDBNativeHistogram(`
     * `case class RocksDBFileManagerMetrics(`
     * `sealed trait RocksDBStateEncoder `
     * `class PrefixKeyScanStateEncoder(`
     * `class NoPrefixKeyStateEncoder(keySchema: StructType, valueSchema: StructType)`
     * `  class RocksDBStateStore(lastVersion: Long) extends StateStore `
     * `sealed trait StreamingSessionWindowStateManager extends Serializable `
     * `class StreamingSessionWindowStateManagerImplV1(`
     * `class StreamingSessionWindowHelper(sessionExpression: Attribute, inputSchema: Seq[Attribute]) `
     * `trait WatermarkSupport extends SparkPlan `




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322328


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44144/
   




[GitHub] [spark] cloud-fan closed pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #32084:
URL: https://github.com/apache/spark/pull/32084


   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815437826








[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929820764








[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921558465


   There are now two issues with `Union` in the rule `CoalesceShufflePartitions`; I updated the PR description to make them clearer.
   
   cc @cloud-fan @maryannxue @JkSelf @viirya if you have time to take a look.




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854601157


   **[Test build #139336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139336/testReport)** for PR 32084 at commit [`7adda56`](https://github.com/apache/spark/commit/7adda56a5ce8336caaa9d72bfa28961669c22c66).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class CurrentUser() extends LeafExpression with Unevaluable `
     * `case class ReplaceCurrentLike(catalogManager: CatalogManager) extends Rule[LogicalPlan] `




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854714719


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43858/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612198616



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {

Review comment:
       We can always add the abstraction later, when we need to reuse the code elsewhere. For now, let's focus on the coalesce shuffle partitions rule.






[GitHub] [spark] maropu commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r610642147



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,51 @@ class AdaptiveQueryExecSuite
       checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
     }
   }
+
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+      df: Dataset[Row], shuffleReaderNumber: Int, partitionNumber: Int): Unit = {
+      df.collect()
+      assert(
+        collect(df.queryExecution.executedPlan) {
+          case s: CustomShuffleReaderExec => s
+        }.size === shuffleReaderNumber
+      )
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+      SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+      SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+      SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+      val df1 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 2).toDF()
+      val df2 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 4).toDF()
+
+      // positive test
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2),
+        1,
+        1 + 4)
+
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2).unionAll(df1),
+        1,
+        1 + 4 + 2)
+
+      checkResultPartition(
+        df1.groupBy("key").count().unionAll(df2).unionAll(df1.groupBy("key").count()),
+        2,
+        1 + 4 + 1)

Review comment:
       IIUC, these physical plans have a single `UnionExec` because of `CombineUnions`? Could you add tests for physical plans that have multiple `UnionExec`s?
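A toy sketch of why queries like the three above end up with a single union node: `CombineUnions` merges adjacent `Union` operators into one. The plan types and the `combineUnions` function below are hypothetical simplifications (the real rule operates on Catalyst's `LogicalPlan` and may handle more patterns). This is also why the later revision of the test runs once with `CombineUnions` excluded through `SQLConf.OPTIMIZER_EXCLUDED_RULES`.

```scala
object CombineUnionsSketch {
  // Toy logical plan; these types are hypothetical, not Catalyst's.
  sealed trait Plan
  final case class Scan(name: String) extends Plan
  final case class Aggregate(child: Plan) extends Plan
  final case class Union(children: Seq[Plan]) extends Plan

  // Flatten directly nested Union nodes into their parent, bottom-up.
  // Children are combined first, so one level of flattening is enough.
  def combineUnions(plan: Plan): Plan = plan match {
    case Union(children) =>
      Union(children.map(combineUnions).flatMap {
        case Union(grandChildren) => grandChildren
        case other                => Seq(other)
      })
    case Aggregate(child) => Aggregate(combineUnions(child))
    case leaf             => leaf
  }

  // Count Union nodes anywhere in the tree.
  def countUnions(plan: Plan): Int = plan match {
    case Union(children)  => 1 + children.map(countUnions).sum
    case Aggregate(child) => countUnions(child)
    case _                => 0
  }

  // Shape of df1.groupBy("key").count().unionAll(df2).unionAll(df1).
  val nested: Plan =
    Union(Seq(Union(Seq(Aggregate(Scan("df1")), Scan("df2"))), Scan("df1")))
}
```

Without the rule the toy plan keeps two union nodes; with it, one union has all three inputs as direct children, which is the single-union shape the question refers to.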

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
-    if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
-        || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
-      // If not all leaf nodes are query stages, it's not safe to reduce the number of
-      // shuffle partitions, because we may break the assumption that all children of a spark plan
-      // have same number of output partitions.
-      return plan
+
+    if (canCoalescePartitions(plan)) {
+      coalescePartitions(plan)
+    } else {
+      plan.transformUp {
+        case u: UnionExec =>
+          u.withNewChildren(u.children.map { child =>
+            if (canCoalescePartitions(child) &&
+              child.find(_.isInstanceOf[UnionExec]).isEmpty) {

Review comment:
       Do we still need the check `child.find(_.isInstanceOf[UnionExec]).isEmpty`? It seems `canCoalescePartitions(child)` always returns false if `child.find(_.isInstanceOf[UnionExec]).isEmpty` is false.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,14 +35,25 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
-    if (!plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec])
-        || plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined) {
-      // If not all leaf nodes are query stages, it's not safe to reduce the number of
-      // shuffle partitions, because we may break the assumption that all children of a spark plan
-      // have same number of output partitions.
-      return plan
+
+    if (canCoalescePartitions(plan)) {
+      coalescePartitions(plan)
+    } else {
+      plan.transformUp {
+        case u: UnionExec =>

Review comment:
       Could you add some comments explaining what this pattern is for?
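The `transformUp` branch in the diff appears to handle plans that cannot be coalesced as a whole (for example, a union of a shuffle with a non-shuffle scan). Because a union concatenates its children's partitions rather than requiring them to match, each eligible child can be coalesced on its own. A toy sketch of that idea follows; every type and function in it (`ShuffleStage`, `LocalScan`, `Coalesced`, `canCoalesce`, and a coalescing step that always targets one partition) is a hypothetical simplification, not Spark's API.

```scala
object UnionChildCoalesceSketch {
  // Toy physical plan; all types here are hypothetical, not Spark's.
  sealed trait Plan { def children: Seq[Plan] }
  final case class ShuffleStage(numPartitions: Int) extends Plan { val children: Seq[Plan] = Nil }
  final case class LocalScan(numPartitions: Int) extends Plan { val children: Seq[Plan] = Nil }
  final case class Coalesced(child: Plan, numPartitions: Int) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }
  final case class Agg(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
  final case class Union(children: Seq[Plan]) extends Plan

  def mapChildren(p: Plan)(f: Plan => Plan): Plan = p match {
    case Agg(c)          => Agg(f(c))
    case Coalesced(c, n) => Coalesced(f(c), n)
    case Union(cs)       => Union(cs.map(f))
    case leaf            => leaf
  }

  // Bottom-up rewrite in the spirit of transformUp: children first, then the node.
  def transformUp(p: Plan)(rule: PartialFunction[Plan, Plan]): Plan =
    rule.applyOrElse(mapChildren(p)(c => transformUp(c)(rule)), identity[Plan])

  def leaves(p: Plan): Seq[Plan] =
    if (p.children.isEmpty) Seq(p) else p.children.flatMap(leaves)

  def containsNode(p: Plan)(pred: Plan => Boolean): Boolean =
    pred(p) || p.children.exists(c => containsNode(c)(pred))

  // Mirrors the guard removed in the diff: all leaves are shuffle stages and
  // nothing has been coalesced yet.
  def canCoalesce(p: Plan): Boolean =
    leaves(p).forall(_.isInstanceOf[ShuffleStage]) && !containsNode(p)(_.isInstanceOf[Coalesced])

  // Toy coalescing: every shuffle read is reduced to a single partition.
  def coalesce(p: Plan): Plan = p match {
    case s: ShuffleStage => Coalesced(s, 1)
    case other           => mapChildren(other)(coalesce)
  }

  def totalPartitions(p: Plan): Int = p match {
    case ShuffleStage(n) => n
    case LocalScan(n)    => n
    case Coalesced(_, n) => n
    case Agg(c)          => totalPartitions(c)
    case Union(cs)       => cs.map(totalPartitions).sum
  }

  def coalesceShufflePartitions(plan: Plan): Plan =
    if (canCoalesce(plan)) coalesce(plan)
    else transformUp(plan) {
      // The whole plan is not eligible, but a Union concatenates its children's
      // partitions, so each eligible child can be coalesced independently.
      case Union(cs) => Union(cs.map(c => if (canCoalesce(c)) coalesce(c) else c))
    }

  // Aggregate over a 10-partition shuffle, unioned with a 4-partition local scan.
  val example: Plan = Union(Seq(Agg(ShuffleStage(10)), LocalScan(4)))
}
```

In the toy example the aggregate side collapses to one partition while the scan keeps its four, giving `1 + 4` partitions in total, the same shape as the expectations in the test.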






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854714750


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43858/
   




[GitHub] [spark] JkSelf commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
JkSelf commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612982244



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {

Review comment:
       We can simplify the logic of this rule so that it only ensures the children of the Union can be optimized. We don’t need to check whether a child of the Union itself contains a Union, because when we call rule(child), the children of that nested Union are handled recursively.
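Below is a minimal sketch of the simplification suggested in this comment. It uses a toy plan model in which `Plan`, `Union`, `Group`, `coalesce` and `optimize` are hypothetical names, not Spark's `SparkPlan`/`UnionExec` API, and the coalescing formula is only a stand-in for the real AQE logic. The driver optimizes each Union child on its own. Nested Unions need no special case because the call recurses into every child.

```scala
// Hypothetical toy plan model (not Spark's SparkPlan / UnionExec classes).
sealed trait Plan
final case class Union(children: Seq[Plan]) extends Plan
// A union-free subtree whose shuffles must share one partition count.
final case class Group(numPartitions: Int, totalBytes: Long) extends Plan

object UnionAwareSketch {
  // Stand-in for partition coalescing: aim for ~advisoryBytes per partition,
  // never increasing the current partition count and never going below 1.
  def coalesce(g: Group, advisoryBytes: Long): Group = {
    val wanted = math.max(1L, (g.totalBytes + advisoryBytes - 1) / advisoryBytes)
    g.copy(numPartitions = math.min(g.numPartitions.toLong, wanted).toInt)
  }

  // Union-aware driver: every Union child is optimized independently.
  // A nested Union is just another child, so the recursion covers it.
  def optimize(plan: Plan, advisoryBytes: Long): Plan = plan match {
    case Union(children) => Union(children.map(optimize(_, advisoryBytes)))
    case g: Group        => coalesce(g, advisoryBytes)
  }

  // In this toy model a Union outputs the sum of its children's partitions.
  def numPartitions(plan: Plan): Int = plan match {
    case Union(children) => children.map(numPartitions).sum
    case Group(n, _)     => n
  }
}
```

For example, `UnionAwareSketch.optimize(Union(Seq(Group(10, 500L), Union(Seq(Group(10, 3000L), Group(4, 100L))))), 1024L)` coalesces the three groups independently, to 1, 3 and 1 partitions respectively.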




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816466169






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854761608


   **[Test build #139335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139335/testReport)** for PR 32084 at commit [`525d4d4`](https://github.com/apache/spark/commit/525d4d4c7a120967409baee65d97bf5e27a2fdbe).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854585920


   **[Test build #139336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139336/testReport)** for PR 32084 at commit [`7adda56`](https://github.com/apache/spark/commit/7adda56a5ce8336caaa9d72bfa28961669c22c66).


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858316750


   **[Test build #139617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818477634


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137273/
   


[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858509527


   **[Test build #139636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139636/testReport)** for PR 32084 at commit [`377505f`](https://github.com/apache/spark/commit/377505fb8dd137f5a3c41a7314e54857f472e1bb).


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920921386


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47855/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818707157


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137284/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818483023






[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819178667


   **[Test build #137317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137317/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).


[GitHub] [spark] cloud-fan commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929959057


   thanks, merging to master!


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921544045


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143381/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918867620


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47750/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918866346


   **[Test build #143247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143247/testReport)** for PR 32084 at commit [`eeb4047`](https://github.com/apache/spark/commit/eeb4047a59773a6dba64a30b2e184badf69548b1).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880339094


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45558/
   


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854714750


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43858/
   


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858684427


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139636/
   


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855883427


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139407/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819200088


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41897/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929333047






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854620111






[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717714919



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+        SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+        SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+        SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+        SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+        combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 1,

Review comment:
       ```suggestion
               1
   ```
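The `checkResultPartition` helper in this test counts plan nodes with `collect(plan) { case ... => ... }`. The sketch below shows the idea behind such a helper on a toy node type. `Node`, `UnionNode`, `ShuffleRead`, `Leaf` and `PlanCollect` are hypothetical illustrative names, not Spark classes. The traversal is pre-order and keeps every node the partial function matches. It does not model how Spark's AQE helper descends into query stages.

```scala
// Hypothetical toy plan nodes (not Spark's SparkPlan hierarchy).
sealed trait Node { def children: Seq[Node] }
final case class UnionNode(children: Seq[Node]) extends Node
final case class ShuffleRead(child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
}
final case class Leaf(name: String) extends Node {
  def children: Seq[Node] = Nil
}

object PlanCollect {
  // Pre-order traversal: keep the result of `pf` for every node it is defined on.
  def collect[B](root: Node)(pf: PartialFunction[Node, B]): Seq[B] =
    pf.lift(root).toList ++ root.children.flatMap(collect(_)(pf))
}
```

With `plan = UnionNode(Seq(ShuffleRead(Leaf("t1")), Leaf("t2")))`, collecting `{ case u: UnionNode => u }` yields one node. The same holds for `{ case r: ShuffleRead => r }`. This mirrors how the test asserts on the number of `UnionExec` and `AQEShuffleReadExec` nodes.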




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929793299


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48210/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818477634


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137273/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819297063


   **[Test build #137317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137317/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815553450


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137048/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855881776


   **[Test build #139407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139407/testReport)** for PR 32084 at commit [`797ea59`](https://github.com/apache/spark/commit/797ea5944cd405612cf3972f72ebf963cea694fc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920970634


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47857/
   


[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815581312


   cc @maropu @cloud-fan @JkSelf, do you have any thoughts about this?


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920913765


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47855/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920983006


   **[Test build #143352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143352/testReport)** for PR 32084 at commit [`62fb383`](https://github.com/apache/spark/commit/62fb3835204abb40a278016421669298fa687365).


[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920983006


   **[Test build #143352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143352/testReport)** for PR 32084 at commit [`62fb383`](https://github.com/apache/spark/commit/62fb3835204abb40a278016421669298fa687365).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920877823


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143349/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858322341


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44144/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819300616


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137317/
   




[GitHub] [spark] maropu commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r609693645



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -93,6 +106,15 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     }
   }
 
+  private def shouldApplyChildren(plan: SparkPlan): Boolean = {
+    plan.find(p => shouldApplyChildrenFunc(p)).isDefined
+  }
+
+  private def shouldApplyChildrenFunc(plan: SparkPlan): Boolean = plan match {
+    case _: UnionExec => true

Review comment:
       Are there any other plan nodes that we can apply the same optimization to? If not, could you inline it?
   ```
     private def shouldApplyChildren(plan: SparkPlan): Boolean = {
       plan.find(_.isInstanceOf[UnionExec]).isDefined
     }
   ```
   Then, `shouldApplyChildren` -> `hasUnion`?
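A toy Scala sketch of the check discussed above, assuming hypothetical `Node`, `Leaf`, `UnionNode` and `UnaryNode` classes in place of `SparkPlan`, `QueryStageExec` and `UnionExec` (none of these names are Spark APIs). It contrasts the inlined `hasUnion` with the patch's `transformUp` guard, which only matches a union that has no union beneath any of its children.

```scala
// Hypothetical toy nodes standing in for SparkPlan, QueryStageExec and
// UnionExec; these are NOT Spark classes.
sealed trait Node { def children: Seq[Node] }
final case class Leaf(name: String) extends Node { val children: Seq[Node] = Nil }
final case class UnionNode(children: Seq[Node]) extends Node
final case class UnaryNode(name: String, child: Node) extends Node {
  val children: Seq[Node] = Seq(child)
}

object GuardSketch {
  // Pre-order search returning the first node that satisfies `p`,
  // similar in spirit to TreeNode.find.
  def find(n: Node)(p: Node => Boolean): Option[Node] =
    if (p(n)) Some(n)
    else n.children.iterator.map(c => find(c)(p)).collectFirst { case Some(m) => m }

  // The inlined check from the review, under the suggested name.
  def hasUnion(n: Node): Boolean = find(n)(_.isInstanceOf[UnionNode]).isDefined

  // Unions matched by the patch's transformUp guard: a union none of whose
  // children contains another union (listed bottom-up).
  def matchedUnions(n: Node): Seq[UnionNode] = {
    val here = n match {
      case u: UnionNode if !u.children.exists(hasUnion) => Seq(u)
      case _ => Nil
    }
    n.children.flatMap(matchedUnions) ++ here
  }
}
```

In this toy, for a plan shaped like `Union(Project(Union(a, b)), c)` only the inner union is matched, so `c` is never handed to the coalescing step. If the real rule behaves the same way, that is a case worth covering with a multiple-union test.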

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,6 +35,19 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
+
+    if (shouldApplyChildren(plan)) {
+      plan.transformUp {
+        case p if shouldApplyChildrenFunc(p) &&
+          !p.children.exists(child => shouldApplyChildren(child)) =>
+          p.withNewChildren(p.children.map(child => applyInternal(child)))
+      }
+    } else {
+      applyInternal(plan)
+    }
+  }
+
+  private def applyInternal(plan: SparkPlan): SparkPlan = {

Review comment:
       nit: `applyInternal` -> `coalescePartitions`?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##########
@@ -35,6 +35,19 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl
     if (!conf.coalesceShufflePartitionsEnabled) {
       return plan
     }
+
+    if (shouldApplyChildren(plan)) {
+      plan.transformUp {
+        case p if shouldApplyChildrenFunc(p) &&
+          !p.children.exists(child => shouldApplyChildren(child)) =>
+          p.withNewChildren(p.children.map(child => applyInternal(child)))
+      }
+    } else {
+      applyInternal(plan)
+    }

Review comment:
       This section is hard to read; could we write it like this?
   ```
     private def canCoalescePartitions(plan: SparkPlan): Boolean = {
       plan.collectLeaves().forall(_.isInstanceOf[QueryStageExec]) &&
         !plan.find(_.isInstanceOf[CustomShuffleReaderExec]).isDefined
     }
   
     ...
     if (canCoalescePartitions(plan)) {
       // simple case
       return applyInternal(plan)
     }
   
     // Handle more cases to coalesce partitions?
     ...
   ```
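One way to fill in the "more cases" branch is to treat each union child as its own coalescing group, so different children can end up with different partition counts. Below is a toy sketch under assumed, hypothetical classes (`Plan`, `Stage`, `UnionPlan`, `UnaryPlan`, `JoinPlan`); `coalesce` is a simplified greedy packing of contiguous partitions up to a target size, not Spark's actual implementation, and it ignores settings such as the minimum partition number.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy plan: Stage stands in for a shuffle query stage and carries
// per-partition output sizes in bytes. None of these are Spark classes.
sealed trait Plan { def children: Seq[Plan] }
final case class Stage(name: String, partitionBytes: Vector[Long]) extends Plan {
  val children: Seq[Plan] = Nil
}
final case class UnionPlan(children: Seq[Plan]) extends Plan
final case class UnaryPlan(name: String, child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}
final case class JoinPlan(left: Plan, right: Plan) extends Plan {
  val children: Seq[Plan] = Seq(left, right)
}

object CoalesceSketch {
  def stages(p: Plan): Seq[Stage] = p match {
    case s: Stage => Seq(s)
    case other    => other.children.flatMap(stages)
  }

  // Each union child starts a new group; a unary node is assumed to be
  // partition-wise and is passed through; anything else (a join or a bare
  // stage) stays one group so its inputs keep the same partitioning.
  def groups(p: Plan): Seq[Seq[Stage]] = p match {
    case UnionPlan(cs)   => cs.flatMap(groups)
    case UnaryPlan(_, c) => groups(c)
    case other           => Seq(stages(other))
  }

  // Greedy packing: sums the group's sizes per partition index, then starts a
  // new coalesced partition whenever adding the next index would exceed
  // targetBytes. Returns the start index of each coalesced partition.
  def coalesce(group: Seq[Stage], targetBytes: Long): Seq[Int] = {
    require(group.nonEmpty, "empty group")
    val n = group.head.partitionBytes.length
    require(n > 0 && group.forall(_.partitionBytes.length == n),
      "stages in one group must share a non-zero partition count")
    val totals = (0 until n).map(i => group.map(_.partitionBytes(i)).sum)
    val starts = ArrayBuffer(0)
    var current = 0L
    for (i <- 0 until n) {
      if (i > 0 && current + totals(i) > targetBytes) {
        starts += i
        current = 0L
      }
      current += totals(i)
    }
    starts.toList
  }

  def coalesceByGroup(p: Plan, targetBytes: Long): Seq[Seq[Int]] =
    groups(p).map(g => coalesce(g, targetBytes))
}
```

With a 100-byte target and four partitions per stage, a 10-byte-per-partition child and a 50-byte-per-partition child of the same union coalesce to one and two partitions respectively. Putting both stages in a single group (as the `JoinPlan` case does) sums them to 60 bytes per index and keeps all four partitions.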

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,26 @@ class AdaptiveQueryExecSuite
       checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
     }
   }
+
+  test("SPARK-34980: Support coalesce partition through union") {
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+      SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+      SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+      SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+      val df1 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 2).toDF()
+      val df2 = spark.sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString)), 4).toDF()
+
+      val df = df1.groupBy("key").count().unionAll(df2)

Review comment:
       Could you add more tests for more cases, e.g., multiple unions?






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815544107


   **[Test build #137048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137048/testReport)** for PR 32084 at commit [`e4bae2b`](https://github.com/apache/spark/commit/e4bae2bf77fa088d41edd02e9155a108fab36c34).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858682569


   **[Test build #139636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139636/testReport)** for PR 32084 at commit [`377505f`](https://github.com/apache/spark/commit/377505fb8dd137f5a3c41a7314e54857f472e1bb).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855802452


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43929/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818561308


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41863/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921022688


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47859/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816611339


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137115/
   




[GitHub] [spark] HyukjinKwon commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815714184


   cc @maryannxue too




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858518613


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44163/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929769619








[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929428597








[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929813144


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48210/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929820764








[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929820764








[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819178667


   **[Test build #137317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137317/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816469642


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41694/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-816611339


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137115/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818515251


   **[Test build #137284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137284/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880323116


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141043/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880323116


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141043/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818483062


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41851/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921420990


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47888/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921421018


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47888/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858684427


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139636/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858509527


   **[Test build #139636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139636/testReport)** for PR 32084 at commit [`377505f`](https://github.com/apache/spark/commit/377505fb8dd137f5a3c41a7314e54857f472e1bb).




[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-930027529


   thank you @cloud-fan 





[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929769619






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921022644


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47859/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921094826


   **[Test build #143351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143351/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920960736


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47857/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918866398


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143247/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815451835






[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815451835


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41626/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872403234


   **[Test build #140519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140519/testReport)** for PR 32084 at commit [`fb562fd`](https://github.com/apache/spark/commit/fb562fd5570c05b2830a3884d67210abb40b6bcf).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `public class MergedBlockMetaRequest extends AbstractMessage implements RequestMessage `
     * `public class MergedBlockMetaSuccess extends AbstractResponseMessage `
     * `public abstract class AbstractFetchShuffleBlocks extends BlockTransferMessage `
     * `public class FetchShuffleBlockChunks extends AbstractFetchShuffleBlocks `
     * `public class FetchShuffleBlocks extends AbstractFetchShuffleBlocks `
     * `      throw new IllegalArgumentException(s\"Cannot find error class '$errorClass'\"))`
     * `trait SparkError extends Throwable `
     * `class SparkException(`
     * `class SparkArithmeticException(`
     * `final case class FileNameSpec(prefix: String, suffix: String)`
     * `case class ShuffleBlockInfo(shuffleId: Int, mapId: Long) `
     * `case class ShuffleBlockChunkId(`
     * `case class ShuffleMergedDataBlockId(appId: String, shuffleId: Int, reduceId: Int) extends BlockId `
     * `case class ShuffleMergedIndexBlockId(`
     * `case class ShuffleMergedMetaBlockId(`
     * `  case class FetchRequest(`
     * `  class AvroSchemaHelper(`
     * `class DecimalOps(FractionalOps):`
     * `class IntegralExtensionOps(IntegralOps):`
     * `class FractionalExtensionOps(FractionalOps):`
     * `class StringExtensionOps(StringOps):`
     * `            new_class = type(\"NameType\", (NameTypeHolder,), `
     * `class GroupBy(Generic[FrameLike], metaclass=ABCMeta):`
     * `class DataFrameGroupBy(GroupBy[DataFrame]):`
     * `class SeriesGroupBy(GroupBy[Series]):`
     * `        new_class = type(\"NameType\", (NameTypeHolder,), `
     * `class SparkIndexOpsMethods(Generic[IndexOpsLike], metaclass=ABCMeta):`
     * `class SparkSeriesMethods(SparkIndexOpsMethods[\"ps.Series\"]):`
     * `class SparkIndexMethods(SparkIndexOpsMethods[\"ps.Index\"]):`
     * `class RollingAndExpanding(Generic[FrameLike], metaclass=ABCMeta):`
     * `class RollingLike(RollingAndExpanding[FrameLike]):`
     * `class Rolling(RollingLike[FrameLike]):`
     * `class RollingGroupby(RollingLike[FrameLike]):`
     * `class ExpandingLike(RollingAndExpanding[FrameLike]):`
     * `class Expanding(ExpandingLike[FrameLike]):`
     * `class ExpandingGroupby(ExpandingLike[FrameLike]):`
     * `class KubernetesLocalDiskShuffleDataIO(sparkConf: SparkConf) extends ShuffleDataIO `
     * `class KubernetesLocalDiskShuffleExecutorComponents(sparkConf: SparkConf)`
     * `case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends UnaryExpression`
     * `sealed trait FieldName extends LeafExpression with Unevaluable `
     * `case class UnresolvedFieldName(name: Seq[String]) extends FieldName `
     * `sealed trait FieldPosition extends LeafExpression with Unevaluable `
     * `case class UnresolvedFieldPosition(`
     * `case class ResolvedFieldName(path: Seq[String], field: StructField) extends FieldName `
     * `case class ResolvedFieldPosition(position: ColumnPosition) extends FieldPosition`
     * `case class Cast(`
     * `class ExpressionContainmentOrdering extends Ordering[Expression] `
     * `case class SubExprEliminationState(`
     * `case class ArraysZip(children: Seq[Expression], names: Seq[Expression])`
     * `case class GetTimestampNTZ(`
     * `case class ParseToTimestampNTZ(`
     * `case class MakeDTInterval(`
     * `case class MakeYMInterval(years: Expression, months: Expression)`
     * `case class RebalancePartitions(`
     * `trait AlterTableCommand extends UnaryCommand `
     * `case class AlterTableDropColumns(`
     * `case class AlterTableRenameColumn(`
     * `case class AlterTableAlterColumn(`
     * `    new AnalysisException(s\"UDF class $className doesn't implement any UDF interface\")`
     * `    new AnalysisException(s\"UDF class with $n type arguments is not supported.\")`
     * `    new AnalysisException(s\"Can not instantiate class $className, please make sure\" +`
     * `    new AnalysisException(s\"Can not load class $className, please make sure it is on the classpath\")`
     * `    new SparkException(s\"Cannot find catalog plugin class for catalog '$name': $pluginClassName\")`
     * `    new SparkException(\"Cannot instantiate abstract catalog plugin class for \" +`
     * `    new SparkException(s\"Can not load in UserDefinedType $`
     * `case class DayTimeIntervalType(startField: Byte, endField: Byte) extends AtomicType `
     * `case class YearMonthIntervalType(startField: Byte, endField: Byte) extends AtomicType `
     * `final class ParquetReadState `
     * `public class ParquetVectorUpdaterFactory `
     * `case class CommandResult(`
     * `case class MergingSessionsExec(`
     * `class MergingSessionsIterator(`
     * `case class ShowCreateTableExec(`
     * `class RocksDB(`
     * `class ByteArrayPair(var key: Array[Byte] = null, var value: Array[Byte] = null) `
     * `case class RocksDBConf(`
     * `case class AcquiredThreadInfo() `
     * `case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric `
     * `case class StateStoreCustomSizeMetric(name: String, desc: String) extends StateStoreCustomMetric `
     * `case class StateStoreCustomTimingMetric(name: String, desc: String) extends StateStoreCustomMetric `
     * `trait StatefulOperatorCustomMetric `
     * `case class StatefulOperatorCustomSumMetric(name: String, desc: String)`
     * `trait TestGroupState[S] extends GroupState[S] `


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929592217


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143679/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927234818


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48144/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818491403






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880350367


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45558/
   


[GitHub] [spark] ulysses-you commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855695488


   I've refactored the PR. Do you have time to take a look? cc @maropu @cloud-fan


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880364957


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45558/
   


[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818561267






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872348799


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45032/
   


[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818407365


   **[Test build #137272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137272/testReport)** for PR 32084 at commit [`a001522`](https://github.com/apache/spark/commit/a00152234a67bd0efafef8083cf750fe83f01bdf).


[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854585920


   **[Test build #139336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139336/testReport)** for PR 32084 at commit [`7adda56`](https://github.com/apache/spark/commit/7adda56a5ce8336caaa9d72bfa28961669c22c66).


[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612223808



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {

Review comment:
       Back to the code: I'm not sure this approach is general enough for broader requirements. The idea here is:
   1. Optimize each child of the Union on its own, treating each child as an atomic plan that can be optimized.
   2. If step 1 does not apply, optimize the whole plan.
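The two-step idea described in this comment can be sketched with a toy plan model. This is a hypothetical illustration, not the PR's `UnionAwareOptimizerRule`: `Plan`, `Shuffle`, `Scan`, `Coalesced`, `coalesceRule`, and `applyUnionAware` are made-up names, and the sketch assumes that "step 1 is not satisfied" means the rule cannot be applied to every Union child.

```scala
object UnionAwareSketch {
  // Toy plan nodes standing in for SparkPlan (hypothetical, not Spark's classes).
  sealed trait Plan
  final case class Shuffle(id: Int) extends Plan        // a shuffle stage whose partitions could be coalesced
  final case class Scan(table: String) extends Plan     // a subtree with no shuffle
  final case class Coalesced(child: Plan) extends Plan  // marks a subtree the rule has rewritten
  final case class Union(children: Seq[Plan]) extends Plan

  // A rule either rewrites a plan or declines with None.
  type Rule = Plan => Option[Plan]

  // Toy rule: only applicable to a shuffle stage.
  val coalesceRule: Rule = {
    case s: Shuffle => Some(Coalesced(s))
    case _          => None
  }

  def applyUnionAware(plan: Plan, rule: Rule): Plan = plan match {
    case Union(children) =>
      // Step 1: treat every Union child as an atomic plan and optimize it on its own.
      val perChild = children.map(rule)
      if (perChild.forall(_.isDefined)) {
        Union(perChild.flatten)
      } else {
        // Step 2: step 1 did not apply to every child, so try the whole plan instead.
        rule(plan).getOrElse(plan)
      }
    case other =>
      // No Union at the root: only the whole-plan path exists.
      rule(other).getOrElse(other)
  }
}
```

With a Union of two shuffle stages, each child is rewritten independently. If one child is a plain scan, the sketch falls back to the whole-plan path, which the toy rule declines, so the plan is returned unchanged.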




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815414403


   **[Test build #137048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137048/testReport)** for PR 32084 at commit [`e4bae2b`](https://github.com/apache/spark/commit/e4bae2bf77fa088d41edd02e9155a108fab36c34).


[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855714028


   **[Test build #139407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139407/testReport)** for PR 32084 at commit [`797ea59`](https://github.com/apache/spark/commit/797ea5944cd405612cf3972f72ebf963cea694fc).


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927241163


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48144/
   


[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929428597


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48193/
   


[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r718092179



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,89 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        numUnion: Int,
+        numShuffleReader: Int,
+        numPartition: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == numUnion)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === numShuffleReader)
+      assert(df.rdd.partitions.length === numPartition)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""

Review comment:
       This will set a config whose key is an empty string. I think it's safer to use `SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> ""`.
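A minimal sketch of the suggested change, written outside Spark so it runs standalone. It assumes `SQLConf.OPTIMIZER_EXCLUDED_RULES.key` equals `"spark.sql.optimizer.excludedRules"` and, as the review comment implies, that an empty value excludes no rules; the object and method names are hypothetical.

```scala
object CombineUnionConfigSketch {
  // Assumed to equal SQLConf.OPTIMIZER_EXCLUDED_RULES.key; the real suite should
  // reference the constant rather than hard-code the string.
  val ExcludedRulesKey = "spark.sql.optimizer.excludedRules"
  val CombineUnionsRule = "org.apache.spark.sql.catalyst.optimizer.CombineUnions"

  // Returns the extra config pair for one iteration of the test loop.
  def combineUnionConfig(combineUnionEnabled: Boolean): (String, String) =
    if (combineUnionEnabled) {
      // A real key with an empty value, assumed to exclude nothing,
      // so CombineUnions stays enabled.
      ExcludedRulesKey -> ""
    } else {
      // Exclude CombineUnions so that nested Unions are not flattened.
      ExcludedRulesKey -> CombineUnionsRule
    }
}
```

Both branches now set the same, real config key, so the test never writes an entry with an empty key.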

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,89 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        numUnion: Int,
+        numShuffleReader: Int,
+        numPartition: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == numUnion)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === numShuffleReader)
+      assert(df.rdd.partitions.length === numPartition)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+          SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+          SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+          SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+          SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+          combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            1,

Review comment:
       Can we add the parameter names to make the test more readable, e.g. `numUnion = 1`?
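A minimal sketch of the named-argument style suggested above. `Expected` and `expectPlanShape` are hypothetical stand-ins for the test's `checkResultPartition` helper; they only record the expectations, so the snippet runs without Spark. The values `1`, `1`, `1 + 4` mirror the first query in another revision of this test quoted in this thread.

```scala
object NamedArgsSketch {
  // Hypothetical stand-in for the expectations checked by checkResultPartition.
  final case class Expected(numUnion: Int, numShuffleReader: Int, numPartition: Int)

  def expectPlanShape(numUnion: Int, numShuffleReader: Int, numPartition: Int): Expected =
    Expected(numUnion, numShuffleReader, numPartition)

  // Positional form checkResultPartition(df, 1, 1, 1 + 4) hides which count is which;
  // with named arguments each expectation documents itself.
  val firstCase: Expected = expectPlanShape(
    numUnion = 1,
    numShuffleReader = 1,
    numPartition = 1 + 4)
}
```

Named arguments also let the call sites survive a later reordering of the helper's parameters.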




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920970634


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47857/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921190594


   **[Test build #143352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143352/testReport)** for PR 32084 at commit [`62fb383`](https://github.com/apache/spark/commit/62fb3835204abb40a278016421669298fa687365).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929428597


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48193/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921197715


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143352/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717744836



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      // advisory partition size 1048576 has no special meaning, just a big enough value
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+        SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+        SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+        SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+        SQLConf.SHUFFLE_PARTITIONS.key -> "10",
+        combineUnionConfig) {
+        withTempView("t1", "t2") {
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 2)
+            .toDF().createOrReplaceTempView("t1")
+          spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, i.toString)), 4)
+            .toDF().createOrReplaceTempView("t2")
+
+          // positive test that could be coalesced
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 1,
+            1,
+            1 + 4)
+
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+                |UNION ALL
+                |SELECT * FROM t1
+              """.stripMargin),
+            if (combineUnionEnabled) 1 else 2,
+            1,
+            1 + 4 + 2)
+
+          checkResultPartition(
+            sql("""
+                |SELECT key, count(*) FROM t1 GROUP BY key
+                |UNION ALL
+                |SELECT * FROM t2
+                |UNION ALL
+                |SELECT * FROM t1
+                |UNION ALL
+                |SELECT key, count(*) FROM t2 GROUP BY key

Review comment:
       It's not very useful to test 3 unions, as it's similar to the 2 cases above.
   
   Let's test SMJ UNION AGG instead, i.e. a sort-merge join unioned with an aggregate.
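The expected partition counts in this test (`1 + 4`, `1 + 4 + 2`) follow from two observations: a `UNION ALL` concatenates its children's partitions, and with a 1 MiB advisory size the 10 tiny shuffle partitions of the aggregate branch collapse into one, while the branches that read `t1` (2 partitions) and `t2` (4 partitions) without a shuffle keep their original partitioning. The sketch below reproduces that arithmetic with a simplified greedy packer. It is an illustration under assumptions, not Spark's coalescing code: `coalesce` and `unionPartitions` are hypothetical names, the shuffle sizes are invented, and Spark's real logic also honours settings such as the minimum partition number configured in the test.

```scala
object CoalesceSketch {
  // Greedily pack consecutive shuffle partitions; start a new coalesced
  // partition whenever adding the next one would exceed the advisory size.
  def coalesce(partitionSizes: Seq[Long], advisorySize: Long): Int =
    if (partitionSizes.isEmpty) 0
    else {
      // Accumulator: (number of closed groups, bytes in the currently open group)
      val (closed, _) = partitionSizes.tail.foldLeft((0, partitionSizes.head)) {
        case ((n, acc), size) =>
          if (acc + size > advisorySize) (n + 1, size) else (n, acc + size)
      }
      closed + 1 // count the last, still-open group
    }

  // A union's output partition count is the sum of its children's counts.
  def unionPartitions(childPartitions: Seq[Int]): Int = childPartitions.sum
}
```

For example, `unionPartitions(Seq(coalesce(Seq.fill(10)(100L), 1048576L), 4))` gives `1 + 4`, matching the first query's expectation. Coalescing each union child independently is what allows the aggregate branch to shrink without forcing the non-shuffle branches to change.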






[GitHub] [spark] cloud-fan commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r717716881



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1705,6 +1705,91 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+        df: Dataset[Row],
+        unionNumber: Int,
+        shuffleReaderNumber: Int,
+        partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(collect(df.queryExecution.executedPlan) {
+        case r: AQEShuffleReadExec => r
+      }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"

Review comment:
       Does this really matter for the "coalesce through union" feature? I think we can just test the default case, in which this rule is enabled.






[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920873803


   **[Test build #143349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143349/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920921386


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47855/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921107051


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143351/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872404616


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140519/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818475159


   **[Test build #137273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137273/testReport)** for PR 32084 at commit [`a966759`](https://github.com/apache/spark/commit/a966759afa0cf7e71440a344ae833fab7835a6a0).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872214551


   **[Test build #140519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140519/testReport)** for PR 32084 at commit [`fb562fd`](https://github.com/apache/spark/commit/fb562fd5570c05b2830a3884d67210abb40b6bcf).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872292667


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45032/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929371451


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48193/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920873803


   **[Test build #143349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143349/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).




[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612105250



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/UnionAwareOptimizerRule.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+
+trait UnionAwareOptimizerRule {

Review comment:
       > Can we make this rule more general? Ideally we need to split the query plan of the given query stage into several groups. The shuffles within one group must have the same number of partitions.
   > 
   > This rule is overly simplified right now and assumes the entire query stage is one group. It's definitely wrong with Union and we should fix it. To be conservative we can try to put everything in one group except for Union. We should make the code extensible so that we can add more "breaking points" that can split groups in the future
   
   @cloud-fan Agreed; I added a new trait to make this clear. I believe other query stage optimizer rules, e.g. `OptimizeSkewedJoin`, also need this. Do you have any thoughts?
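The grouping idea discussed above (all shuffles in one group must end up with the same number of partitions, and each child of a `Union` starts a new group) can be sketched on a toy plan tree. This is a minimal illustration under assumptions: `Plan`, `Shuffle`, `Union`, `Node`, and `shuffleGroups` are hypothetical names, not Spark's `SparkPlan` classes or the trait added in this PR. `Union` is the only "breaking point" here; more could be added as extra match cases.

```scala
// Hypothetical toy plan tree; not Spark's SparkPlan hierarchy.
sealed trait Plan { def children: Seq[Plan] }
final case class Shuffle(id: String) extends Plan { def children: Seq[Plan] = Nil }
final case class Union(children: Seq[Plan]) extends Plan
final case class Node(name: String, children: Seq[Plan]) extends Plan

object ShuffleGroups {
  // Split the shuffles of a plan into groups. Everything outside a Union stays
  // in one group; each child of a Union is analysed as an independent group.
  def shuffleGroups(plan: Plan): Seq[Seq[String]] = {
    // Returns (shuffles in the current group, groups already closed under a Union).
    def visit(p: Plan): (Seq[String], Seq[Seq[String]]) = p match {
      case Shuffle(id) => (Seq(id), Nil)
      case Union(cs)   => (Nil, cs.flatMap(shuffleGroups)) // breaking point
      case other =>
        val results = other.children.map(visit)
        (results.flatMap(_._1), results.flatMap(_._2))
    }
    val (current, closed) = visit(plan)
    (if (current.nonEmpty) Seq(current) else Nil) ++ closed
  }
}
```

For a plan such as `Union(Seq(Node("smj", Seq(Shuffle("a"), Shuffle("b"))), Node("agg", Seq(Shuffle("c"))), Node("scan", Nil)))`, the join's two shuffles form one group, the aggregate's shuffle forms another, and the scan contributes none, so each group could be coalesced on its own.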






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-880364957


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45558/
   




[GitHub] [spark] ulysses-you commented on a change in pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on a change in pull request #32084:
URL: https://github.com/apache/spark/pull/32084#discussion_r612105702



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -1575,4 +1575,86 @@ class AdaptiveQueryExecSuite
       checkNoCoalescePartitions(df.sort($"key"), ENSURE_REQUIREMENTS)
     }
   }
+
+  test("SPARK-34980: Support coalesce partition through union") {
+    def checkResultPartition(
+      df: Dataset[Row], unionNumber: Int, shuffleReaderNumber: Int, partitionNumber: Int): Unit = {
+      df.collect()
+      assert(collect(df.queryExecution.executedPlan) {
+        case u: UnionExec => u
+      }.size == unionNumber)
+      assert(
+        collect(df.queryExecution.executedPlan) {
+          case s: CustomShuffleReaderExec => s
+        }.size === shuffleReaderNumber)
+      assert(df.rdd.partitions.length === partitionNumber)
+    }
+
+    Seq(true, false).foreach { combineUnionEnabled =>
+      val combineUnionConfig = if (combineUnionEnabled) {
+        "" -> ""
+      } else {
+        SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+          "org.apache.spark.sql.catalyst.optimizer.CombineUnions"
+      }
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",

Review comment:
       @maropu I added a test that runs without the `CombineUnions` rule. With that rule excluded, the plan can contain nested `Union` nodes.
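Excluding `CombineUnions` changes the plan shape: a chain of `UNION ALL`s can stay as nested `Union` nodes instead of one flat `Union`. That is consistent with the revision of this test quoted above, which expects `if (combineUnionEnabled) 1 else 2` unions for the three-branch query. The effect can be sketched on a toy tree; `Rel`, `Leaf`, `U`, `flatten`, and `countUnions` are hypothetical names, and `flatten` is a simplified stand-in for the Catalyst rule, not its implementation.

```scala
// Hypothetical toy relation tree; not Catalyst's LogicalPlan.
sealed trait Rel
final case class Leaf(name: String) extends Rel
final case class U(children: Seq[Rel]) extends Rel

object UnionSketch {
  // Pull the children of nested unions up into their parent union,
  // producing one flat union node per chain.
  def flatten(r: Rel): Rel = r match {
    case U(cs) =>
      U(cs.map(flatten).flatMap {
        case U(inner) => inner
        case other    => Seq(other)
      })
    case leaf => leaf
  }

  // Count union nodes, analogous to collecting UnionExec nodes in the test.
  def countUnions(r: Rel): Int = r match {
    case U(cs) => 1 + cs.map(countUnions).sum
    case _     => 0
  }
}
```

With `nested = U(Seq(U(Seq(Leaf("agg"), Leaf("t2"))), Leaf("t1")))`, `countUnions(nested)` is 2 while `countUnions(flatten(nested))` is 1. Either way the union has the same three leaf branches, which is why testing both shapes mainly guards against nested unions breaking the grouping.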






[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815414403


   **[Test build #137048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137048/testReport)** for PR 32084 at commit [`e4bae2b`](https://github.com/apache/spark/commit/e4bae2bf77fa088d41edd02e9155a108fab36c34).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854618208


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43856/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854638306


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43858/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854762987


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139335/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-872214551


   **[Test build #140519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140519/testReport)** for PR 32084 at commit [`fb562fd`](https://github.com/apache/spark/commit/fb562fd5570c05b2830a3884d67210abb40b6bcf).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818512659








[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-854576858


   **[Test build #139335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139335/testReport)** for PR 32084 at commit [`525d4d4`](https://github.com/apache/spark/commit/525d4d4c7a120967409baee65d97bf5e27a2fdbe).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-819204254


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41897/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-858316750


   **[Test build #139617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139617/testReport)** for PR 32084 at commit [`e34004f`](https://github.com/apache/spark/commit/e34004f8d42dcf7ecd0085f8ad062bffcad445d2).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918861253


   **[Test build #143247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143247/testReport)** for PR 32084 at commit [`eeb4047`](https://github.com/apache/spark/commit/eeb4047a59773a6dba64a30b2e184badf69548b1).




[GitHub] [spark] cloud-fan commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929959057


   thanks, merging to master!




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920926709


   **[Test build #143351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143351/testReport)** for PR 32084 at commit [`b8ee590`](https://github.com/apache/spark/commit/b8ee590c44b12e7980a9514fef79639a05f59da7).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-920967375


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47857/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918866398


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143247/
   




[GitHub] [spark] ulysses-you closed pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
ulysses-you closed pull request #32084:
URL: https://github.com/apache/spark/pull/32084


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-918867646


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47750/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927240943


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48144/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921107051


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143351/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927272014


   **[Test build #143632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143632/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927241163


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48144/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-927276805


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143632/
   




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-921398098


   **[Test build #143381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143381/testReport)** for PR 32084 at commit [`a846ecd`](https://github.com/apache/spark/commit/a846ecd5221bc4b21416c9c52552cdaa0e683d0d).




[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929891061


   **[Test build #143695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143695/testReport)** for PR 32084 at commit [`e2b25b4`](https://github.com/apache/spark/commit/e2b25b4f35b507665029162efc4e2808fecd14e3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-929906030








[GitHub] [spark] AmplabJenkins removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-855883427


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139407/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union in AQE

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-818515251


   **[Test build #137284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137284/testReport)** for PR 32084 at commit [`39afd62`](https://github.com/apache/spark/commit/39afd6292466044a5527deba4e178a69ba364561).

