Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/01 12:19:44 UTC

[GitHub] [spark] zhengruifeng opened a new pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

zhengruifeng opened a new pull request #33893:
URL: https://github.com/apache/spark/pull/33893


   ### What changes were proposed in this pull request?
   This PR aims to generalize `OptimizeSkewedJoin` to support all patterns that can be handled by the current _split-duplicate_ method:
   
   1. Select the _splittable_ shuffle query stages based on the semantics of the internal nodes;
   
   2. For each splittable shuffle query stage, check whether skew exists; if so, split its skewed partitions;
   
   3. Handle _Combinatorial Explosion_: for each skewed partition, check whether the number of combinations is too large; if so, re-split the stages to keep the number of combinations reasonable. For example, if for partition 0 stages A, B and C are split into 100, 100 and 100 specs respectively, there are 100 × 100 × 100 = 1M combinations, which is too many and will cause a performance regression.
   
   4. Attach the new specs to the query stages.
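   Steps 2 and 3 above can be sketched roughly as follows. This is a minimal, hypothetical Scala sketch, not code from this PR: `SplitDuplicateSketch`, its methods and the `maxCombinations` cap are illustrative names, and sizes are plain byte counts rather than Spark's `MapOutputStatistics`. The skew test roughly mirrors the documented heuristic of the existing rule (a partition is skewed when it is larger than both `skewedPartitionFactor` times the median partition size and `skewedPartitionThresholdInBytes`). The description does not say how the re-split in step 3 is done, so the halving strategy below is an assumption.

   ```scala
   import scala.collection.mutable.ArrayBuffer

   // Hypothetical sketch of split-duplicate bookkeeping for a single reduce partition.
   // Not Spark code: all names below are illustrative, and sizes are plain byte counts.
   object SplitDuplicateSketch {

     // Median partition size, floored at 1 byte (roughly mirrors the existing rule).
     def medianSize(sizes: Seq[Long]): Long = {
       require(sizes.nonEmpty, "need at least one partition size")
       val sorted = sizes.sorted
       val n = sorted.length
       val m = if (n % 2 == 0) (sorted(n / 2 - 1) + sorted(n / 2)) / 2 else sorted(n / 2)
       math.max(m, 1L)
     }

     // Step 2: a partition is skewed if it exceeds both factor * median and an absolute threshold.
     def isSkewed(size: Long, median: Long, factor: Double, thresholdBytes: Long): Boolean =
       size > median * factor && size > thresholdBytes

     // Step 2: split one skewed partition into [start, end) mapper ranges of roughly targetSize bytes.
     def splitBySize(mapSizes: Seq[Long], targetSize: Long): Seq[(Int, Int)] = {
       val ranges = ArrayBuffer.empty[(Int, Int)]
       var start = 0
       var acc = 0L
       for (i <- mapSizes.indices) {
         // Close the current range if adding mapper i would overshoot the target.
         if (i > start && acc + mapSizes(i) > targetSize) {
           ranges += ((start, i))
           start = i
           acc = 0L
         }
         acc += mapSizes(i)
       }
       if (mapSizes.nonEmpty) ranges += ((start, mapSizes.length))
       ranges.toList
     }

     // Step 3: every split of one stage is paired with every split of the others,
     // so the task count for this partition is the product of the split counts.
     def combinations(splitCounts: Seq[Int]): Long =
       splitCounts.foldLeft(1L)(_ * _)

     // Step 3 (assumed strategy): repeatedly halve the largest split count, i.e. re-split
     // that stage with a coarser target size, until the product fits under maxCombinations.
     def capCombinations(splitCounts: Seq[Int], maxCombinations: Long): Vector[Int] = {
       var counts = splitCounts.toVector
       while (combinations(counts) > maxCombinations && counts.exists(_ > 1)) {
         val i = counts.indexOf(counts.max)
         counts = counts.updated(i, math.max(1, counts(i) / 2))
       }
       counts
     }
   }
   ```

   For the step-3 example, `combinations(Seq(100, 100, 100))` is 1,000,000. With a hypothetical cap of 1,000, `capCombinations(Seq(100, 100, 100), 1000L)` returns `Vector(6, 12, 12)`, which gives 864 combinations. Halving the largest count first spreads the reduction across the stages instead of collapsing one stage to a single split.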
   
   
   ### Why are the changes needed?
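   To generalize `OptimizeSkewedJoin` so that skewed partitions can be split in every plan pattern the split-duplicate method can handle, not only in the patterns the rule currently matches.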
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, two additional configs are added.
   
   
   ### How was this patch tested?
   Existing test suites and newly added test suites.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913944664


   **[Test build #143034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143034/testReport)** for PR 33893 at commit [`545bf9b`](https://github.com/apache/spark/commit/545bf9bc0d2463869fdc46833366cb53c6a9e2fa).




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920548688


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47832/
   




[GitHub] [spark] zhengruifeng commented on a change in pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #33893:
URL: https://github.com/apache/spark/pull/33893#discussion_r805618329



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -2476,11 +2703,10 @@ class AdaptiveQueryExecSuite
             "UNION ALL SELECT key2 FROM skewData2 GROUP BY key2", 1, 1)
 
         // skewJoin1 union (skewJoin2 join aggregate)
-        // skewJoin2 will lead to extra shuffles, but skew1 cannot be optimized
          checkSkewJoin(
           "SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2 UNION ALL " +
             "SELECT key1 from (SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2) tmp1 " +
-            "JOIN (SELECT key2 FROM skewData2 GROUP BY key2) tmp2 ON key1 = key2", 3, 0)
+            "JOIN (SELECT key2 FROM skewData2 GROUP BY key2) tmp2 ON key1 = key2", 3, 3)

Review comment:
       This skew test case, newly added in https://github.com/apache/spark/pull/34908, can be optimized by this PR without an extra shuffle:
   
   Master | this PR
   --- | ---
   ![master_skew_case](https://user-images.githubusercontent.com/7322292/153830682-e83ac367-a4d6-4bcf-8699-bd5a944f70e2.png) | ![general_skew_case](https://user-images.githubusercontent.com/7322292/153830711-bb7d1bdf-e59e-400d-a79d-a1e61c9ec8e5.png)
   
   
   






[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913291131


   friendly ping @JkSelf @cloud-fan @yaooqinn @ulysses-you . Could you please take a look in your spare time? Thanks!




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910263967


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47424/
   




[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-911079989


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973957248


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49920/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994483828


   **[Test build #146225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146225/testReport)** for PR 33893 at commit [`3cdd96e`](https://github.com/apache/spark/commit/3cdd96e87daf07bc62f90a954bf5b87526ea94d7).




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937504998


   **[Test build #143939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143939/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937867143


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143939/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599558








[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920677140


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143328/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937504998








[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918194227


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47707/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974001029


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145442/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994543229


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50699/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999704205


   **[Test build #146477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146477/testReport)** for PR 33893 at commit [`627d526`](https://github.com/apache/spark/commit/627d526a02d31b535ea78fb116ab0a75aadc876c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999488191


   **[Test build #146477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146477/testReport)** for PR 33893 at commit [`627d526`](https://github.com/apache/spark/commit/627d526a02d31b535ea78fb116ab0a75aadc876c).




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554005


   **[Test build #146482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146482/testReport)** for PR 33893 at commit [`f822bbf`](https://github.com/apache/spark/commit/f822bbfa2bc8618c1197674c99895e4a44b9d84f).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927289334


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143633/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913962126


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47536/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913956801


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47536/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927242239


   **[Test build #143633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143633/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).




[GitHub] [spark] zhengruifeng commented on a change in pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #33893:
URL: https://github.com/apache/spark/pull/33893#discussion_r805618329



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -2476,11 +2703,10 @@ class AdaptiveQueryExecSuite
             "UNION ALL SELECT key2 FROM skewData2 GROUP BY key2", 1, 1)
 
         // skewJoin1 union (skewJoin2 join aggregate)
-        // skewJoin2 will lead to extra shuffles, but skew1 cannot be optimized
          checkSkewJoin(
           "SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2 UNION ALL " +
             "SELECT key1 from (SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2) tmp1 " +
-            "JOIN (SELECT key2 FROM skewData2 GROUP BY key2) tmp2 ON key1 = key2", 3, 0)
+            "JOIN (SELECT key2 FROM skewData2 GROUP BY key2) tmp2 ON key1 = key2", 3, 3)

Review comment:
       This newly added skew test case can be optimized by this PR without an extra shuffle:
   
   Master | this PR
   --- | ---
   ![master_skew_case](https://user-images.githubusercontent.com/7322292/153830682-e83ac367-a4d6-4bcf-8699-bd5a944f70e2.png) | ![general_skew_case](https://user-images.githubusercontent.com/7322292/153830711-bb7d1bdf-e59e-400d-a79d-a1e61c9ec8e5.png)
   
   
   






[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973871931


   **[Test build #145448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145448/testReport)** for PR 33893 at commit [`d7c6678`](https://github.com/apache/spark/commit/d7c66789651aaacb36d86aaa572047377c222a15).




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999574451


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50953/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912394950


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47476/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910271892


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47424/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913960171


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47536/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920552887







[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912394897


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47476/
   



[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910263967







[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912363971


   **[Test build #142975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142975/testReport)** for PR 33893 at commit [`c225d13`](https://github.com/apache/spark/commit/c225d13404ac420149ed5fffe1391b7cad108eb9).



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927242239


   **[Test build #143633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143633/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937557508


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48438/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599558


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48438/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912540486


   **[Test build #142975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142975/testReport)** for PR 33893 at commit [`c225d13`](https://github.com/apache/spark/commit/c225d13404ac420149ed5fffe1391b7cad108eb9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-914058475


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143034/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918426518


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143206/
   



[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910232141


   added test("General Skew Join: 3-table join")
   ```
   == Physical Plan ==
   AdaptiveSparkPlan (52)
   +- == Final Plan ==
      CollectLimit (33)
      +- * HashAggregate (32)
         +- AQEShuffleRead (31)
            +- ShuffleQueryStage (30)
               +- Exchange (29)
                  +- * HashAggregate (28)
                     +- * Project (27)
                        +- * SortMergeJoin(skew=true) LeftOuter (26)
                           :- * Project (15)
                           :  +- * SortMergeJoin(skew=true) Inner (14)
                           :     :- Window(skew=true) (7)
                           :     :  +- * Sort (6)
                           :     :     +- AQEShuffleRead (5)
                           :     :        +- ShuffleQueryStage (4)
                           :     :           +- Exchange (3)
                           :     :              +- * Project (2)
                           :     :                 +- * Range (1)
                           :     +- * Sort (13)
                           :        +- AQEShuffleRead (12)
                           :           +- ShuffleQueryStage (11)
                           :              +- Exchange (10)
                           :                 +- * Project (9)
                           :                    +- * Range (8)
                           +- * Sort (25)
                              +- * HashAggregate(skew=true) (24)
                                 +- AQEShuffleRead (23)
                                    +- ShuffleQueryStage (22)
                                       +- Exchange (21)
                                          +- * HashAggregate (20)
                                             +- ShuffleQueryStage (19)
                                                +- Exchange (18)
                                                   +- * Project (17)
                                                      +- * Range (16)
   ```
   
   ![image](https://user-images.githubusercontent.com/7322292/131670203-5857e7e0-f1ea-411e-b572-ea8ad0565929.png)
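In this plan the top-level stage holds two SortMergeJoins over three ShuffleQueryStage leaves, (4), (11) and (22), which is the shape the PR's leaf-count check (`leaves.size != joins.size + 1`) expects. When several of these stages are split for the same partition, every split of one stage is paired with every split of the others, so the number of tasks is the product of the split counts (100/100/100 gives 1M, as in the PR description). The Scala sketch below illustrates both points on a toy tree. `Node`, `Join`, `Op`, `Stage` and `capSplits` are hypothetical stand-ins, not Spark's `SparkPlan` API, and halving the largest split count is only an assumed way to bound the product, not necessarily how this PR re-splits stages.

```scala
// Toy model of a query stage; these types are hypothetical, not Spark's SparkPlan API.
object MultiJoinStageSketch {
  sealed trait Node { def children: Seq[Node] }
  final case class Join(left: Node, right: Node) extends Node {
    def children: Seq[Node] = Seq(left, right)
  }
  final case class Op(name: String, child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
  }
  final case class Stage(id: Int) extends Node {
    def children: Seq[Node] = Seq.empty
  }

  // Pre-order traversal, so the head of the result is the top-most join.
  def collectJoins(n: Node): Seq[Join] = n match {
    case j: Join => j +: j.children.flatMap(collectJoins)
    case other   => other.children.flatMap(collectJoins)
  }

  def collectLeaves(n: Node): Seq[Node] =
    if (n.children.isEmpty) Seq(n) else n.children.flatMap(collectLeaves)

  // N binary joins should sit over exactly N + 1 leaves under the top join.
  def hasExpectedLeafCount(root: Node): Boolean = {
    val joins = collectJoins(root)
    joins.nonEmpty && collectLeaves(joins.head).size == joins.size + 1
  }

  // Tasks for one skewed partition: each split of a stage is paired with
  // every split of the other stages, so the count is the product.
  def numCombinations(splits: Seq[Int]): Long = splits.map(_.toLong).product

  // Hypothetical bound: halve the largest split count until the product fits
  // under maxCombinations or every stage is back to a single split.
  def capSplits(splits: Seq[Int], maxCombinations: Long): Seq[Int] = {
    var current = splits.toVector
    while (numCombinations(current) > maxCombinations && current.exists(_ > 1)) {
      val i = current.indexOf(current.max)
      current = current.updated(i, math.max(1, current(i) / 2))
    }
    current
  }

  // Shape of the 3-table plan (AQEShuffleRead, Exchange and codegen nodes omitted).
  val threeTable: Node =
    Join(
      Op("Project", Join(Op("Window", Op("Sort", Stage(4))), Op("Sort", Stage(11)))),
      Op("Sort", Op("HashAggregate", Stage(22))))
}
```

On `threeTable`, `collectJoins` finds two joins and the top join has three leaves, so `hasExpectedLeafCount` holds. `numCombinations(Seq(100, 100, 100))` is 1,000,000, and `capSplits(Seq(100, 100, 100), 1000L)` returns `Vector(6, 12, 12)`, i.e. 864 combinations.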
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918202889


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47707/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973868208


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49914/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999477037


   **[Test build #146476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146476/testReport)** for PR 33893 at commit [`c8bfb0c`](https://github.com/apache/spark/commit/c8bfb0c2ea74181ae1f9de5fdf6d1138ef027bac).



[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999589613


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50953/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912363971


   **[Test build #142975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142975/testReport)** for PR 33893 at commit [`c225d13`](https://github.com/apache/spark/commit/c225d13404ac420149ed5fffe1391b7cad108eb9).



[GitHub] [spark] advancedxy commented on a change in pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
advancedxy commented on a change in pull request #33893:
URL: https://github.com/apache/spark/pull/33893#discussion_r701633142



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##########
@@ -101,166 +104,317 @@ object OptimizeSkewedJoin extends AQEShuffleReadRule {
       sizes.sum / sizes.length
   }
 
-  /*
-   * This method aim to optimize the skewed join with the following steps:
-   * 1. Check whether the shuffle partition is skewed based on the median size
-   *    and the skewed partition threshold in origin shuffled join (smj and shj).
-   * 2. Assuming partition0 is skewed in left side, and it has 5 mappers (Map0, Map1...Map4).
-   *    And we may split the 5 Mappers into 3 mapper ranges [(Map0, Map1), (Map2, Map3), (Map4)]
-   *    based on the map size and the max split number.
-   * 3. Wrap the join left child with a special shuffle read that loads each mapper range with one
-   *    task, so total 3 tasks.
-   * 4. Wrap the join right child with a special shuffle read that loads partition0 3 times by
-   *    3 tasks separately.
-   */
-  private def tryOptimizeJoinChildren(
-      left: ShuffleQueryStageExec,
-      right: ShuffleQueryStageExec,
-      joinType: JoinType): Option[(SparkPlan, SparkPlan)] = {
-    val canSplitLeft = canSplitLeftSide(joinType)
-    val canSplitRight = canSplitRightSide(joinType)
-    if (!canSplitLeft && !canSplitRight) return None
-
-    val leftSizes = left.mapStats.get.bytesByPartitionId
-    val rightSizes = right.mapStats.get.bytesByPartitionId
-    assert(leftSizes.length == rightSizes.length)
-    val numPartitions = leftSizes.length
-    // We use the median size of the original shuffle partitions to detect skewed partitions.
-    val leftMedSize = medianSize(leftSizes)
-    val rightMedSize = medianSize(rightSizes)
-    logDebug(
-      s"""
-         |Optimizing skewed join.
-         |Left side partitions size info:
-         |${getSizeInfo(leftMedSize, leftSizes)}
-         |Right side partitions size info:
-         |${getSizeInfo(rightMedSize, rightSizes)}
-      """.stripMargin)
-
-    val leftSkewThreshold = getSkewThreshold(leftMedSize)
-    val rightSkewThreshold = getSkewThreshold(rightMedSize)
-    val leftTargetSize = targetSize(leftSizes, leftSkewThreshold)
-    val rightTargetSize = targetSize(rightSizes, rightSkewThreshold)
-
-    val leftSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
-    val rightSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
-    var numSkewedLeft = 0
-    var numSkewedRight = 0
-    for (partitionIndex <- 0 until numPartitions) {
-      val leftSize = leftSizes(partitionIndex)
-      val isLeftSkew = canSplitLeft && leftSize > leftSkewThreshold
-      val rightSize = rightSizes(partitionIndex)
-      val isRightSkew = canSplitRight && rightSize > rightSkewThreshold
-      val leftNoSkewPartitionSpec =
-        Seq(CoalescedPartitionSpec(partitionIndex, partitionIndex + 1, leftSize))
-      val rightNoSkewPartitionSpec =
-        Seq(CoalescedPartitionSpec(partitionIndex, partitionIndex + 1, rightSize))
-
-      val leftParts = if (isLeftSkew) {
-        val skewSpecs = ShufflePartitionsUtil.createSkewPartitionSpecs(
-          left.mapStats.get.shuffleId, partitionIndex, leftTargetSize)
-        if (skewSpecs.isDefined) {
-          logDebug(s"Left side partition $partitionIndex " +
-            s"(${FileUtils.byteCountToDisplaySize(leftSize)}) is skewed, " +
-            s"split it into ${skewSpecs.get.length} parts.")
-          numSkewedLeft += 1
+  private def optimize(plan: SparkPlan): SparkPlan = {
+    val logPrefix = s"Optimizing ${plan.nodeName} #${plan.id}"
+
+    // Step 0: Collect all ShuffledJoins (SMJ/SHJ)
+    def collectShuffledJoins(plan: SparkPlan): Seq[ShuffledJoin] = plan match {
+      case join: ShuffledJoin => Seq(join) ++ join.children.flatMap(collectShuffledJoins)
+      case _ => plan.children.flatMap(collectShuffledJoins)
+    }
+    val joins = collectShuffledJoins(plan)
+    logDebug(s"$logPrefix: ShuffledJoins: ${joins.map(_.nodeName).mkString("[", ", ", "]")}")
+    if (joins.isEmpty || joins.exists(_.isSkewJoin)) return plan
+    val topJoin = joins.head
+
+    // Step1: validate physical operators
+    // There are more and more physical operators, this list is used to avoid correctness issues
+    // TODO: support more operators like AggregateInPandasExec/FlatMapCoGroupsInPandasExec/etc
+    val invalidOperators = topJoin.collect {
+      case _: WholeStageCodegenExec => None
+      case _: AQEShuffleReadExec => None
+      case _: QueryStageExec => None
+      case _: SortExec => None
+      case _: BaseJoinExec => None
+      case _: ObjectHashAggregateExec => None
+      case _: HashAggregateExec => None
+      case _: SortAggregateExec => None
+      case _: WindowExec => None
+      case _: ProjectExec => None
+      case _: FilterExec => None
+      case _: SampleExec => None
+      case _: ColumnarToRowExec => None
+      case _: RowToColumnarExec => None
+      case _: DeserializeToObjectExec => None
+      case _: SerializeFromObjectExec => None
+      case _: MapElementsExec => None
+      case _: MapPartitionsExec => None
+      case _: MapPartitionsInRWithArrowExec => None
+      case _: MapInPandasExec => None
+      case _: EvalPythonExec => None
+      case _: CollectMetricsExec => None
+      case invalid => Some(invalid)
+    }.flatten
+    if (invalidOperators.nonEmpty) {
+      logDebug(s"$logPrefix: Do NOT support operators " +
+        s"${invalidOperators.map(_.nodeName).mkString("[", ", ", "]")}")
+      return plan
+    }
+
+    // Step 2: Collect all ShuffleQueryStages
+    val leaves = topJoin.collectLeaves()
+    // for a N-Join stage, there should be N+1 leaves.
+    if (leaves.size != joins.size + 1) return plan
+    // stageId -> MapOutputStatistics
+    val stageStats = leaves.flatMap {
+      case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
+        stage.mapStats.filter(_.bytesByPartitionId.nonEmpty).map(stats => stage.id -> stats)
+      case _ => None
+    }.toMap
+    // TODO: support Bucket Join with other types of leaves.

Review comment:
       @zhengruifeng there's another skewed-join pattern (one we ran into in our internal usage): a skewed SMJ whose output is then joined with a broadcast side by a BroadcastHashJoin.
   ```
   sort           sort    BroadcastExchange
     \           /         /
       SMJ(skewed)        /
                \ BroadcastHashJoin
   ```
   
   This could be addressed either in this PR or in a follow-up.
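
The removed doc comment in the diff above describes the original split-duplicate algorithm that this PR generalizes: detect skew against the median partition size, split a skewed partition's mappers into ranges, and read the other side's partition once per range. The Scala sketch below walks through those steps for a single skewed left partition. All names are hypothetical; `factor` and `minBytes` are assumed to play the role of `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`, and the greedy grouping is a simplification of what `ShufflePartitionsUtil.createSkewPartitionSpecs` does, not a copy of it.

```scala
// Simplified sketch of the split-duplicate steps; names and parameters are hypothetical.
object SkewSplitSketch {
  // Median partition size, floored at 1 byte.
  def medianSize(sizes: Array[Long]): Long = {
    require(sizes.nonEmpty, "need at least one partition")
    val sorted = sizes.sorted
    val n = sorted.length
    val median =
      if (n % 2 == 0) (sorted(n / 2 - 1) + sorted(n / 2)) / 2
      else sorted(n / 2)
    math.max(median, 1L)
  }

  // A partition counts as skewed only if it exceeds both factor * median and minBytes.
  def skewThreshold(median: Long, factor: Double, minBytes: Long): Long =
    math.max(minBytes, (median * factor).toLong)

  // Greedily groups consecutive mappers into [start, end) ranges of at most
  // targetSize bytes; a single oversized mapper still forms its own range.
  def splitMappers(mapSizes: Array[Long], targetSize: Long): Seq[(Int, Int)] = {
    val ranges = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var acc = 0L
    for (i <- mapSizes.indices) {
      if (i > start && acc + mapSizes(i) > targetSize) {
        ranges += ((start, i))
        start = i
        acc = 0L
      }
      acc += mapSizes(i)
    }
    if (mapSizes.nonEmpty) ranges += ((start, mapSizes.length))
    ranges.toSeq
  }

  // Left side: one task per mapper range. Right side: the whole partition is
  // read again for every left task, i.e. duplicated once per range.
  def splitLeftDuplicateRight(
      partitionIndex: Int,
      leftMapSizes: Array[Long],
      targetSize: Long): Seq[((Int, Int), Int)] =
    splitMappers(leftMapSizes, targetSize).map(range => (range, partitionIndex))
}
```

With five 40-byte mappers and an 80-byte target, `splitMappers` yields `(0, 2), (2, 4), (4, 5)`, which is the `[(Map0, Map1), (Map2, Map3), (Map4)]` grouping from the doc comment, and partition 0 on the right side is read three times.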





[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920677140


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143328/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918201941


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47707/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974092464


   **[Test build #145448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145448/testReport)** for PR 33893 at commit [`d7c6678`](https://github.com/apache/spark/commit/d7c66789651aaacb36d86aaa572047377c222a15).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973868208


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49914/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994783444


   **[Test build #146225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146225/testReport)** for PR 33893 at commit [`3cdd96e`](https://github.com/apache/spark/commit/3cdd96e87daf07bc62f90a954bf5b87526ea94d7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999477037


   **[Test build #146476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146476/testReport)** for PR 33893 at commit [`c8bfb0c`](https://github.com/apache/spark/commit/c8bfb0c2ea74181ae1f9de5fdf6d1138ef027bac).



[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999630635


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50958/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999543374


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50953/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999791324


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146482/
   




[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910234795


   Added test("General Skew Join: 5-table join"):
   
   ```
   == Physical Plan ==
   AdaptiveSparkPlan (68)
   +- == Final Plan ==
      CollectLimit (45)
      +- * HashAggregate (44)
         +- AQEShuffleRead (43)
            +- ShuffleQueryStage (42)
               +- Exchange (41)
                  +- * HashAggregate (40)
                     +- * Project (39)
                        +- * SortMergeJoin(skew=true) Inner (38)
                           :- * Project (22)
                           :  +- * SortMergeJoin(skew=true) Cross (21)
                           :     :- * Project (14)
                           :     :  +- * SortMergeJoin(skew=true) LeftOuter (13)
                           :     :     :- * Sort (6)
                           :     :     :  +- AQEShuffleRead (5)
                           :     :     :     +- ShuffleQueryStage (4)
                           :     :     :        +- Exchange (3)
                           :     :     :           +- * Project (2)
                           :     :     :              +- * Range (1)
                           :     :     +- * Sort (12)
                           :     :        +- AQEShuffleRead (11)
                           :     :           +- ShuffleQueryStage (10)
                           :     :              +- Exchange (9)
                           :     :                 +- * Project (8)
                           :     :                    +- * Range (7)
                           :     +- * Sort (20)
                           :        +- AQEShuffleRead (19)
                           :           +- ShuffleQueryStage (18)
                           :              +- Exchange (17)
                           :                 +- * Project (16)
                           :                    +- * Range (15)
                           +- * Sort (37)
                              +- * Project (36)
                                 +- * ShuffledHashJoin(skew=true) Inner BuildLeft (35)
                                    :- * HashAggregate(skew=true) (29)
                                    :  +- AQEShuffleRead (28)
                                    :     +- ShuffleQueryStage (27)
                                    :        +- Exchange (26)
                                    :           +- * HashAggregate (25)
                                    :              +- * Project (24)
                                    :                 +- * Range (23)
                                    +- AQEShuffleRead (34)
                                       +- ShuffleQueryStage (33)
                                          +- Exchange (32)
                                             +- * Project (31)
                                                +- * Range (30)
   ```
   
   ![image](https://user-images.githubusercontent.com/7322292/131670788-f2bf7146-8533-4433-801f-17697c457567.png)
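In the plan above, five shuffle query stages (4, 10, 18, 27 and 33) feed operators marked `skew=true`. Under the split-duplicate method, each skewed partition is processed by one task per combination of the per-stage partition specs. The task count is therefore the product of the split counts. A stage that is only duplicated counts as 1, and 100/100/100 splits give 1,000,000 combinations. The sketch below illustrates this arithmetic. It assumes hypothetical split counts and a hypothetical cap. The capping step halves the largest split count until the product fits, which is only an illustration, not the re-splitting logic this PR implements.

```python
from math import prod


def combinations(split_counts):
    """Tasks needed for one skewed partition: one per combination of per-stage specs.

    `split_counts` maps a stage name to how many specs that stage's partition was
    split into; a stage that is only duplicated contributes 1.
    """
    return prod(split_counts.values())


def cap_combinations(split_counts, max_combinations):
    """Hypothetical re-split: halve the largest split count (i.e. merge adjacent
    specs of that stage) until the number of combinations fits under the cap."""
    counts = dict(split_counts)
    while combinations(counts) > max_combinations:
        stage = max(counts, key=counts.get)  # first stage with the most specs
        if counts[stage] == 1:
            break  # every stage is already unsplit; nothing left to merge
        counts[stage] = max(1, counts[stage] // 2)
    return counts


# Example from the PR description: stages A/B/C each split into 100 specs.
splits = {"A": 100, "B": 100, "C": 100}
print(combinations(splits))  # 1000000

capped = cap_combinations(splits, max_combinations=1000)
print(capped, combinations(capped))  # {'A': 6, 'B': 12, 'C': 12} 864
```

Halving keeps the sketch simple. A real implementation would more likely re-split by data size, so the remaining specs stay balanced.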
   
   




[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-911263087


   friendly ping @cloud-fan 




[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927242776


   While developing this method, I used tests like https://github.com/apache/spark/pull/34108 to check correctness; they should be helpful for reviewing.
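The tests in PR 34108 are not reproduced here. As a rough illustration of the property such correctness tests check, the pure-Python sketch below runs a join once per combination of split or duplicated chunks and compares the union against the unsplit result. It uses toy in-memory joins, not Spark, and all helper names are hypothetical. It also shows why the join type decides which side may be split. For a left outer join, splitting the left side is safe, while splitting the right side produces spurious None-padded rows.

```python
from collections import Counter


def inner_join(left, right):
    """Toy inner join of (key, value) rows; returns (key, left_value, right_value)."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def left_outer_join(left, right):
    """Toy left outer join; unmatched left rows are padded with None."""
    out = []
    for lk, lv in left:
        matches = [rv for rk, rv in right if rk == lk]
        if matches:
            out.extend((lk, lv, rv) for rv in matches)
        else:
            out.append((lk, lv, None))
    return out


def chunks(rows, n):
    """Split rows into n disjoint chunks (a stand-in for partition specs)."""
    return [rows[i::n] for i in range(n)]


def split_duplicate(join, left, right, left_splits=1, right_splits=1):
    """Run `join` on every (left chunk, right chunk) combination and union the results."""
    out = []
    for lc in chunks(left, left_splits):
        for rc in chunks(right, right_splits):
            out.extend(join(lc, rc))
    return out


def same_rows(a, b):
    """Compare results as multisets, since row order is not meaningful."""
    return Counter(a) == Counter(b)


# One skewed partition: key 0 dominates the left side.
left = [(0, f"l{i}") for i in range(6)]
right = [(0, "r0"), (0, "r1"), (1, "r2")]

# Inner join: splitting both sides is safe.
print(same_rows(split_duplicate(inner_join, left, right, 2, 3), inner_join(left, right)))  # True
# Left outer join: splitting the left side (duplicating the right) is safe...
print(same_rows(split_duplicate(left_outer_join, left, right, 3, 1), left_outer_join(left, right)))  # True
# ...but splitting the right side emits spurious (key, value, None) rows.
print(same_rows(split_duplicate(left_outer_join, left, right, 1, 3), left_outer_join(left, right)))  # False
```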




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927256316


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48145/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913944664


   **[Test build #143034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143034/testReport)** for PR 33893 at commit [`545bf9b`](https://github.com/apache/spark/commit/545bf9bc0d2463869fdc46833366cb53c6a9e2fa).




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973905495


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49920/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973936937


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49920/
   




[GitHub] [spark] sigmod commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
sigmod commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-1076634639


   cc @maryannxue




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918202889


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47707/
   




[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-911209203


   retest this please




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920675754


   **[Test build #143328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143328/testReport)** for PR 33893 at commit [`6b434b0`](https://github.com/apache/spark/commit/6b434b0baad47063ba1eca1f8f788e35e0af39b3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999720468








[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999791324


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146482/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999510858


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50952/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999589613


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50953/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912541986


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142975/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910348426


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142922/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910271892


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47424/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-914052947


   **[Test build #143034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143034/testReport)** for PR 33893 at commit [`545bf9b`](https://github.com/apache/spark/commit/545bf9bc0d2463869fdc46833366cb53c6a9e2fa).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920552886








[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927289059


   **[Test build #143633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143633/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class SkewJoinChildWrapper(plan: SparkPlan) extends LeafExecNode `




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927255499


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48145/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920552705


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47833/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920551350


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47832/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920535147


   **[Test build #143328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143328/testReport)** for PR 33893 at commit [`6b434b0`](https://github.com/apache/spark/commit/6b434b0baad47063ba1eca1f8f788e35e0af39b3).




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554005


   **[Test build #146482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146482/testReport)** for PR 33893 at commit [`f822bbf`](https://github.com/apache/spark/commit/f822bbfa2bc8618c1197674c99895e4a44b9d84f).




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973957248


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49920/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973871931


   **[Test build #145448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145448/testReport)** for PR 33893 at commit [`d7c6678`](https://github.com/apache/spark/commit/d7c66789651aaacb36d86aaa572047377c222a15).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599558








[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973796432


   **[Test build #145442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145442/testReport)** for PR 33893 at commit [`dc3b6ea`](https://github.com/apache/spark/commit/dc3b6ea57b61a5701ee376bfe524770e873f7d69).




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973851609


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49914/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973796432


   **[Test build #145442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145442/testReport)** for PR 33893 at commit [`dc3b6ea`](https://github.com/apache/spark/commit/dc3b6ea57b61a5701ee376bfe524770e873f7d69).




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599558


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48438/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918426518


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143206/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918155391


   **[Test build #143206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143206/testReport)** for PR 33893 at commit [`cd3c449`](https://github.com/apache/spark/commit/cd3c44907db3540a87c6820aac69ab82a8d0debc).




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918424718


   **[Test build #143206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143206/testReport)** for PR 33893 at commit [`cd3c449`](https://github.com/apache/spark/commit/cd3c44907db3540a87c6820aac69ab82a8d0debc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class OptimizeSkewedJoin(`




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913962126


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47536/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910233326


   **[Test build #142922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142922/testReport)** for PR 33893 at commit [`9a7b8ff`](https://github.com/apache/spark/commit/9a7b8ff18bc29d5a2f496a3e76aeddf857b2ed26).




[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910233326


   **[Test build #142922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142922/testReport)** for PR 33893 at commit [`9a7b8ff`](https://github.com/apache/spark/commit/9a7b8ff18bc29d5a2f496a3e76aeddf857b2ed26).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910348426


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142922/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-914058475


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143034/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554779


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50952/
   




[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-1075794743


   retest this please




[GitHub] [spark] zhengruifeng commented on a change in pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #33893:
URL: https://github.com/apache/spark/pull/33893#discussion_r701688868



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##########
@@ -101,166 +104,317 @@ object OptimizeSkewedJoin extends AQEShuffleReadRule {
       sizes.sum / sizes.length
   }
 
-  /*
-   * This method aim to optimize the skewed join with the following steps:
-   * 1. Check whether the shuffle partition is skewed based on the median size
-   *    and the skewed partition threshold in origin shuffled join (smj and shj).
-   * 2. Assuming partition0 is skewed in left side, and it has 5 mappers (Map0, Map1...Map4).
-   *    And we may split the 5 Mappers into 3 mapper ranges [(Map0, Map1), (Map2, Map3), (Map4)]
-   *    based on the map size and the max split number.
-   * 3. Wrap the join left child with a special shuffle read that loads each mapper range with one
-   *    task, so total 3 tasks.
-   * 4. Wrap the join right child with a special shuffle read that loads partition0 3 times by
-   *    3 tasks separately.
-   */
-  private def tryOptimizeJoinChildren(
-      left: ShuffleQueryStageExec,
-      right: ShuffleQueryStageExec,
-      joinType: JoinType): Option[(SparkPlan, SparkPlan)] = {
-    val canSplitLeft = canSplitLeftSide(joinType)
-    val canSplitRight = canSplitRightSide(joinType)
-    if (!canSplitLeft && !canSplitRight) return None
-
-    val leftSizes = left.mapStats.get.bytesByPartitionId
-    val rightSizes = right.mapStats.get.bytesByPartitionId
-    assert(leftSizes.length == rightSizes.length)
-    val numPartitions = leftSizes.length
-    // We use the median size of the original shuffle partitions to detect skewed partitions.
-    val leftMedSize = medianSize(leftSizes)
-    val rightMedSize = medianSize(rightSizes)
-    logDebug(
-      s"""
-         |Optimizing skewed join.
-         |Left side partitions size info:
-         |${getSizeInfo(leftMedSize, leftSizes)}
-         |Right side partitions size info:
-         |${getSizeInfo(rightMedSize, rightSizes)}
-      """.stripMargin)
-
-    val leftSkewThreshold = getSkewThreshold(leftMedSize)
-    val rightSkewThreshold = getSkewThreshold(rightMedSize)
-    val leftTargetSize = targetSize(leftSizes, leftSkewThreshold)
-    val rightTargetSize = targetSize(rightSizes, rightSkewThreshold)
-
-    val leftSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
-    val rightSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
-    var numSkewedLeft = 0
-    var numSkewedRight = 0
-    for (partitionIndex <- 0 until numPartitions) {
-      val leftSize = leftSizes(partitionIndex)
-      val isLeftSkew = canSplitLeft && leftSize > leftSkewThreshold
-      val rightSize = rightSizes(partitionIndex)
-      val isRightSkew = canSplitRight && rightSize > rightSkewThreshold
-      val leftNoSkewPartitionSpec =
-        Seq(CoalescedPartitionSpec(partitionIndex, partitionIndex + 1, leftSize))
-      val rightNoSkewPartitionSpec =
-        Seq(CoalescedPartitionSpec(partitionIndex, partitionIndex + 1, rightSize))
-
-      val leftParts = if (isLeftSkew) {
-        val skewSpecs = ShufflePartitionsUtil.createSkewPartitionSpecs(
-          left.mapStats.get.shuffleId, partitionIndex, leftTargetSize)
-        if (skewSpecs.isDefined) {
-          logDebug(s"Left side partition $partitionIndex " +
-            s"(${FileUtils.byteCountToDisplaySize(leftSize)}) is skewed, " +
-            s"split it into ${skewSpecs.get.length} parts.")
-          numSkewedLeft += 1
+  private def optimize(plan: SparkPlan): SparkPlan = {
+    val logPrefix = s"Optimizing ${plan.nodeName} #${plan.id}"
+
+    // Step 0: Collect all ShuffledJoins (SMJ/SHJ)
+    def collectShuffledJoins(plan: SparkPlan): Seq[ShuffledJoin] = plan match {
+      case join: ShuffledJoin => Seq(join) ++ join.children.flatMap(collectShuffledJoins)
+      case _ => plan.children.flatMap(collectShuffledJoins)
+    }
+    val joins = collectShuffledJoins(plan)
+    logDebug(s"$logPrefix: ShuffledJoins: ${joins.map(_.nodeName).mkString("[", ", ", "]")}")
+    if (joins.isEmpty || joins.exists(_.isSkewJoin)) return plan
+    val topJoin = joins.head
+
+    // Step1: validate physical operators
+    // There are more and more physical operators, this list is used to avoid correctness issues
+    // TODO: support more operators like AggregateInPandasExec/FlatMapCoGroupsInPandasExec/etc
+    val invalidOperators = topJoin.collect {
+      case _: WholeStageCodegenExec => None
+      case _: AQEShuffleReadExec => None
+      case _: QueryStageExec => None
+      case _: SortExec => None
+      case _: BaseJoinExec => None
+      case _: ObjectHashAggregateExec => None
+      case _: HashAggregateExec => None
+      case _: SortAggregateExec => None
+      case _: WindowExec => None
+      case _: ProjectExec => None
+      case _: FilterExec => None
+      case _: SampleExec => None
+      case _: ColumnarToRowExec => None
+      case _: RowToColumnarExec => None
+      case _: DeserializeToObjectExec => None
+      case _: SerializeFromObjectExec => None
+      case _: MapElementsExec => None
+      case _: MapPartitionsExec => None
+      case _: MapPartitionsInRWithArrowExec => None
+      case _: MapInPandasExec => None
+      case _: EvalPythonExec => None
+      case _: CollectMetricsExec => None
+      case invalid => Some(invalid)
+    }.flatten
+    if (invalidOperators.nonEmpty) {
+      logDebug(s"$logPrefix: Do NOT support operators " +
+        s"${invalidOperators.map(_.nodeName).mkString("[", ", ", "]")}")
+      return plan
+    }
+
+    // Step 2: Collect all ShuffleQueryStages
+    val leaves = topJoin.collectLeaves()
+    // for a N-Join stage, there should be N+1 leaves.
+    if (leaves.size != joins.size + 1) return plan
+    // stageId -> MapOutputStatistics
+    val stageStats = leaves.flatMap {
+      case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
+        stage.mapStats.filter(_.bytesByPartitionId.nonEmpty).map(stats => stage.id -> stats)
+      case _ => None
+    }.toMap
+    // TODO: support Bucket Join with other types of leaves.

Review comment:
      Great catch! Our internal system (based on Spark 3.0) also handles BHJ, but porting this change to master required some non-trivial modifications, and BHJ support was dropped along the way. I will update this PR.
   
   ![image](https://user-images.githubusercontent.com/7322292/131971133-cb955d0a-8f47-4258-a89d-54cf5d1197cd.png)
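For readers following the diff, the steps described in the removed `tryOptimizeJoinChildren` docstring can be sketched in a few lines of Scala. Those steps are: detect a skewed partition from the median partition size, split the skewed side's mappers into ranges, and have the other side re-read the whole partition once per range. This is a minimal, self-contained illustration under stated assumptions, not Spark's implementation. The object `SkewSplitSketch`, its methods, the greedy packing rule, and the `factor`/`minBytes` parameters are hypothetical stand-ins for the real `medianSize`, `getSkewThreshold`, and `createSkewPartitionSpecs` logic.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of median-based skew detection and mapper-range splitting.
// Names and parameters are illustrative assumptions, not Spark APIs.
object SkewSplitSketch {

  // Median partition size (average of the two middle values for even counts),
  // clamped to at least 1 byte. Assumes a non-empty input.
  def medianSize(sizes: Seq[Long]): Long = {
    val sorted = sizes.sorted
    val n = sorted.length
    val med =
      if (n % 2 == 0) (sorted(n / 2 - 1) + sorted(n / 2)) / 2
      else sorted(n / 2)
    math.max(med, 1L)
  }

  // A partition counts as skewed when its size exceeds this threshold:
  // the larger of an absolute floor and `factor` times the median.
  def skewThreshold(median: Long, factor: Double, minBytes: Long): Long =
    math.max(minBytes, (median * factor).toLong)

  // Greedily group consecutive mappers so that each range stays within
  // targetSize bytes (a single oversized mapper gets a range of its own).
  // Returns half-open [startMap, endMap) index ranges.
  def splitMappers(mapSizes: Seq[Long], targetSize: Long): Seq[(Int, Int)] = {
    val ranges = ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var acc = 0L
    for (i <- mapSizes.indices) {
      if (i > start && acc + mapSizes(i) > targetSize) {
        ranges += ((start, i))
        start = i
        acc = 0L
      }
      acc += mapSizes(i)
    }
    if (mapSizes.nonEmpty) ranges += ((start, mapSizes.length))
    ranges.toSeq
  }

  // Pair every left mapper range with the full right partition: one task per
  // pair, so the right side is duplicated (read once per range), not split.
  def pairWithFullRight(partition: Int, leftRanges: Seq[(Int, Int)]): Seq[((Int, Int), Int)] =
    leftRanges.map(range => (range, partition))
}
```

With mapper sizes 40/30/50/20/60 bytes and a 70-byte target, `splitMappers` yields `(0,2), (2,4), (4,5)`. These correspond to the `[(Map0, Map1), (Map2, Map3), (Map4)]` ranges in the docstring, so pairing them with the full right partition gives three tasks.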
   
   






[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910271832


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47424/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912394950


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47476/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927289334


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143633/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974001029


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145442/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974094561


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145448/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994483828


   **[Test build #146225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146225/testReport)** for PR 33893 at commit [`3cdd96e`](https://github.com/apache/spark/commit/3cdd96e87daf07bc62f90a954bf5b87526ea94d7).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994785018


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146225/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999690801


   **[Test build #146476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146476/testReport)** for PR 33893 at commit [`c8bfb0c`](https://github.com/apache/spark/commit/c8bfb0c2ea74181ae1f9de5fdf6d1138ef027bac).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999584082


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50958/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999620841


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50958/
   




[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937833411


   **[Test build #143939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143939/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class SkewJoinChildWrapper(plan: SparkPlan) extends LeafExecNode `




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912541986


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142975/
   



[GitHub] [spark] zhengruifeng removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-911079989







[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918155391


   **[Test build #143206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143206/testReport)** for PR 33893 at commit [`cd3c449`](https://github.com/apache/spark/commit/cd3c44907db3540a87c6820aac69ab82a8d0debc).



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920535147


   **[Test build #143328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143328/testReport)** for PR 33893 at commit [`6b434b0`](https://github.com/apache/spark/commit/6b434b0baad47063ba1eca1f8f788e35e0af39b3).



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920550567


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47833/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910348129


   **[Test build #142922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142922/testReport)** for PR 33893 at commit [`9a7b8ff`](https://github.com/apache/spark/commit/9a7b8ff18bc29d5a2f496a3e76aeddf857b2ed26).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912389671


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47476/
   



[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937504998


   **[Test build #143939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143939/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937504998


   **[Test build #143939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143939/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).



[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937867143


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143939/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599486


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48438/
   



[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910233816


   Added `test("General Skew Join: 3-table join UNION 2-table join")`, which produces the following final physical plan:
   
   ```
   == Physical Plan ==
   AdaptiveSparkPlan (68)
   +- == Final Plan ==
      CollectLimit (43)
      +- * HashAggregate (42)
         +- AQEShuffleRead (41)
            +- ShuffleQueryStage (40)
               +- Exchange (39)
                  +- * HashAggregate (38)
                     +- Union (37)
                        :- * Project (24)
                        :  +- * SortMergeJoin(skew=true) LeftOuter (23)
                        :     :- * Project (14)
                        :     :  +- * SortMergeJoin(skew=true) Inner (13)
                        :     :     :- * Sort (6)
                        :     :     :  +- AQEShuffleRead (5)
                        :     :     :     +- ShuffleQueryStage (4)
                        :     :     :        +- Exchange (3)
                        :     :     :           +- * Project (2)
                        :     :     :              +- * Range (1)
                        :     :     +- * Sort (12)
                        :     :        +- AQEShuffleRead (11)
                        :     :           +- ShuffleQueryStage (10)
                        :     :              +- Exchange (9)
                        :     :                 +- * Project (8)
                        :     :                    +- * Range (7)
                        :     +- * Sort (22)
                        :        +- * HashAggregate(skew=true) (21)
                        :           +- AQEShuffleRead (20)
                        :              +- ShuffleQueryStage (19)
                        :                 +- Exchange (18)
                        :                    +- * HashAggregate (17)
                        :                       +- * Project (16)
                        :                          +- * Range (15)
                        +- * Project (36)
                           +- * SortMergeJoin(skew=true) LeftOuter (35)
                              :- * Sort (28)
                              :  +- AQEShuffleRead (27)
                              :     +- ShuffleQueryStage (26)
                              :        +- ReusedExchange (25)
                              +- * Sort (34)
                                 +- AQEShuffleRead (33)
                                    +- ShuffleQueryStage (32)
                                       +- Exchange (31)
                                          +- * Project (30)
                                             +- * Range (29)
   ```
   
   ![image](https://user-images.githubusercontent.com/7322292/131670494-2cd5d99c-d67f-41a0-be01-90bb71c872c2.png)
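In the plan above, shuffle stages (4), (10) and (19) all feed one chain of skew-aware operators (`SortMergeJoin(skew=true) Inner`, then `SortMergeJoin(skew=true) LeftOuter` over `HashAggregate(skew=true)`). If a reducer partition is split on every side, the split-duplicate method pairs each split of one side with each split of the others, so the task count for that partition is the product of the per-side split counts. This is the combinatorial explosion the PR description warns about (100/100/100 splits giving 1M combinations). The sketch below only illustrates that arithmetic; `SplitCombos`, `numCombinations`, `exceedsCap` and `maxCombinations` are hypothetical names, not Spark APIs and not the configs this PR adds.

```scala
// Hypothetical illustration, not Spark code: counts the tasks produced for a
// single reducer partition when each side's split specs are combined by cross product.
object SplitCombos {
  // Product of the per-stage split counts; Long avoids Int overflow (100^3 fits, 1000^4 would not).
  def numCombinations(splitsPerStage: Seq[Int]): Long =
    splitsPerStage.foldLeft(1L)((acc, n) => acc * n.toLong)

  // Hypothetical guard: true when the cross product exceeds a chosen cap,
  // i.e. when a re-split to fewer specs per stage would be needed.
  def exceedsCap(splitsPerStage: Seq[Int], maxCombinations: Long): Boolean =
    numCombinations(splitsPerStage) > maxCombinations

  def main(args: Array[String]): Unit = {
    val splits = Seq(100, 100, 100) // stages A, B, C from the PR description
    println(numCombinations(splits))    // prints 1000000
    println(exceedsCap(splits, 10000L)) // prints true
  }
}
```

How the PR actually chooses new split counts once the cap is exceeded is not shown here; the guard only marks the point at which re-splitting becomes necessary.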
   



[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994785018


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146225/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994560760


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50699/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973999879


   **[Test build #145442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145442/testReport)** for PR 33893 at commit [`dc3b6ea`](https://github.com/apache/spark/commit/dc3b6ea57b61a5701ee376bfe524770e873f7d69).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974094561


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145448/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973822571


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49914/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994560760


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50699/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999720468







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554779


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50952/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554744


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50952/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999630635


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50958/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999767781


   **[Test build #146482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146482/testReport)** for PR 33893 at commit [`f822bbf`](https://github.com/apache/spark/commit/f822bbfa2bc8618c1197674c99895e4a44b9d84f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999488191


   **[Test build #146477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146477/testReport)** for PR 33893 at commit [`627d526`](https://github.com/apache/spark/commit/627d526a02d31b535ea78fb116ab0a75aadc876c).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927256316


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48145/
   



[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927247526


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48145/
   

