Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/01 12:19:44 UTC
[GitHub] [spark] zhengruifeng opened a new pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
zhengruifeng opened a new pull request #33893:
URL: https://github.com/apache/spark/pull/33893
### What changes were proposed in this pull request?
This PR aims to generalize `OptimizeSkewedJoin` to support all patterns that can be handled by the current _split-duplicate_ method:
1. Select the _splittable_ shuffle query stages based on the semantics of the internal nodes;
2. For each splittable shuffle query stage, check whether skew exists; if so, split the skewed partitions;
3. Handle _combinatorial explosion_: for each skewed partition, check whether the number of combinations is too large and, if so, re-split the stages to keep the number of combinations reasonable. For example, if partition 0 of stages A, B and C is split into 100, 100 and 100 specs respectively, there are 100 × 100 × 100 = 1M combinations, which is too many and would cause a performance regression;
4. Attach the new specs to the query stages.
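The steps above can be illustrated with a minimal, self-contained Scala sketch. It is not the PR's implementation: `SkewSplitSketch` and all of its methods are hypothetical. The skew test (a partition larger than both a factor times the median size and an absolute byte threshold) follows the general idea of Spark's existing skew-join detection, and the re-splitting strategy in `capCombinations` (repeatedly halving the largest split count) is an illustrative assumption, since the description does not say how stages are re-split.

```scala
// Hypothetical sketch of split-duplicate skew handling; names are illustrative, not Spark APIs.
object SkewSplitSketch {

  /** Median of partition sizes (upper median for an even count, for simplicity). */
  def median(sizes: Seq[Long]): Long = {
    val sorted = sizes.sorted
    sorted(sorted.length / 2)
  }

  /** Step 2: a partition counts as skewed if it exceeds both `factor` times the
    * median size and an absolute byte threshold. */
  def isSkewed(size: Long, med: Long, factor: Double, thresholdBytes: Long): Boolean =
    size > med * factor && size > thresholdBytes

  /** Number of specs a skewed partition is split into for a given target size. */
  def numSplits(size: Long, targetSize: Long): Int =
    math.max(1, math.ceil(size.toDouble / targetSize).toInt)

  /** Split-duplicate: every combination of per-stage spec indices becomes one task,
    * so the number of tasks is the product of the split counts. */
  def combinations(splits: Seq[Int]): Seq[Seq[Int]] =
    splits.foldLeft(Seq(Seq.empty[Int])) { (acc, n) =>
      for (prefix <- acc; i <- 0 until n) yield prefix :+ i
    }

  /** Step 3 (assumed strategy): while the product of split counts exceeds
    * `maxCombinations`, halve the largest split count. */
  def capCombinations(splits: Seq[Int], maxCombinations: Long): Seq[Int] = {
    def product(xs: Seq[Int]): Long = xs.foldLeft(1L)(_ * _)
    var current = splits.toVector
    while (product(current) > maxCombinations && current.exists(_ > 1)) {
      val i = current.indexOf(current.max)
      current = current.updated(i, math.max(1, current(i) / 2))
    }
    current
  }
}
```

For the example in step 3, `SkewSplitSketch.capCombinations(Seq(100, 100, 100), 1000L)` reduces the 1M combinations to `Vector(6, 12, 12)`, i.e. 864 combinations. Halving the largest count first keeps the splits roughly balanced across stages; a real implementation may well choose a different rule.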
### Why are the changes needed?
To generalize `OptimizeSkewedJoin` so that it covers all patterns the _split-duplicate_ method can handle.
### Does this PR introduce _any_ user-facing change?
Two additional configs are added.
### How was this patch tested?
Existing test suites, plus newly added tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913944664
**[Test build #143034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143034/testReport)** for PR 33893 at commit [`545bf9b`](https://github.com/apache/spark/commit/545bf9bc0d2463869fdc46833366cb53c6a9e2fa).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920548688
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47832/
[GitHub] [spark] zhengruifeng commented on a change in pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #33893:
URL: https://github.com/apache/spark/pull/33893#discussion_r805618329
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -2476,11 +2703,10 @@ class AdaptiveQueryExecSuite
"UNION ALL SELECT key2 FROM skewData2 GROUP BY key2", 1, 1)
// skewJoin1 union (skewJoin2 join aggregate)
- // skewJoin2 will lead to extra shuffles, but skew1 cannot be optimized
checkSkewJoin(
"SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2 UNION ALL " +
"SELECT key1 from (SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2) tmp1 " +
- "JOIN (SELECT key2 FROM skewData2 GROUP BY key2) tmp2 ON key1 = key2", 3, 0)
+ "JOIN (SELECT key2 FROM skewData2 GROUP BY key2) tmp2 ON key1 = key2", 3, 3)
Review comment:
This skew test case, newly added in https://github.com/apache/spark/pull/34908, can be optimized by this PR without an extra shuffle:
Master | this PR
--- | ---
![master_skew_case](https://user-images.githubusercontent.com/7322292/153830682-e83ac367-a4d6-4bcf-8699-bd5a944f70e2.png) | ![general_skew_case](https://user-images.githubusercontent.com/7322292/153830711-bb7d1bdf-e59e-400d-a79d-a1e61c9ec8e5.png)
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913291131
friendly ping @JkSelf @cloud-fan @yaooqinn @ulysses-you . Could you please take a look in your spare time? Thanks!
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910263967
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47424/
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-911079989
retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973957248
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49920/
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994483828
**[Test build #146225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146225/testReport)** for PR 33893 at commit [`3cdd96e`](https://github.com/apache/spark/commit/3cdd96e87daf07bc62f90a954bf5b87526ea94d7).
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937504998
**[Test build #143939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143939/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937867143
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143939/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599558
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920677140
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143328/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937504998
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918194227
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47707/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974001029
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145442/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994543229
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50699/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999704205
**[Test build #146477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146477/testReport)** for PR 33893 at commit [`627d526`](https://github.com/apache/spark/commit/627d526a02d31b535ea78fb116ab0a75aadc876c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999488191
**[Test build #146477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146477/testReport)** for PR 33893 at commit [`627d526`](https://github.com/apache/spark/commit/627d526a02d31b535ea78fb116ab0a75aadc876c).
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554005
**[Test build #146482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146482/testReport)** for PR 33893 at commit [`f822bbf`](https://github.com/apache/spark/commit/f822bbfa2bc8618c1197674c99895e4a44b9d84f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927289334
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143633/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913962126
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913956801
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927242239
**[Test build #143633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143633/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
[GitHub] [spark] zhengruifeng commented on a change in pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #33893:
URL: https://github.com/apache/spark/pull/33893#discussion_r805618329
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -2476,11 +2703,10 @@ class AdaptiveQueryExecSuite
"UNION ALL SELECT key2 FROM skewData2 GROUP BY key2", 1, 1)
// skewJoin1 union (skewJoin2 join aggregate)
- // skewJoin2 will lead to extra shuffles, but skew1 cannot be optimized
checkSkewJoin(
"SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2 UNION ALL " +
"SELECT key1 from (SELECT key1 FROM skewData1 JOIN skewData2 ON key1 = key2) tmp1 " +
- "JOIN (SELECT key2 FROM skewData2 GROUP BY key2) tmp2 ON key1 = key2", 3, 0)
+ "JOIN (SELECT key2 FROM skewData2 GROUP BY key2) tmp2 ON key1 = key2", 3, 3)
Review comment:
This newly added skew test case can be optimized by this PR without an extra shuffle:
Master | this PR
--- | ---
![master_skew_case](https://user-images.githubusercontent.com/7322292/153830682-e83ac367-a4d6-4bcf-8699-bd5a944f70e2.png) | ![general_skew_case](https://user-images.githubusercontent.com/7322292/153830711-bb7d1bdf-e59e-400d-a79d-a1e61c9ec8e5.png)
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973871931
**[Test build #145448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145448/testReport)** for PR 33893 at commit [`d7c6678`](https://github.com/apache/spark/commit/d7c66789651aaacb36d86aaa572047377c222a15).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999574451
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50953/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912394950
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47476/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910271892
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47424/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913960171
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920552887
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912394897
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47476/
--
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910263967
--
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912363971
**[Test build #142975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142975/testReport)** for PR 33893 at commit [`c225d13`](https://github.com/apache/spark/commit/c225d13404ac420149ed5fffe1391b7cad108eb9).
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927242239
**[Test build #143633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143633/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937557508
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48438/
--
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599558
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48438/
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912540486
**[Test build #142975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142975/testReport)** for PR 33893 at commit [`c225d13`](https://github.com/apache/spark/commit/c225d13404ac420149ed5fffe1391b7cad108eb9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
--
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-914058475
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143034/
--
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918426518
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143206/
--
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910232141
added test("General Skew Join: 3-table join")
```
== Physical Plan ==
AdaptiveSparkPlan (52)
+- == Final Plan ==
   CollectLimit (33)
   +- * HashAggregate (32)
      +- AQEShuffleRead (31)
         +- ShuffleQueryStage (30)
            +- Exchange (29)
               +- * HashAggregate (28)
                  +- * Project (27)
                     +- * SortMergeJoin(skew=true) LeftOuter (26)
                        :- * Project (15)
                        :  +- * SortMergeJoin(skew=true) Inner (14)
                        :     :- Window(skew=true) (7)
                        :     :  +- * Sort (6)
                        :     :     +- AQEShuffleRead (5)
                        :     :        +- ShuffleQueryStage (4)
                        :     :           +- Exchange (3)
                        :     :              +- * Project (2)
                        :     :                 +- * Range (1)
                        :     +- * Sort (13)
                        :        +- AQEShuffleRead (12)
                        :           +- ShuffleQueryStage (11)
                        :              +- Exchange (10)
                        :                 +- * Project (9)
                        :                    +- * Range (8)
                        +- * Sort (25)
                           +- * HashAggregate(skew=true) (24)
                              +- AQEShuffleRead (23)
                                 +- ShuffleQueryStage (22)
                                    +- Exchange (21)
                                       +- * HashAggregate (20)
                                          +- ShuffleQueryStage (19)
                                             +- Exchange (18)
                                                +- * Project (17)
                                                   +- * Range (16)
```
![image](https://user-images.githubusercontent.com/7322292/131670203-5857e7e0-f1ea-411e-b572-ea8ad0565929.png)
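A 3-table join has three input shuffle stages, and a skewed partition may be split on more than one of them. The split-duplicate method then needs one task for every combination of splits. The Scala sketch below counts those combinations and shrinks the split counts when the product exceeds a cap. The `maxCombinations` cap and the halving policy are assumptions made for illustration. The PR description only says that the stages are re-split to keep the number of combinations reasonable.

```scala
object CombinationSketch {
  // Tasks needed for one skewed partition under split-duplicate: every split of each
  // stage is paired with every split of every other stage, so the counts multiply.
  def numCombinations(splits: Seq[Int]): BigInt =
    splits.foldLeft(BigInt(1))((acc, n) => acc * BigInt(n))

  // Hypothetical re-split policy (not the PR's code): halve the largest split count,
  // never below 1, until the number of combinations fits under maxCombinations.
  def capSplits(splits: Seq[Int], maxCombinations: BigInt): Seq[Int] = {
    require(splits.forall(_ >= 1), "each stage has at least one split")
    require(maxCombinations >= BigInt(1), "the cap must allow at least one task")
    var current = splits.toVector
    while (numCombinations(current) > maxCombinations) {
      val i = current.indexOf(current.max)
      current = current.updated(i, math.max(1, current(i) / 2))
    }
    current
  }
}
```

Take the 100/100/100 example from the PR description (1,000,000 combinations). `CombinationSketch.capSplits(Seq(100, 100, 100), BigInt(1000))` returns `Vector(6, 12, 12)`, which gives 864 combinations. Halving the largest count first keeps the per-stage split counts roughly balanced.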
--
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918202889
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47707/
--
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973868208
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49914/
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999477037
**[Test build #146476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146476/testReport)** for PR 33893 at commit [`c8bfb0c`](https://github.com/apache/spark/commit/c8bfb0c2ea74181ae1f9de5fdf6d1138ef027bac).
--
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999589613
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50953/
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912363971
**[Test build #142975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142975/testReport)** for PR 33893 at commit [`c225d13`](https://github.com/apache/spark/commit/c225d13404ac420149ed5fffe1391b7cad108eb9).
--
[GitHub] [spark] advancedxy commented on a change in pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
advancedxy commented on a change in pull request #33893:
URL: https://github.com/apache/spark/pull/33893#discussion_r701633142
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##########
@@ -101,166 +104,317 @@ object OptimizeSkewedJoin extends AQEShuffleReadRule {
sizes.sum / sizes.length
}
-  /*
-   * This method aim to optimize the skewed join with the following steps:
-   * 1. Check whether the shuffle partition is skewed based on the median size
-   *    and the skewed partition threshold in origin shuffled join (smj and shj).
-   * 2. Assuming partition0 is skewed in left side, and it has 5 mappers (Map0, Map1...Map4).
-   *    And we may split the 5 Mappers into 3 mapper ranges [(Map0, Map1), (Map2, Map3), (Map4)]
-   *    based on the map size and the max split number.
-   * 3. Wrap the join left child with a special shuffle read that loads each mapper range with one
-   *    task, so total 3 tasks.
-   * 4. Wrap the join right child with a special shuffle read that loads partition0 3 times by
-   *    3 tasks separately.
-   */
-  private def tryOptimizeJoinChildren(
-      left: ShuffleQueryStageExec,
-      right: ShuffleQueryStageExec,
-      joinType: JoinType): Option[(SparkPlan, SparkPlan)] = {
-    val canSplitLeft = canSplitLeftSide(joinType)
-    val canSplitRight = canSplitRightSide(joinType)
-    if (!canSplitLeft && !canSplitRight) return None
-
-    val leftSizes = left.mapStats.get.bytesByPartitionId
-    val rightSizes = right.mapStats.get.bytesByPartitionId
-    assert(leftSizes.length == rightSizes.length)
-    val numPartitions = leftSizes.length
-    // We use the median size of the original shuffle partitions to detect skewed partitions.
-    val leftMedSize = medianSize(leftSizes)
-    val rightMedSize = medianSize(rightSizes)
-    logDebug(
-      s"""
-         |Optimizing skewed join.
-         |Left side partitions size info:
-         |${getSizeInfo(leftMedSize, leftSizes)}
-         |Right side partitions size info:
-         |${getSizeInfo(rightMedSize, rightSizes)}
-      """.stripMargin)
-
-    val leftSkewThreshold = getSkewThreshold(leftMedSize)
-    val rightSkewThreshold = getSkewThreshold(rightMedSize)
-    val leftTargetSize = targetSize(leftSizes, leftSkewThreshold)
-    val rightTargetSize = targetSize(rightSizes, rightSkewThreshold)
-
-    val leftSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
-    val rightSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
-    var numSkewedLeft = 0
-    var numSkewedRight = 0
-    for (partitionIndex <- 0 until numPartitions) {
-      val leftSize = leftSizes(partitionIndex)
-      val isLeftSkew = canSplitLeft && leftSize > leftSkewThreshold
-      val rightSize = rightSizes(partitionIndex)
-      val isRightSkew = canSplitRight && rightSize > rightSkewThreshold
-      val leftNoSkewPartitionSpec =
-        Seq(CoalescedPartitionSpec(partitionIndex, partitionIndex + 1, leftSize))
-      val rightNoSkewPartitionSpec =
-        Seq(CoalescedPartitionSpec(partitionIndex, partitionIndex + 1, rightSize))
-
-      val leftParts = if (isLeftSkew) {
-        val skewSpecs = ShufflePartitionsUtil.createSkewPartitionSpecs(
-          left.mapStats.get.shuffleId, partitionIndex, leftTargetSize)
-        if (skewSpecs.isDefined) {
-          logDebug(s"Left side partition $partitionIndex " +
-            s"(${FileUtils.byteCountToDisplaySize(leftSize)}) is skewed, " +
-            s"split it into ${skewSpecs.get.length} parts.")
-          numSkewedLeft += 1
+  private def optimize(plan: SparkPlan): SparkPlan = {
+    val logPrefix = s"Optimizing ${plan.nodeName} #${plan.id}"
+
+    // Step 0: Collect all ShuffledJoins (SMJ/SHJ)
+    def collectShuffledJoins(plan: SparkPlan): Seq[ShuffledJoin] = plan match {
+      case join: ShuffledJoin => Seq(join) ++ join.children.flatMap(collectShuffledJoins)
+      case _ => plan.children.flatMap(collectShuffledJoins)
+    }
+    val joins = collectShuffledJoins(plan)
+    logDebug(s"$logPrefix: ShuffledJoins: ${joins.map(_.nodeName).mkString("[", ", ", "]")}")
+    if (joins.isEmpty || joins.exists(_.isSkewJoin)) return plan
+    val topJoin = joins.head
+
+    // Step1: validate physical operators
+    // There are more and more physical operators, this list is used to avoid correctness issues
+    // TODO: support more operators like AggregateInPandasExec/FlatMapCoGroupsInPandasExec/etc
+    val invalidOperators = topJoin.collect {
+      case _: WholeStageCodegenExec => None
+      case _: AQEShuffleReadExec => None
+      case _: QueryStageExec => None
+      case _: SortExec => None
+      case _: BaseJoinExec => None
+      case _: ObjectHashAggregateExec => None
+      case _: HashAggregateExec => None
+      case _: SortAggregateExec => None
+      case _: WindowExec => None
+      case _: ProjectExec => None
+      case _: FilterExec => None
+      case _: SampleExec => None
+      case _: ColumnarToRowExec => None
+      case _: RowToColumnarExec => None
+      case _: DeserializeToObjectExec => None
+      case _: SerializeFromObjectExec => None
+      case _: MapElementsExec => None
+      case _: MapPartitionsExec => None
+      case _: MapPartitionsInRWithArrowExec => None
+      case _: MapInPandasExec => None
+      case _: EvalPythonExec => None
+      case _: CollectMetricsExec => None
+      case invalid => Some(invalid)
+    }.flatten
+    if (invalidOperators.nonEmpty) {
+      logDebug(s"$logPrefix: Do NOT support operators " +
+        s"${invalidOperators.map(_.nodeName).mkString("[", ", ", "]")}")
+      return plan
+    }
+
+    // Step 2: Collect all ShuffleQueryStages
+    val leaves = topJoin.collectLeaves()
+    // for a N-Join stage, there should be N+1 leaves.
+    if (leaves.size != joins.size + 1) return plan
+    // stageId -> MapOutputStatistics
+    val stageStats = leaves.flatMap {
+      case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
+        stage.mapStats.filter(_.bytesByPartitionId.nonEmpty).map(stats => stage.id -> stats)
+      case _ => None
+    }.toMap
+    // TODO: support Bucket Join with other types of leaves.
Review comment:
@zhengruifeng there's another skewed-join case (which occurred in our internal usage):
```
sort   sort   BroadcastExchange
   \   /          /
 SMJ(skewed)     /
        \       /
    BroadcastHashJoin
```
This could also be addressed in another PR or in this one.
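The removed `tryOptimizeJoinChildren` comment describes three steps:

- detect skew from the median partition size,
- group a skewed partition's mappers into ranges,
- read the other side once per range.

Those steps can be modeled in plain Scala. This is a simplified sketch, not Spark's code. The rule `max(minBytes, factor * median)` follows the intent of Spark's `skewedPartitionFactor`/`skewedPartitionThresholdInBytes` settings. The greedy `splitMappers` and the `Spec` types are hypothetical stand-ins for `ShufflePartitionsUtil.createSkewPartitionSpecs` and Spark's partition specs.

```scala
object SplitDuplicateSketch {
  // Hypothetical partition specs (stand-ins for Spark's ShufflePartitionSpec subclasses).
  sealed trait Spec
  final case class MapperRange(startMapIndex: Int, endMapIndex: Int) extends Spec
  case object WholePartition extends Spec

  // Median partition size, floored at 1 byte so the threshold is never 0.
  def medianSize(sizes: Array[Long]): Long = {
    require(sizes.nonEmpty, "need at least one partition")
    val sorted = sizes.sorted
    val n = sorted.length
    val median =
      if (n % 2 == 0) (sorted(n / 2 - 1) + sorted(n / 2)) / 2
      else sorted(n / 2)
    math.max(median, 1L)
  }

  // Assumed rule: a partition is skewed if it exceeds both factor * median and a byte floor.
  def skewThreshold(median: Long, factor: Double, minBytes: Long): Long =
    math.max(minBytes, (median * factor).toLong)

  // Greedily group consecutive mapper outputs into [start, end) ranges of about targetSize bytes.
  def splitMappers(mapSizes: Array[Long], targetSize: Long): Seq[(Int, Int)] = {
    val ranges = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var acc = 0L
    for (i <- mapSizes.indices) {
      // Close the current range if adding mapper i would overshoot the target.
      if (i > start && acc + mapSizes(i) > targetSize) {
        ranges += ((start, i))
        start = i
        acc = 0L
      }
      acc += mapSizes(i)
    }
    if (mapSizes.nonEmpty) ranges += ((start, mapSizes.length))
    ranges.toList
  }

  // Split-duplicate: one task per (left spec, right spec) pair, so an unsplit side
  // (WholePartition) is re-read by every task created for the split side.
  def pairTasks(left: Seq[Spec], right: Seq[Spec]): Seq[(Spec, Spec)] =
    for (l <- left; r <- right) yield (l, r)
}
```

Example of skew detection: with partition sizes `Array(10L, 12L, 11L, 300L, 9L)`, the median is 11. With `factor = 5.0` and `minBytes = 50`, the threshold is 55, so the 300-byte partition counts as skewed.

Example of mapper splitting: with mapper sizes `Array(40L, 50L, 45L, 45L, 60L)` and a 100-byte target, `splitMappers` yields `(0, 2), (2, 4), (4, 5)`. These are the [(Map0, Map1), (Map2, Map3), (Map4)] ranges from the comment. Pairing them with `Seq(WholePartition)` gives three tasks, each of which reloads partition 0 of the right side.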
--
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920677140
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143328/
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918201941
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47707/
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974092464
**[Test build #145448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145448/testReport)** for PR 33893 at commit [`d7c6678`](https://github.com/apache/spark/commit/d7c66789651aaacb36d86aaa572047377c222a15).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
--
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973868208
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49914/
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994783444
**[Test build #146225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146225/testReport)** for PR 33893 at commit [`3cdd96e`](https://github.com/apache/spark/commit/3cdd96e87daf07bc62f90a954bf5b87526ea94d7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
--
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999477037
**[Test build #146476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146476/testReport)** for PR 33893 at commit [`c8bfb0c`](https://github.com/apache/spark/commit/c8bfb0c2ea74181ae1f9de5fdf6d1138ef027bac).
--
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999630635
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50958/
--
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999543374
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50953/
--
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999791324
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146482/
--
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910234795
added test("General Skew Join: 5-table join")
```
== Physical Plan ==
AdaptiveSparkPlan (68)
+- == Final Plan ==
   CollectLimit (45)
   +- * HashAggregate (44)
      +- AQEShuffleRead (43)
         +- ShuffleQueryStage (42)
            +- Exchange (41)
               +- * HashAggregate (40)
                  +- * Project (39)
                     +- * SortMergeJoin(skew=true) Inner (38)
                        :- * Project (22)
                        :  +- * SortMergeJoin(skew=true) Cross (21)
                        :     :- * Project (14)
                        :     :  +- * SortMergeJoin(skew=true) LeftOuter (13)
                        :     :     :- * Sort (6)
                        :     :     :  +- AQEShuffleRead (5)
                        :     :     :     +- ShuffleQueryStage (4)
                        :     :     :        +- Exchange (3)
                        :     :     :           +- * Project (2)
                        :     :     :              +- * Range (1)
                        :     :     +- * Sort (12)
                        :     :        +- AQEShuffleRead (11)
                        :     :           +- ShuffleQueryStage (10)
                        :     :              +- Exchange (9)
                        :     :                 +- * Project (8)
                        :     :                    +- * Range (7)
                        :     +- * Sort (20)
                        :        +- AQEShuffleRead (19)
                        :           +- ShuffleQueryStage (18)
                        :              +- Exchange (17)
                        :                 +- * Project (16)
                        :                    +- * Range (15)
                        +- * Sort (37)
                           +- * Project (36)
                              +- * ShuffledHashJoin(skew=true) Inner BuildLeft (35)
                                 :- * HashAggregate(skew=true) (29)
                                 :  +- AQEShuffleRead (28)
                                 :     +- ShuffleQueryStage (27)
                                 :        +- Exchange (26)
                                 :           +- * HashAggregate (25)
                                 :              +- * Project (24)
                                 :                 +- * Range (23)
                                 +- AQEShuffleRead (34)
                                    +- ShuffleQueryStage (33)
                                       +- Exchange (32)
                                          +- * Project (31)
                                             +- * Range (30)
```
![image](https://user-images.githubusercontent.com/7322292/131670788-f2bf7146-8533-4433-801f-17697c457567.png)
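The plan above has four shuffled joins (three SortMergeJoins and one ShuffledHashJoin) over five shuffle query stages. That matches the "N joins, N+1 leaves" guard in Step 2 of the new `optimize` method. The Scala sketch below models this check on a toy tree. `Node` and its helpers are hypothetical; they only mimic `collectShuffledJoins` and `collectLeaves` from the OptimizeSkewedJoin diff.

```scala
object JoinShapeSketch {
  // Toy stand-in for a physical plan node (hypothetical; not Spark's SparkPlan).
  final case class Node(name: String, children: Seq[Node] = Nil, shuffledJoin: Boolean = false)

  // Like Step 0: pre-order traversal, so the head is the top-most shuffled join.
  def collectShuffledJoins(plan: Node): Seq[Node] =
    if (plan.shuffledJoin) plan +: plan.children.flatMap(collectShuffledJoins)
    else plan.children.flatMap(collectShuffledJoins)

  // Leaves of the tree; in the real plan these would be query stages.
  def collectLeaves(plan: Node): Seq[Node] =
    if (plan.children.isEmpty) Seq(plan) else plan.children.flatMap(collectLeaves)

  // Like the Step 2 guard: N binary joins should sit on exactly N + 1 leaves.
  def hasExpectedShape(plan: Node): Boolean = {
    val joins = collectShuffledJoins(plan)
    joins.nonEmpty && collectLeaves(joins.head).size == joins.size + 1
  }

  private def stage(id: Int): Node = Node(s"ShuffleQueryStage($id)")
  private def sort(child: Node): Node = Node("Sort", Seq(child))
  private def smj(left: Node, right: Node): Node =
    Node("SortMergeJoin", Seq(left, right), shuffledJoin = true)

  // The 5-table plan above, reduced to its joins, a few unary operators and its shuffle stages.
  val fiveTableExample: Node = smj(
    smj(smj(sort(stage(4)), sort(stage(10))), sort(stage(18))),
    sort(Node("ShuffledHashJoin",
      Seq(Node("HashAggregate", Seq(stage(27))), stage(33)), shuffledJoin = true)))

  // A join whose left side hides a second two-input operator: 1 join but 3 leaves.
  val unionExample: Node = smj(Node("Union", Seq(stage(1), stage(2))), stage(3))
}
```

`hasExpectedShape(fiveTableExample)` is true. In `unionExample`, a two-input non-join operator sits under one side of the join, and the leaf count alone rejects it, as the `leaves.size != joins.size + 1` check in the diff would.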
--
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-911263087
friendly ping @cloud-fan
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927242776
While developing this method, I checked correctness with tests like those in https://github.com/apache/spark/pull/34108; they may be helpful for reviewing.
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927256316
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48145/
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913944664
**[Test build #143034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143034/testReport)** for PR 33893 at commit [`545bf9b`](https://github.com/apache/spark/commit/545bf9bc0d2463869fdc46833366cb53c6a9e2fa).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973905495
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49920/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973936937
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49920/
[GitHub] [spark] sigmod commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
sigmod commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-1076634639
cc @maryannxue
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918202889
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47707/
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-911209203
retest this please
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920675754
**[Test build #143328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143328/testReport)** for PR 33893 at commit [`6b434b0`](https://github.com/apache/spark/commit/6b434b0baad47063ba1eca1f8f788e35e0af39b3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999720468
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999791324
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146482/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999510858
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50952/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999589613
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50953/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912541986
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142975/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910348426
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142922/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910271892
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47424/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-914052947
**[Test build #143034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143034/testReport)** for PR 33893 at commit [`545bf9b`](https://github.com/apache/spark/commit/545bf9bc0d2463869fdc46833366cb53c6a9e2fa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920552886
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927289059
**[Test build #143633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143633/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class SkewJoinChildWrapper(plan: SparkPlan) extends LeafExecNode `
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927255499
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48145/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920552705
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47833/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920551350
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47832/
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920535147
**[Test build #143328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143328/testReport)** for PR 33893 at commit [`6b434b0`](https://github.com/apache/spark/commit/6b434b0baad47063ba1eca1f8f788e35e0af39b3).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554005
**[Test build #146482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146482/testReport)** for PR 33893 at commit [`f822bbf`](https://github.com/apache/spark/commit/f822bbfa2bc8618c1197674c99895e4a44b9d84f).
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973957248
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49920/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973871931
**[Test build #145448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145448/testReport)** for PR 33893 at commit [`d7c6678`](https://github.com/apache/spark/commit/d7c66789651aaacb36d86aaa572047377c222a15).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599558
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973796432
**[Test build #145442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145442/testReport)** for PR 33893 at commit [`dc3b6ea`](https://github.com/apache/spark/commit/dc3b6ea57b61a5701ee376bfe524770e873f7d69).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973851609
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49914/
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973796432
**[Test build #145442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145442/testReport)** for PR 33893 at commit [`dc3b6ea`](https://github.com/apache/spark/commit/dc3b6ea57b61a5701ee376bfe524770e873f7d69).
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599558
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48438/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918426518
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143206/
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918155391
**[Test build #143206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143206/testReport)** for PR 33893 at commit [`cd3c449`](https://github.com/apache/spark/commit/cd3c44907db3540a87c6820aac69ab82a8d0debc).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918424718
**[Test build #143206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143206/testReport)** for PR 33893 at commit [`cd3c449`](https://github.com/apache/spark/commit/cd3c44907db3540a87c6820aac69ab82a8d0debc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class OptimizeSkewedJoin(`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-913962126
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910233326
**[Test build #142922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142922/testReport)** for PR 33893 at commit [`9a7b8ff`](https://github.com/apache/spark/commit/9a7b8ff18bc29d5a2f496a3e76aeddf857b2ed26).
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910233326
**[Test build #142922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142922/testReport)** for PR 33893 at commit [`9a7b8ff`](https://github.com/apache/spark/commit/9a7b8ff18bc29d5a2f496a3e76aeddf857b2ed26).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910348426
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142922/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-914058475
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143034/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554779
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50952/
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-1075794743
retest this please
[GitHub] [spark] zhengruifeng commented on a change in pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #33893:
URL: https://github.com/apache/spark/pull/33893#discussion_r701688868
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##########
@@ -101,166 +104,317 @@ object OptimizeSkewedJoin extends AQEShuffleReadRule {
sizes.sum / sizes.length
}
- /*
- * This method aim to optimize the skewed join with the following steps:
- * 1. Check whether the shuffle partition is skewed based on the median size
- * and the skewed partition threshold in origin shuffled join (smj and shj).
- * 2. Assuming partition0 is skewed in left side, and it has 5 mappers (Map0, Map1...Map4).
- * And we may split the 5 Mappers into 3 mapper ranges [(Map0, Map1), (Map2, Map3), (Map4)]
- * based on the map size and the max split number.
- * 3. Wrap the join left child with a special shuffle read that loads each mapper range with one
- * task, so total 3 tasks.
- * 4. Wrap the join right child with a special shuffle read that loads partition0 3 times by
- * 3 tasks separately.
- */
- private def tryOptimizeJoinChildren(
- left: ShuffleQueryStageExec,
- right: ShuffleQueryStageExec,
- joinType: JoinType): Option[(SparkPlan, SparkPlan)] = {
- val canSplitLeft = canSplitLeftSide(joinType)
- val canSplitRight = canSplitRightSide(joinType)
- if (!canSplitLeft && !canSplitRight) return None
-
- val leftSizes = left.mapStats.get.bytesByPartitionId
- val rightSizes = right.mapStats.get.bytesByPartitionId
- assert(leftSizes.length == rightSizes.length)
- val numPartitions = leftSizes.length
- // We use the median size of the original shuffle partitions to detect skewed partitions.
- val leftMedSize = medianSize(leftSizes)
- val rightMedSize = medianSize(rightSizes)
- logDebug(
- s"""
- |Optimizing skewed join.
- |Left side partitions size info:
- |${getSizeInfo(leftMedSize, leftSizes)}
- |Right side partitions size info:
- |${getSizeInfo(rightMedSize, rightSizes)}
- """.stripMargin)
-
- val leftSkewThreshold = getSkewThreshold(leftMedSize)
- val rightSkewThreshold = getSkewThreshold(rightMedSize)
- val leftTargetSize = targetSize(leftSizes, leftSkewThreshold)
- val rightTargetSize = targetSize(rightSizes, rightSkewThreshold)
-
- val leftSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
- val rightSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
- var numSkewedLeft = 0
- var numSkewedRight = 0
- for (partitionIndex <- 0 until numPartitions) {
- val leftSize = leftSizes(partitionIndex)
- val isLeftSkew = canSplitLeft && leftSize > leftSkewThreshold
- val rightSize = rightSizes(partitionIndex)
- val isRightSkew = canSplitRight && rightSize > rightSkewThreshold
- val leftNoSkewPartitionSpec =
- Seq(CoalescedPartitionSpec(partitionIndex, partitionIndex + 1, leftSize))
- val rightNoSkewPartitionSpec =
- Seq(CoalescedPartitionSpec(partitionIndex, partitionIndex + 1, rightSize))
-
- val leftParts = if (isLeftSkew) {
- val skewSpecs = ShufflePartitionsUtil.createSkewPartitionSpecs(
- left.mapStats.get.shuffleId, partitionIndex, leftTargetSize)
- if (skewSpecs.isDefined) {
- logDebug(s"Left side partition $partitionIndex " +
- s"(${FileUtils.byteCountToDisplaySize(leftSize)}) is skewed, " +
- s"split it into ${skewSpecs.get.length} parts.")
- numSkewedLeft += 1
+ private def optimize(plan: SparkPlan): SparkPlan = {
+ val logPrefix = s"Optimizing ${plan.nodeName} #${plan.id}"
+
+ // Step 0: Collect all ShuffledJoins (SMJ/SHJ)
+ def collectShuffledJoins(plan: SparkPlan): Seq[ShuffledJoin] = plan match {
+ case join: ShuffledJoin => Seq(join) ++ join.children.flatMap(collectShuffledJoins)
+ case _ => plan.children.flatMap(collectShuffledJoins)
+ }
+ val joins = collectShuffledJoins(plan)
+ logDebug(s"$logPrefix: ShuffledJoins: ${joins.map(_.nodeName).mkString("[", ", ", "]")}")
+ if (joins.isEmpty || joins.exists(_.isSkewJoin)) return plan
+ val topJoin = joins.head
+
+ // Step1: validate physical operators
+ // There are more and more physical operators, this list is used to avoid correctness issues
+ // TODO: support more operators like AggregateInPandasExec/FlatMapCoGroupsInPandasExec/etc
+ val invalidOperators = topJoin.collect {
+ case _: WholeStageCodegenExec => None
+ case _: AQEShuffleReadExec => None
+ case _: QueryStageExec => None
+ case _: SortExec => None
+ case _: BaseJoinExec => None
+ case _: ObjectHashAggregateExec => None
+ case _: HashAggregateExec => None
+ case _: SortAggregateExec => None
+ case _: WindowExec => None
+ case _: ProjectExec => None
+ case _: FilterExec => None
+ case _: SampleExec => None
+ case _: ColumnarToRowExec => None
+ case _: RowToColumnarExec => None
+ case _: DeserializeToObjectExec => None
+ case _: SerializeFromObjectExec => None
+ case _: MapElementsExec => None
+ case _: MapPartitionsExec => None
+ case _: MapPartitionsInRWithArrowExec => None
+ case _: MapInPandasExec => None
+ case _: EvalPythonExec => None
+ case _: CollectMetricsExec => None
+ case invalid => Some(invalid)
+ }.flatten
+ if (invalidOperators.nonEmpty) {
+ logDebug(s"$logPrefix: Do NOT support operators " +
+ s"${invalidOperators.map(_.nodeName).mkString("[", ", ", "]")}")
+ return plan
+ }
+
+ // Step 2: Collect all ShuffleQueryStages
+ val leaves = topJoin.collectLeaves()
+ // for a N-Join stage, there should be N+1 leaves.
+ if (leaves.size != joins.size + 1) return plan
+ // stageId -> MapOutputStatistics
+ val stageStats = leaves.flatMap {
+ case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
+ stage.mapStats.filter(_.bytesByPartitionId.nonEmpty).map(stats => stage.id -> stats)
+ case _ => None
+ }.toMap
+ // TODO: support Bucket Join with other types of leaves.
Review comment:
Great catch! BHJ is also handled in our internal system (based on 3.0), but porting it to master required some non-trivial changes, and BHJ support was left out in the process. I will update this PR.
![image](https://user-images.githubusercontent.com/7322292/131971133-cb955d0a-8f47-4258-a89d-54cf5d1197cd.png)
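The split-duplicate approach described in the removed doc comment above (detect a skewed partition from the median size, split it into mapper ranges, and re-read the other side once per range) can be sketched in Scala roughly as follows. This is a simplified, standalone model, not Spark's implementation: `SkewSplitSketch`, `Spec`, `Whole`, `MapRange`, `sideSpecs`, and `pairSpecs` are hypothetical names. The threshold rule `max(minBytes, factor * median)` is assumed to correspond to what `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` control. Join-type restrictions (`canSplitLeftSide`/`canSplitRightSide`) and Spark's own target-size computation are omitted; the target size is simply passed in.

```scala
// A hypothetical, simplified model of the split-duplicate skew-join idea.
// None of these names are Spark APIs.
object SkewSplitSketch {

  // What one task reads from one join side for a given reducer partition.
  sealed trait Spec
  // Read the whole reducer partition (output of all mappers).
  final case class Whole(partition: Int) extends Spec
  // Read only the output of mappers [startMap, endMap) for the reducer partition.
  final case class MapRange(partition: Int, startMap: Int, endMap: Int) extends Spec

  // Median of per-partition sizes, averaging the two middle values for even
  // counts, floored at 1 byte.
  def medianSize(sizes: Seq[Long]): Long = {
    require(sizes.nonEmpty, "need at least one partition")
    val sorted = sizes.sorted
    val n = sorted.length
    val mid =
      if (n % 2 == 0) (sorted(n / 2 - 1) + sorted(n / 2)) / 2
      else sorted(n / 2)
    math.max(mid, 1L)
  }

  // A partition is skewed if it exceeds both factor * median and minBytes.
  def skewThreshold(median: Long, factor: Double, minBytes: Long): Long =
    math.max(minBytes, (median * factor).toLong)

  // Greedily group consecutive mappers into [start, end) ranges of at most
  // targetSize bytes (a single mapper larger than targetSize gets its own range).
  def splitMappers(mapSizes: Seq[Long], targetSize: Long): Seq[(Int, Int)] = {
    val ranges = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var acc = 0L
    for (i <- mapSizes.indices) {
      if (i > start && acc + mapSizes(i) > targetSize) {
        ranges += ((start, i))
        start = i
        acc = 0L
      }
      acc += mapSizes(i)
    }
    if (mapSizes.nonEmpty) ranges += ((start, mapSizes.length))
    ranges.toList
  }

  // Specs for one side of one partition: split it if skewed, otherwise read it whole.
  def sideSpecs(partition: Int, mapSizes: Seq[Long], threshold: Long, targetSize: Long): Seq[Spec] =
    if (mapSizes.sum > threshold)
      splitMappers(mapSizes, targetSize).map { case (s, e) => MapRange(partition, s, e) }
    else Seq(Whole(partition))

  // Every left spec is joined with every right spec, so the non-split side is
  // duplicated once per split of the other side.
  def pairSpecs(left: Seq[Spec], right: Seq[Spec]): Seq[(Spec, Spec)] =
    for (l <- left; r <- right) yield (l, r)
}
```

With partition sizes `Seq(10, 20, 30, 200)`, a factor of 5.0, and a 100-byte floor, the threshold is `max(100, 5 * 25) = 125`. A left side made of five 40-byte mappers (200 bytes) is then split with an 80-byte target into the ranges `(0, 2)`, `(2, 4)`, `(4, 5)`, matching the (Map0, Map1), (Map2, Map3), (Map4) example, and the small right side is read three times. When both sides are split, `pairSpecs` returns the product of the two spec counts; this multiplicative growth is the combinatorial explosion that the PR description proposes to cap.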
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910271832
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47424/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912394950
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47476/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927289334
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143633/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974001029
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145442/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974094561
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145448/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994483828
**[Test build #146225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146225/testReport)** for PR 33893 at commit [`3cdd96e`](https://github.com/apache/spark/commit/3cdd96e87daf07bc62f90a954bf5b87526ea94d7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994785018
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146225/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999690801
**[Test build #146476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146476/testReport)** for PR 33893 at commit [`c8bfb0c`](https://github.com/apache/spark/commit/c8bfb0c2ea74181ae1f9de5fdf6d1138ef027bac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999584082
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50958/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999620841
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50958/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937833411
**[Test build #143939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143939/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class SkewJoinChildWrapper(plan: SparkPlan) extends LeafExecNode `
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912541986
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142975/
[GitHub] [spark] zhengruifeng removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-911079989
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-918155391
**[Test build #143206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143206/testReport)** for PR 33893 at commit [`cd3c449`](https://github.com/apache/spark/commit/cd3c44907db3540a87c6820aac69ab82a8d0debc).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920535147
**[Test build #143328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143328/testReport)** for PR 33893 at commit [`6b434b0`](https://github.com/apache/spark/commit/6b434b0baad47063ba1eca1f8f788e35e0af39b3).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-920550567
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47833/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910348129
**[Test build #142922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142922/testReport)** for PR 33893 at commit [`9a7b8ff`](https://github.com/apache/spark/commit/9a7b8ff18bc29d5a2f496a3e76aeddf857b2ed26).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-912389671
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47476/
[GitHub] [spark] SparkQA removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937504998
**[Test build #143939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143939/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937504998
**[Test build #143939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143939/testReport)** for PR 33893 at commit [`c1d3e48`](https://github.com/apache/spark/commit/c1d3e48827caab6b616a2c6c4fb8bed4ab9888a0).
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937867143
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143939/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-937599486
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48438/
[GitHub] [spark] zhengruifeng commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-910233816
Added `test("General Skew Join: 3-table join UNION 2-table join")`:
```
== Physical Plan ==
AdaptiveSparkPlan (68)
+- == Final Plan ==
   CollectLimit (43)
   +- * HashAggregate (42)
      +- AQEShuffleRead (41)
         +- ShuffleQueryStage (40)
            +- Exchange (39)
               +- * HashAggregate (38)
                  +- Union (37)
                     :- * Project (24)
                     :  +- * SortMergeJoin(skew=true) LeftOuter (23)
                     :     :- * Project (14)
                     :     :  +- * SortMergeJoin(skew=true) Inner (13)
                     :     :     :- * Sort (6)
                     :     :     :  +- AQEShuffleRead (5)
                     :     :     :     +- ShuffleQueryStage (4)
                     :     :     :        +- Exchange (3)
                     :     :     :           +- * Project (2)
                     :     :     :              +- * Range (1)
                     :     :     +- * Sort (12)
                     :     :        +- AQEShuffleRead (11)
                     :     :           +- ShuffleQueryStage (10)
                     :     :              +- Exchange (9)
                     :     :                 +- * Project (8)
                     :     :                    +- * Range (7)
                     :     +- * Sort (22)
                     :        +- * HashAggregate(skew=true) (21)
                     :           +- AQEShuffleRead (20)
                     :              +- ShuffleQueryStage (19)
                     :                 +- Exchange (18)
                     :                    +- * HashAggregate (17)
                     :                       +- * Project (16)
                     :                          +- * Range (15)
                     +- * Project (36)
                        +- * SortMergeJoin(skew=true) LeftOuter (35)
                           :- * Sort (28)
                           :  +- AQEShuffleRead (27)
                           :     +- ShuffleQueryStage (26)
                           :        +- ReusedExchange (25)
                           +- * Sort (34)
                              +- AQEShuffleRead (33)
                                 +- ShuffleQueryStage (32)
                                    +- Exchange (31)
                                       +- * Project (30)
                                          +- * Range (29)
```
![image](https://user-images.githubusercontent.com/7322292/131670494-2cd5d99c-d67f-41a0-be01-90bb71c872c2.png)
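In the 3-table branch of this plan, three operators are marked `skew=true` (`SortMergeJoin` (13), `SortMergeJoin` (23) and `HashAggregate` (21)), and the branch reads from three shuffle query stages: (4), (10) and (19). When one skewed partition is split in several stages at once, the split-duplicate approach combines every split spec of one stage with every split spec of the others. The number of tasks for that partition is therefore the product of the per-stage split counts, which is the combinatorial explosion the PR description warns about (100 x 100 x 100 specs giving 1M combinations). The Scala sketch below only illustrates that arithmetic. `SkewCombinations`, `combinationCount`, `capSplits` and `maxCombinations` are hypothetical names that are not part of Spark or of this PR. `capSplits` uses a simple "shrink the largest split count" heuristic, not the PR's actual re-split logic.

```scala
object SkewCombinations {

  // Tasks generated for one skewed partition when every split spec of each
  // stage is combined with every split spec of the other stages.
  // A stage that is not split contributes a factor of 1.
  def combinationCount(splitsPerStage: Seq[Int]): Long =
    splitsPerStage.foldLeft(1L)((acc, n) => acc * n.toLong)

  // Hypothetical re-split heuristic (NOT the PR's algorithm): repeatedly
  // reduce the largest split count by one until the product of split counts
  // is at most maxCombinations. Every count stays >= 1.
  def capSplits(splitsPerStage: Seq[Int], maxCombinations: Long): Vector[Int] = {
    require(maxCombinations >= 1, "maxCombinations must be positive")
    require(splitsPerStage.forall(_ >= 1), "split counts must be positive")
    var splits = splitsPerStage.toVector
    while (combinationCount(splits) > maxCombinations) {
      val i = splits.indexOf(splits.max)
      splits = splits.updated(i, splits(i) - 1)
    }
    splits
  }

  def main(args: Array[String]): Unit = {
    // The example from the PR description: stages A/B/C split into 100 specs each.
    val splits = Seq(100, 100, 100)
    println(combinationCount(splits)) // 1000000
    println(capSplits(splits, 1000L)) // Vector(10, 10, 10)
  }
}
```

Shrinking the largest count first keeps the stages roughly balanced, so no single stage is left with one huge unsplit partition. Decrementing one step at a time is fine for a sketch but would be slow for very large split counts.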
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994785018
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146225/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994560760
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50699/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973999879
**[Test build #145442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145442/testReport)** for PR 33893 at commit [`dc3b6ea`](https://github.com/apache/spark/commit/dc3b6ea57b61a5701ee376bfe524770e873f7d69).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-974094561
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145448/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-973822571
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49914/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-994560760
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50699/
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999720468
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554779
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50952/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999554744
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50952/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999630635
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50958/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999767781
**[Test build #146482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146482/testReport)** for PR 33893 at commit [`f822bbf`](https://github.com/apache/spark/commit/f822bbfa2bc8618c1197674c99895e4a44b9d84f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-999488191
**[Test build #146477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146477/testReport)** for PR 33893 at commit [`627d526`](https://github.com/apache/spark/commit/627d526a02d31b535ea78fb116ab0a75aadc876c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927256316
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48145/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33893:
URL: https://github.com/apache/spark/pull/33893#issuecomment-927247526
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48145/