Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/27 16:09:48 UTC

[GitHub] [spark] wangyum opened a new pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

wangyum opened a new pull request #28934:
URL: https://github.com/apache/spark/pull/28934


   ### What changes were proposed in this pull request?
   
   Joining an event-based table (known in Chinese as 拉链表, a "zipper table") usually expands the data. This PR avoids coalescing shuffle partitions for such joins, that is, when the join condition contains an inequality predicate.
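
To illustrate why a join with an inequality predicate tends to expand the data, here is a minimal pure-Python sketch. It is not Spark code and not part of this patch; the table contents, column names, and the `range_join` helper are all hypothetical. Each history row carries a validity interval, and the join matches every calendar day that falls inside it.

```python
from typing import List, Tuple

# Hypothetical zipper-table rows: (user_id, status, start_day, end_day),
# each valid for start_day <= day < end_day.
history: List[Tuple[int, str, int, int]] = [
    (1, "trial", 0, 10),
    (1, "paid", 10, 30),
    (2, "trial", 5, 25),
]

# Hypothetical calendar table: one row per day.
days: List[int] = list(range(30))


def range_join(
    history: List[Tuple[int, str, int, int]], days: List[int]
) -> List[Tuple[int, str, int]]:
    """Nested-loop join on the inequality predicate start_day <= day < end_day."""
    return [
        (user_id, status, day)
        for (user_id, status, start_day, end_day) in history
        for day in days
        if start_day <= day < end_day
    ]


joined = range_join(history, days)
# Row counts: history input, calendar input, join output.
print(len(history), len(days), len(joined))  # prints "3 30 50"
```

In this sketch, 3 history rows and 30 calendar rows produce 50 output rows, so partition sizes measured on the join inputs understate the amount of data produced by the join itself.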
   
   
   
   Default | Avoid coalescing shuffle partitions
   --- | ---
   <img src="https://user-images.githubusercontent.com/5399861/85926502-87d8d980-b8d2-11ea-816b-c44c0216c3f2.png" width="410"> | <img src="https://user-images.githubusercontent.com/5399861/85926505-8d362400-b8d2-11ea-82e9-7120a0b0aa0c.png" width="410">
   
   
   ### Why are the changes needed?
   
   To improve the performance of queries whose join condition has an inequality predicate.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Unit tests.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum closed pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #28934:
URL: https://github.com/apache/spark/pull/28934


   




[GitHub] [spark] gengliangwang commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-652715188


   +1 to @viirya. Also, I think we need benchmark results to support this.




[GitHub] [spark] AmplabJenkins commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650629435








[GitHub] [spark] SparkQA removed a comment on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650581756


   **[Test build #124566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124566/testReport)** for PR 28934 at commit [`1d88f08`](https://github.com/apache/spark/commit/1d88f084949af185049c489bf0930ed78966ed97).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650629435








[GitHub] [spark] wangyum commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650582509


   Another case:
   
   Default | Avoid coalescing shuffle partitions
   --- | ---
   <img src="https://user-images.githubusercontent.com/5399861/85927019-f3707600-b8d5-11ea-8f61-d2456037d02b.png" width="410"> | <img src="https://user-images.githubusercontent.com/5399861/85927016-ef445880-b8d5-11ea-8366-db1bd47d12dd.png" width="410">
   




[GitHub] [spark] SparkQA commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650581756


   **[Test build #124566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124566/testReport)** for PR 28934 at commit [`1d88f08`](https://github.com/apache/spark/commit/1d88f084949af185049c489bf0930ed78966ed97).




[GitHub] [spark] viirya commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650705906


   Can you elaborate on why we should not coalesce shuffle partitions if the join condition has an inequality predicate?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650580251








[GitHub] [spark] AmplabJenkins commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650580251








[GitHub] [spark] SparkQA commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-650629035


   **[Test build #124566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124566/testReport)** for PR 28934 at commit [`1d88f08`](https://github.com/apache/spark/commit/1d88f084949af185049c489bf0930ed78966ed97).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] github-actions[bot] commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #28934:
URL: https://github.com/apache/spark/pull/28934#issuecomment-706457752


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

