Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/02/12 21:52:05 UTC
[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...
GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/11188
[SPARK-13314] [SQL] Fix worst case of broadcast join of two ints
If the two join columns have the same value, their combined hash code (a ^ b) is 0. Every such key then lands in the same bucket, and the HashMap becomes very slow.
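The degenerate case can be illustrated with a minimal Java sketch. It assumes that the two int join keys are packed into a single long, with the first int in the high 32 bits and the second in the low 32 bits, and that the map hashes that long with `Long.hashCode`, which XOR-folds the two halves. The class and method names (`PackedIntKeys`, `pack`, `hash`, `distinctHashesForEqualPairs`) are hypothetical and are not Spark's API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: two int join keys packed into one long key.
public class PackedIntKeys {

    // First int in the high 32 bits, second int (as unsigned) in the low 32 bits.
    static long pack(int a, int b) {
        return ((long) a << 32) | (b & 0xFFFFFFFFL);
    }

    // Long.hashCode(v) is (int) (v ^ (v >>> 32)), which equals a ^ b for a packed key.
    static int hash(int a, int b) {
        return Long.hashCode(pack(a, b));
    }

    // Counts the distinct hash codes produced by the keys (i, i) for 0 <= i < n.
    static int distinctHashesForEqualPairs(int n) {
        Set<Integer> hashes = new HashSet<>();
        for (int i = 0; i < n; i++) {
            hashes.add(hash(i, i));
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        System.out.println(hash(7, 7));                        // 0
        System.out.println(hash(7, 8) == (7 ^ 8));             // true
        System.out.println(distinctHashesForEqualPairs(1000)); // 1
    }
}
```

All 1000 equal pairs collapse onto a single hash code, so lookups for them degrade to scanning one long collision chain.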
This PR rotates the second int to avoid this case. In theory, many collisions are still possible, for example with the pattern (1, 131072), (2, 131073), ..., (n, n + 131072).
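The rotation idea can be sketched in Java under the same packing assumption as before: the second int is rotated before it is stored in the low 32 bits, so `Long.hashCode` now yields `a ^ rotateRight(b, 17)` instead of `a ^ b`. The rotation amount (17) and direction (right) are assumptions chosen for illustration, not values taken from the PR, and `RotatedIntKeys` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: rotate the second int before packing so that a == b
// no longer cancels out in Long.hashCode's XOR fold.
public class RotatedIntKeys {

    // Assumed rotation amount; not taken from the PR.
    static final int ROTATE_BITS = 17;

    static long pack(int a, int b) {
        int rotated = Integer.rotateRight(b, ROTATE_BITS);
        return ((long) a << 32) | (rotated & 0xFFFFFFFFL);
    }

    // Equals a ^ Integer.rotateRight(b, ROTATE_BITS).
    static int hash(int a, int b) {
        return Long.hashCode(pack(a, b));
    }

    // Counts the distinct hash codes produced by the keys (i, i) for 0 <= i < n.
    static int distinctHashesForEqualPairs(int n) {
        Set<Integer> hashes = new HashSet<>();
        for (int i = 0; i < n; i++) {
            hashes.add(hash(i, i));
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        // Equal pairs are now spread out: (i, i) -> i ^ (i << 15) for small i.
        System.out.println(distinctHashesForEqualPairs(1000)); // 1000
        // Collisions remain possible whenever a == rotateRight(b, 17):
        // rotateRight(131072, 17) == 1, so (1, 131072) hashes to 0 like (0, 0).
        System.out.println(hash(1, 131072)); // 0
        System.out.println(hash(0, 0));      // 0
    }
}
```

Because rotation is a bijection on 32-bit ints, distinct (a, b) pairs still map to distinct long keys; only the hash distribution changes. As the last two lines show, any pair with a == rotateRight(b, 17) still cancels to 0, which is the kind of residual collision the description warns about.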
This PR also adds some micro benchmarks and updates the benchmark results for broadcast hash joins.
This PR is based on #11130.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark fix_ints
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11188.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11188
----
commit 52efe91168a4be7ce721d2f56e2b1e7aab9379db
Author: Davies Liu <da...@databricks.com>
Date: 2016-02-09T00:33:44Z
generated broadcast outer join
commit 9525782c971f52c1343830402276086cd0e4ae8f
Author: Davies Liu <da...@databricks.com>
Date: 2016-02-09T18:18:45Z
refactor
commit 9a1f5325e954d8464d28ebf415c9dca665e15d35
Author: Davies Liu <da...@databricks.com>
Date: 2016-02-09T18:21:31Z
fix style
commit 98cda0be6cdc3687d045aa5b881676758d66842c
Author: Davies Liu <da...@databricks.com>
Date: 2016-02-09T18:32:35Z
Merge branch 'master' into gen_out
commit edbc284921281358a38b300218ff288c33cdc3b4
Author: Davies Liu <da...@databricks.com>
Date: 2016-02-09T19:48:58Z
fix tests
commit da45df1536f112a14bfe15d6d30d307cdbd99d5b
Author: Davies Liu <da...@databricks.com>
Date: 2016-02-10T19:20:33Z
address comments
commit 9b05c7cd335f06079c241f681282bd36306dc739
Author: Davies Liu <da...@databricks.com>
Date: 2016-02-12T18:26:24Z
Merge branch 'master' of github.com:apache/spark into gen_out
Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
commit 1c0ee96e80d5cc1909d7d5ec794b74e76979ae45
Author: Davies Liu <da...@databricks.com>
Date: 2016-02-12T20:40:30Z
fix worst case of broadcast join with two ints
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11188#issuecomment-186369304
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51571/
[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...
Posted by davies <gi...@git.apache.org>.
Github user davies closed the pull request at:
https://github.com/apache/spark/pull/11188
[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11188#issuecomment-186369295
**[Test build #51571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51571/consoleFull)** for PR 11188 at commit [`f6416a6`](https://github.com/apache/spark/commit/f6416a6f262b3f1f9a552442c65ece93080ca4f9).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11188#issuecomment-183489396
**[Test build #51201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51201/consoleFull)** for PR 11188 at commit [`1c0ee96`](https://github.com/apache/spark/commit/1c0ee96e80d5cc1909d7d5ec794b74e76979ae45).
[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11188#issuecomment-183489748
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11188#issuecomment-186368925
**[Test build #51571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51571/consoleFull)** for PR 11188 at commit [`f6416a6`](https://github.com/apache/spark/commit/f6416a6f262b3f1f9a552442c65ece93080ca4f9).
[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11188#issuecomment-183489745
**[Test build #51201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51201/consoleFull)** for PR 11188 at commit [`1c0ee96`](https://github.com/apache/spark/commit/1c0ee96e80d5cc1909d7d5ec794b74e76979ae45).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11188#issuecomment-186369299
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11188#issuecomment-183489750
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51201/