Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/02/12 21:52:05 UTC

[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/11188

    [SPARK-13314] [SQL] Fix worst case of broadcast join of two ints

    If the two join columns have the same value, their combined hash code is (a ^ b), which is 0, so all such keys collide and the HashMap becomes very slow.
    
    This PR rotates the second int to avoid this case. In theory, it is still possible to get lots of collisions; the pattern would be (1, 131072), (2, 131073) ... (n, n + 131072).
    
    This PR also adds some micro benchmarks and updates the results for broadcast hash joins.
    
    This PR is based on #11130 
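
For illustration only, here is a minimal sketch of the collision described above and of how rotating the second int avoids it. It is not the code from this PR: the class name, the method names, and the rotation distance of 15 bits are all assumptions chosen for the example. It uses only `Integer.rotateLeft` and `HashSet` from the Java standard library.

```java
import java.util.HashSet;
import java.util.Set;

class TwoIntHashSketch {
    // Hypothetical rotation distance, picked for illustration; the PR text does not state it.
    static final int ROTATE_BITS = 15;

    // Naive combination: when a == b the bits cancel and the result is always 0.
    static int xorHash(int a, int b) {
        return a ^ b;
    }

    // Rotate the second int before XOR so that (k, k) pairs no longer cancel.
    static int rotatedHash(int a, int b) {
        return a ^ Integer.rotateLeft(b, ROTATE_BITS);
    }

    // Counts the distinct hash codes produced by the pairs (1, 1) .. (n, n).
    static int distinctHashesForEqualPairs(int n, boolean rotate) {
        Set<Integer> seen = new HashSet<>();
        for (int k = 1; k <= n; k++) {
            seen.add(rotate ? rotatedHash(k, k) : xorHash(k, k));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Plain XOR: every (k, k) pair hashes to 0, so there is a single distinct value.
        System.out.println(distinctHashesForEqualPairs(1000, false)); // prints 1
        // With rotation: the 1000 pairs get 1000 distinct hash codes.
        System.out.println(distinctHashesForEqualPairs(1000, true));  // prints 1000
        // Rotation moves the worst case rather than removing it:
        // under the assumed distance, (1, 131072) still hashes to 0.
        System.out.println(rotatedHash(1, 131072));                   // prints 0
    }
}
```

Under the assumed 15-bit rotation, the pair (1, 131072) still cancels to 0, which shows that rotation moves the worst case to a less common key pattern without ruling out collisions entirely.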

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark fix_ints

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11188
    
----
commit 52efe91168a4be7ce721d2f56e2b1e7aab9379db
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-09T00:33:44Z

    generated broadcast outer join

commit 9525782c971f52c1343830402276086cd0e4ae8f
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-09T18:18:45Z

    refactor

commit 9a1f5325e954d8464d28ebf415c9dca665e15d35
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-09T18:21:31Z

    fix style

commit 98cda0be6cdc3687d045aa5b881676758d66842c
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-09T18:32:35Z

    Merge branch 'master' into gen_out

commit edbc284921281358a38b300218ff288c33cdc3b4
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-09T19:48:58Z

    fix tests

commit da45df1536f112a14bfe15d6d30d307cdbd99d5b
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-10T19:20:33Z

    address comments

commit 9b05c7cd335f06079c241f681282bd36306dc739
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-12T18:26:24Z

    Merge branch 'master' of github.com:apache/spark into gen_out
    
    Conflicts:
    	sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala
    	sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
    	sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala
    	sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala

commit 1c0ee96e80d5cc1909d7d5ec794b74e76979ae45
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-12T20:40:30Z

    fix worst case of broadcast join with two ints

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11188#issuecomment-186369304
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51571/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...

Posted by davies <gi...@git.apache.org>.
GitHub user davies closed the pull request at:

    https://github.com/apache/spark/pull/11188




[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11188#issuecomment-186369295
  
    **[Test build #51571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51571/consoleFull)** for PR 11188 at commit [`f6416a6`](https://github.com/apache/spark/commit/f6416a6f262b3f1f9a552442c65ece93080ca4f9).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11188#issuecomment-183489396
  
    **[Test build #51201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51201/consoleFull)** for PR 11188 at commit [`1c0ee96`](https://github.com/apache/spark/commit/1c0ee96e80d5cc1909d7d5ec794b74e76979ae45).




[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11188#issuecomment-183489748
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11188#issuecomment-186368925
  
    **[Test build #51571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51571/consoleFull)** for PR 11188 at commit [`f6416a6`](https://github.com/apache/spark/commit/f6416a6f262b3f1f9a552442c65ece93080ca4f9).




[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11188#issuecomment-183489745
  
    **[Test build #51201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51201/consoleFull)** for PR 11188 at commit [`1c0ee96`](https://github.com/apache/spark/commit/1c0ee96e80d5cc1909d7d5ec794b74e76979ae45).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13304] [SQL] Fix worst case of broadcas...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11188#issuecomment-186369299
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13314] [SQL] Fix worst case of broadcas...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11188#issuecomment-183489750
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51201/
    Test FAILed.

