Posted to reviews@spark.apache.org by hvanhovell <gi...@git.apache.org> on 2016/04/27 11:20:05 UTC

[GitHub] spark pull request: [SPARK-14590][SQL]Fix BroadcastHashJoin's uniq...

GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/12730

    [SPARK-14590][SQL]Fix BroadcastHashJoin's unique key Anti-Joins

    ### What changes were proposed in this pull request?
    Anti-Joins using BroadcastHashJoin's unique key code path are broken; they currently return Semi Join results. This PR fixes this bug.
    
    ### How was this patch tested?
    Added test cases to `ExistenceJoinSuite`.
    
    cc @davies @gatorsmile 
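
For readers unfamiliar with the two join types named above, the difference can be sketched as follows. This is a minimal illustration on plain Scala collections, not the code from this patch and not Spark's generated code; `ExistenceJoinSketch`, `Relation`, `leftSemi` and `leftAnti` are hypothetical names, and the `Map` is only an assumed stand-in for a broadcast relation whose keys are unique.

```scala
// Illustrative only: plain Scala collections, not Spark's BroadcastHashJoinExec.
object ExistenceJoinSketch {
  // Hypothetical stand-in for the broadcast (build) side when each key is unique.
  type Relation = Map[Int, String]

  // Left semi join: keep stream rows whose key HAS a match on the build side.
  def leftSemi(stream: Seq[(Int, String)], build: Relation): Seq[(Int, String)] =
    stream.filter { case (key, _) => build.contains(key) }

  // Left anti join: keep stream rows whose key has NO match on the build side.
  def leftAnti(stream: Seq[(Int, String)], build: Relation): Seq[(Int, String)] =
    stream.filterNot { case (key, _) => build.contains(key) }

  def main(args: Array[String]): Unit = {
    val stream = Seq(1 -> "a", 2 -> "b", 3 -> "c")
    val build: Relation = Map(2 -> "x", 3 -> "y")
    println(leftSemi(stream, build)) // List((2,b), (3,c))
    println(leftAnti(stream, build)) // List((1,a))
  }
}
```

The two results are complements of each other over the stream side, so an anti join that returns semi join rows, as described in the bug report, yields exactly the rows it should have dropped.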


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-14950

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12730
    
----
commit fb7c186a414cee416d00ad7815813d7e7477e463
Author: Herman van Hovell <hv...@questtec.nl>
Date:   2016-04-27T09:16:04Z

    Fix BroadcastHashJoin's unique key Anti-Joins

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14590][SQL] Fix BroadcastHashJoin's uni...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12730#issuecomment-215024447
  
    **[Test build #57112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57112/consoleFull)** for PR 12730 at commit [`fb7c186`](https://github.com/apache/spark/commit/fb7c186a414cee416d00ad7815813d7e7477e463).




[GitHub] spark pull request: [SPARK-14590][SQL] Fix BroadcastHashJoin's uni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12730#issuecomment-215047984
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14950][SQL] Fix BroadcastHashJoin's uni...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12730#discussion_r61283268
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -351,11 +359,12 @@ case class BroadcastHashJoinExec(
        */
       private def codegenAnti(ctx: CodegenContext, input: Seq[ExprCode]): String = {
         val (broadcastRelation, relationTerm) = prepareBroadcast(ctx)
    +    val uniqueKeyCodePath = broadcastRelation.value.keyIsUnique
         val (keyEv, anyNull) = genStreamSideJoinKey(ctx, input)
    -    val (matched, checkCondition, _) = getJoinCondition(ctx, input)
    +    val (matched, checkCondition, _) = getJoinCondition(ctx, input, uniqueKeyCodePath)
    --- End diff --
    
    It's correct, nvm




[GitHub] spark pull request: [SPARK-14590][SQL] Fix BroadcastHashJoin's uni...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12730#issuecomment-215113804
  
    LGTM, except the title:  [SPARK-14590] -> [SPARK-14950]





[GitHub] spark pull request: [SPARK-14950][SQL] Fix BroadcastHashJoin's uni...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/12730#issuecomment-215154484
  
    Merging to master. Thanks!




[GitHub] spark pull request: [SPARK-14590][SQL] Fix BroadcastHashJoin's uni...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12730#issuecomment-215047536
  
    **[Test build #57112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57112/consoleFull)** for PR 12730 at commit [`fb7c186`](https://github.com/apache/spark/commit/fb7c186a414cee416d00ad7815813d7e7477e463).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14950][SQL] Fix BroadcastHashJoin's uni...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12730#discussion_r61282299
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ---
    @@ -351,11 +359,12 @@ case class BroadcastHashJoinExec(
        */
       private def codegenAnti(ctx: CodegenContext, input: Seq[ExprCode]): String = {
         val (broadcastRelation, relationTerm) = prepareBroadcast(ctx)
    +    val uniqueKeyCodePath = broadcastRelation.value.keyIsUnique
         val (keyEv, anyNull) = genStreamSideJoinKey(ctx, input)
    -    val (matched, checkCondition, _) = getJoinCondition(ctx, input)
    +    val (matched, checkCondition, _) = getJoinCondition(ctx, input, uniqueKeyCodePath)
    --- End diff --
    
    checkCondition is also used by non-unique-key




[GitHub] spark pull request: [SPARK-14950][SQL] Fix BroadcastHashJoin's uni...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12730#issuecomment-215126781
  
    LGTM




[GitHub] spark pull request: [SPARK-14590][SQL] Fix BroadcastHashJoin's uni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12730#issuecomment-215047989
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57112/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14950][SQL] Fix BroadcastHashJoin's uni...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12730

