Posted to reviews@spark.apache.org by scwf <gi...@git.apache.org> on 2015/04/21 15:52:10 UTC

[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

GitHub user scwf opened a pull request:

    https://github.com/apache/spark/pull/5612

    [SPARK-7026][SQL] Fix bugs when there are non equal join predicates for left semi join

    When the `condition` extracted by `ExtractEquiJoinKeys` contains a join predicate for a left semi join, we cannot plan it as a semi join.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/scwf/spark spark-7026

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5612.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5612
    
----
commit a0f1bddbde11c5290ae0284d7fbc6833341512bf
Author: wangfei <wa...@huawei.com>
Date:   2015-04-21T13:46:59Z

    Fix bug when there are non equal join predicates for left semi join

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94848445
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30675/
    Test PASSed.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95459557
  
    Got it. Interesting case!



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94852400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30676/
    Test PASSed.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94985307
  
    @adrian-wang



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95001061
  
    @adrian-wang, so when the condition is a join predicate, how do you handle it in the left semi join?



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94848399
  
      [Test build #30675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30675/consoleFull) for   PR 5612 at commit [`a0f1bdd`](https://github.com/apache/spark/commit/a0f1bddbde11c5290ae0284d7fbc6833341512bf).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95441755
  
    In this case, `select a1 from t1 left semi join t3 on t1.a1 = t3.a3` will not go into the hash join; it should be planned as a left semi join, since there is no additional condition.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95464884
  
      [Test build #30813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30813/consoleFull) for   PR 5612 at commit [`3d7f589`](https://github.com/apache/spark/commit/3d7f589ec749ab837bb87ba4d356591dcf992f47).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95464897
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30813/
    Test PASSed.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95456832
  
    select a1 from t1 left semi join t3 on t1.a1 = t3.a3 and t1.a1>t3.b3 - 100;



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94985097
  
    This is actually a bug: we cannot extract the condition and apply it as a `Filter` on top of the left semi join, because that is logically incorrect. We probably need to add a `condition` field to `LeftSemiJoinHash`, `BroadcastLeftSemiJoinHash`, `LeftSemiJoinBNL`, etc.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5612#discussion_r28827377
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -36,12 +36,14 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
           case ExtractEquiJoinKeys(LeftSemi, leftKeys, rightKeys, condition, left, right)
             if sqlContext.conf.autoBroadcastJoinThreshold > 0 &&
    -          right.statistics.sizeInBytes <= sqlContext.conf.autoBroadcastJoinThreshold =>
    +          right.statistics.sizeInBytes <= sqlContext.conf.autoBroadcastJoinThreshold &&
    +          canEvaluate(condition.getOrElse(Literal(true)), left) =>
             val semiJoin = joins.BroadcastLeftSemiJoinHash(
               leftKeys, rightKeys, planLater(left), planLater(right))
             condition.map(Filter(_, semiJoin)).getOrElse(semiJoin) :: Nil
           // Find left semi joins where at least some predicates can be evaluated by matching join keys
    -      case ExtractEquiJoinKeys(LeftSemi, leftKeys, rightKeys, condition, left, right) =>
    +      case ExtractEquiJoinKeys(LeftSemi, leftKeys, rightKeys, condition, left, right)
    +        if canEvaluate(condition.getOrElse(Literal(true)), left) =>
    --- End diff --
    
    I don't think this is the best solution as you are going to fall back to broadcasting, even when there are equality keys that can be used for hashing.  This will OOM when both sides are large.  Instead, I think that you should add a case that tries to plan a HashJoin, but throws out the right output.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94975693
  
    @marmbrus, you are right. I have updated the PR; does this look OK?



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95423015
  
    table t1 (a1 int)
    table t2 (a2 int)
    t1: {1,2,3}
    t2: {2,2,2}
    
        select a1 from t1 left semi join t2 on t1.a1 = t2.a2;
    should output {2}
    
    The result of shuffledHashJoin is {2,2,2}.
    The filter then removes nothing, leaving {2,2,2}.
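
The difference described above can be reproduced with plain Scala collections. This is only a sketch of the two semantics, not Spark's physical operators; the object and method names are made up for illustration.

```scala
object SemiVsInnerJoin {
  // Left semi join: a left row is kept at most once if any right row has the same key.
  def leftSemiJoin(left: Seq[Int], right: Seq[Int]): Seq[Int] = {
    val rightKeys = right.toSet
    left.filter(rightKeys.contains)
  }

  // Inner equi-join projected to the left column: one output row per matching pair.
  def innerJoinLeftOnly(left: Seq[Int], right: Seq[Int]): Seq[Int] =
    for (l <- left; r <- right if l == r) yield l

  def main(args: Array[String]): Unit = {
    val t1 = Seq(1, 2, 3)
    val t2 = Seq(2, 2, 2)
    println(leftSemiJoin(t1, t2))      // List(2)
    println(innerJoinLeftOnly(t1, t2)) // List(2, 2, 2)
  }
}
```

A filter placed after the inner join cannot undo the duplication, because nothing in the joined rows distinguishes one matching pair from another.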



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94852387
  
      [Test build #30676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30676/consoleFull) for   PR 5612 at commit [`6a474a8`](https://github.com/apache/spark/commit/6a474a8f507a9e7515a63751c5adea5c3abdba44).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95401013
  
    @scwf 
    If you do shuffleHashJoin and then filter the result, you may get wrong results for the join if the right table contains multiple rows with the same key.
    See my patch for this at #5643.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95463452
  
    I am closing this.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94998467
  
    @chenghao-intel, for this case I do not think we should add a `condition` field to the concrete left semi join physical nodes. When the condition is a join predicate (such as `x.a >= y.a + 2`), we cannot use the left semi join and should use a hash join instead.
    
    But when the condition is not a join predicate (such as `x.a >= 1 and y.a + 2 < 3`), we can add a `condition` field to do the filtering in the left semi join. I think this is an optimization.
    
    This PR focuses on the case where the condition is a join predicate.
    
    For the case where the condition is not a join predicate, I am writing a new PR.
    
    /cc @marmbrus any more comments?
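
The distinction drawn in this comment, between a predicate that one side can evaluate alone and one that needs both sides, can be sketched as a check on referenced attributes. The model below is a simplification made up for illustration: `Predicate`, its `references` field, and `isJoinPredicate` are hypothetical, and `canEvaluate` only mirrors the name used in the diff, not necessarily its implementation.

```scala
object PredicateSides {
  // Hypothetical, simplified model: a predicate is described only by the attributes it references.
  final case class Predicate(sql: String, references: Set[String])

  // True when every referenced attribute is produced by the given side.
  def canEvaluate(p: Predicate, output: Set[String]): Boolean =
    p.references.subsetOf(output)

  // A "join predicate" in the thread's sense: neither side alone can evaluate it.
  def isJoinPredicate(p: Predicate, left: Set[String], right: Set[String]): Boolean =
    !canEvaluate(p, left) && !canEvaluate(p, right)
}
```

Under this model `x.a >= y.a + 2` references both `x.a` and `y.a` and is a join predicate, while `x.a >= 1` and `y.a + 2 < 3` can each be evaluated by a single side.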




[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94994504
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30712/
    Test PASSed.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95001462
  
    IMO, in a left semi join we build a hash set from the build iterator and filter the stream iterator against that hash set. Adding a join predicate field to the left semi join would probably break this mechanism or make it much more complex, so I used a hash join here.
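
The hash-set mechanism mentioned in this comment can be sketched roughly as follows. This is an illustration over plain iterators, assuming a hypothetical `Row` type; it is not the code of Spark's semi join operators.

```scala
object HashSetSemiJoin {
  // Hypothetical row type: a join key plus one extra column.
  final case class Row(key: Int, value: Int)

  def semiJoin(stream: Iterator[Row], build: Iterator[Row]): Iterator[Row] = {
    // Build phase: only the keys are retained; the other build-side columns are dropped.
    val keys = scala.collection.mutable.HashSet.empty[Int]
    build.foreach(r => keys += r.key)
    // Probe phase: a streamed row passes if its key was seen on the build side.
    stream.filter(r => keys.contains(r.key))
  }
}
```

Because the set keeps only keys, a predicate that also reads other build-side columns has nothing to evaluate against at probe time, which is the complication the comment refers to.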



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94803116
  
      [Test build #30675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30675/consoleFull) for   PR 5612 at commit [`a0f1bdd`](https://github.com/apache/spark/commit/a0f1bddbde11c5290ae0284d7fbc6833341512bf).



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94808428
  
      [Test build #30676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30676/consoleFull) for   PR 5612 at commit [`6a474a8`](https://github.com/apache/spark/commit/6a474a8f507a9e7515a63751c5adea5c3abdba44).



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95446474
  
    Assume we have data t1(2), t3((2,2), (2,1)).
    
    `select a1 from t1 left semi join t3 on t1.a1 = t3.a3 and t1.a1>=t3.b3` will output only one row: 2.
    However, if we transform that into a Left Outer Join + Filter (as you did in another PR), something like:
    `SELECT a1 from (select t1.a1, t3.b3 from t1 left outer join t3 on t1.a1=t3.a3) where t1.a1>=t3.b3`, it will output 2 rows (2, 2).
    
    That's why we have to add the `condition` field to `HashLeftSemiJoin`, as well as to the broadcast left semi join.
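
One way to picture a semi join that carries its own `condition` is to check the predicate against each candidate match and emit the left row once if any candidate passes. The sketch below uses plain Scala collections and hypothetical types `T1` and `T3`; it illustrates the semantics only and is not the operator proposed for Spark.

```scala
object SemiJoinWithCondition {
  final case class T1(a1: Int)
  final case class T3(a3: Int, b3: Int)

  // Semi join with the extra predicate checked per candidate match:
  // a left row is emitted once if at least one right row with the same key satisfies it.
  def semiJoin(left: Seq[T1], right: Seq[T3])(cond: (T1, T3) => Boolean): Seq[Int] = {
    val byKey: Map[Int, Seq[T3]] = right.groupBy(_.a3)
    left.filter(l => byKey.getOrElse(l.a1, Seq.empty).exists(r => cond(l, r))).map(_.a1)
  }

  // Join first, filter afterwards: one output row per matching pair that passes the filter.
  def joinThenFilter(left: Seq[T1], right: Seq[T3])(cond: (T1, T3) => Boolean): Seq[Int] =
    for (l <- left; r <- right if l.a1 == r.a3 && cond(l, r)) yield l.a1
}
```

With t1(2), t3((2,2), (2,1)) and the condition `a1 >= b3`, `semiJoin` yields a single 2 while `joinThenFilter` yields two rows, matching the example in the comment.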



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94976139
  
      [Test build #30712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30712/consoleFull) for   PR 5612 at commit [`8687fa4`](https://github.com/apache/spark/commit/8687fa41205dd301e3ee449840fa7caf7961794e).



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95438125
  
      [Test build #30813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30813/consoleFull) for   PR 5612 at commit [`3d7f589`](https://github.com/apache/spark/commit/3d7f589ec749ab837bb87ba4d356591dcf992f47).



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95440920
  
    This is still not right... 
    table t1 (a1 int)
    table t3 (a3 int, b3 int)
    t1: {1,2,3}
    t3: {(2,1),(2,2),(2,2)}
        select a1 from t1 left semi join t3 on t1.a1 = t3.a3;
    should output {2}
    
    Distinct(right) will transform {(2,1),(2,2),(2,2)} into {(2,1),(2,2)}.
    Then the join produces {(2,2,1),(2,2,2)},
    and the filtered project gets {2,2}.
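
The steps above can be traced with plain Scala collections. The second method is an assumption added for contrast: it de-duplicates the join keys rather than whole rows, which gives the semi join result here but only works when no further condition needs the other right-side columns.

```scala
object DistinctThenJoin {
  // Distinct over whole right rows, then inner join on the key, then project the left column.
  def distinctRowsThenJoin(left: Seq[Int], right: Seq[(Int, Int)]): Seq[Int] = {
    val distinctRight = right.distinct
    for (a1 <- left; r <- distinctRight if a1 == r._1) yield a1
  }

  // Contrast (assumption, not from the PR): distinct over the right join keys only.
  def distinctKeysThenJoin(left: Seq[Int], right: Seq[(Int, Int)]): Seq[Int] = {
    val distinctKeys = right.map(_._1).distinct
    for (a1 <- left; k <- distinctKeys if a1 == k) yield a1
  }
}
```

For t1 = {1,2,3} and t3 = {(2,1),(2,2),(2,2)}, the first method returns two rows and the second returns one.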
    




[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by adrian-wang <gi...@git.apache.org>.

Github user adrian-wang commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95000032
  
    I think @chenghao-intel's comment makes sense. This PR could hurt performance here as well.



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf closed the pull request at:

    https://github.com/apache/spark/pull/5612



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-95415298
  
    > If you do shuffleHashJoin and then filter the result, you may get wrong results for the join if the right table contains rows with same key.
    
    Can you give an example or a test case?



[GitHub] spark pull request: [SPARK-7026][SQL] Fix bugs when there are non ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5612#issuecomment-94994480
  
      [Test build #30712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30712/consoleFull) for   PR 5612 at commit [`8687fa4`](https://github.com/apache/spark/commit/8687fa41205dd301e3ee449840fa7caf7961794e).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.

