Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2016/03/14 09:09:18 UTC

[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/11691

    [SPARK-13854][SQL] Add constraints to outer join

    JIRA: https://issues.apache.org/jira/browse/SPARK-13854
    
    ## What changes were proposed in this pull request?
    
    
    
    Currently, for a left outer join we keep only the left side's constraints. For a right outer join, we keep only the right side's constraints. For a full outer join, the constraints are empty.
    
    In fact, this is fewer constraints than actually hold for the join operator.
    
    For example, for a left outer join, besides the constraints from the left side, the constraints of the right side should be inherited with a small modification.
    
    Consider the following join:
    
        val tr1 = LocalRelation('a.int, 'b.int, 'c.int).subquery('tr1)
        val tr2 = LocalRelation('a.int, 'd.int, 'e.int).subquery('tr2)
    
        tr1.where('a.attr > 10)
          .join(tr2.where('d.attr < 100), LeftOuter, Some("tr1.a".attr === "tr2.a".attr))
    
    The constraints are not only "a" > 10 and "a" is not null. They should also include ("d" is null || "d" < 100).
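
As a rough illustration of the claim above, here is a minimal sketch in plain Scala with no Spark dependency. It models the left outer join by hand and checks which predicates hold on every output row. The names `Tr1Row`, `Tr2Row`, `Joined` and `leftOuterJoin`, the sample data, and the use of `Option` to stand in for SQL NULL are all assumptions made for this example; none of it is Catalyst's API or the code in this patch.

```scala
// Hand-rolled model of a left outer join; plain Scala, no Spark dependency.
// All names and sample rows here are hypothetical, for illustration only.
object OuterJoinConstraintSketch {
  final case class Tr1Row(a: Int, b: Int, c: Int)
  final case class Tr2Row(a: Int, d: Int, e: Int)

  // Output row: the right-side column d is None (standing in for SQL NULL)
  // when the left row has no match on the right.
  final case class Joined(a: Int, d: Option[Int])

  def leftOuterJoin(left: Seq[Tr1Row], right: Seq[Tr2Row]): Seq[Joined] =
    left.flatMap { l =>
      val matches = right.filter(_.a == l.a)
      if (matches.isEmpty) Seq(Joined(l.a, None))
      else matches.map(r => Joined(l.a, Some(r.d)))
    }

  val tr1: Seq[Tr1Row] =
    Seq(Tr1Row(5, 0, 0), Tr1Row(11, 0, 0), Tr1Row(12, 0, 0), Tr1Row(13, 0, 0))
  val tr2: Seq[Tr2Row] =
    Seq(Tr2Row(11, 50, 0), Tr2Row(12, 500, 0), Tr2Row(11, 99, 0))

  // Mirrors: tr1.where(a > 10) LEFT OUTER JOIN tr2.where(d < 100) ON tr1.a = tr2.a
  val result: Seq[Joined] =
    leftOuterJoin(tr1.filter(_.a > 10), tr2.filter(_.d < 100))

  // The left-side constraint survives the join unchanged.
  val leftConstraintHolds: Boolean = result.forall(_.a > 10)

  // The right-side constraint holds only in weakened form: d IS NULL OR d < 100.
  // Option.forall returns true for None, which models the "d is null" branch.
  val weakenedRightConstraintHolds: Boolean = result.forall(_.d.forall(_ < 100))

  // The unweakened form d < 100 is not satisfied by the null-padded rows.
  val strictRightConstraintHolds: Boolean = result.forall(_.d.exists(_ < 100))
}
```

In this sketch the rows with `a` equal to 12 and 13 have no surviving match on the right, so they come out with `d` set to `None`. That is why only the disjunctive form of the right-side constraint can be stated for the join output.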
    
    ## How was this patch tested?
    
    Three tests in `ConstraintPropagationSuite` are modified for this PR.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 join-constraints

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11691.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11691
    
----
commit ab7ea3734777f378c7b2d809595f52a4bf8b4ce2
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-03-14T06:11:38Z

    init import.

commit f9a3ed035fd1b072f07f78bc63a2fdc8cddb3c7b
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-03-14T06:32:32Z

    Merge remote-tracking branch 'upstream/master' into join-constraints

commit deae036525bc58e6c81d41a8cadfa8b33f7f9d74
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-03-14T06:36:24Z

    Fix.

commit 1aa85500725a4a4d5a55583cc400d7b7d4171c37
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-03-14T07:57:22Z

    Add constraints to outer join.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196194355
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196193652
  
    **[Test build #53059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53059/consoleFull)** for PR 11691 at commit [`1aa8550`](https://github.com/apache/spark/commit/1aa85500725a4a4d5a55583cc400d7b7d4171c37).


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196237594
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196565718
  
    Yeah. I was thinking that we might use these constraints for some filtering and condition reduction, but I found that there are some problems doing that with these disjunctive predicates. Let me close this for now; if I figure out how to use them, I can re-open it. Thanks!


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/11691


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196194359
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53059/
    Test FAILed.


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196368073
  
    @viirya I am not sure we should add these constraints now. I think these constraints were intentionally excluded from the original design for simplicity. @marmbrus @sameeragarwal Please correct me if my understanding is wrong.
    
    Could you explain how to use them?


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196434407
  
    I tend to agree.  We certainly could enumerate many more constraints, but I think that we should limit the scope to those that we know we are going to use later (`IsNotNull` for example is very useful for lots of things).  Reasoning about disjunctive predicates on the other hand is not super easy, so I'd want to see concrete examples where this information was useful before going down this path.


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196236956
  
    **[Test build #53061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53061/consoleFull)** for PR 11691 at commit [`6379d8b`](https://github.com/apache/spark/commit/6379d8bf800c3f9af40875a9ac11f34da23860ee).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196202546
  
    **[Test build #53061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53061/consoleFull)** for PR 11691 at commit [`6379d8b`](https://github.com/apache/spark/commit/6379d8bf800c3f9af40875a9ac11f34da23860ee).


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196194346
  
    **[Test build #53059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53059/consoleFull)** for PR 11691 at commit [`1aa8550`](https://github.com/apache/spark/commit/1aa85500725a4a4d5a55583cc400d7b7d4171c37).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13854][SQL] Add constraints to outer jo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11691#issuecomment-196237600
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53061/
    Test PASSed.

