Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/01/04 03:48:40 UTC

[GitHub] spark pull request: Outer join elimination by parent join predicat...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/10566

    Outer join elimination by parent join predicate

    This PR is another enhancement to the Optimizer. It does not conflict with the other PRs (https://github.com/apache/spark/pull/10542 and https://github.com/apache/spark/pull/10551).
    
    When an outer join is the child of another join (called the parent join), and the join type of the parent join is inner, left-semi, left-outer, or right-outer, check whether the join condition of the parent join satisfies the following two conditions:
      1) there exist null-filtering predicates against the columns in the null-supplying side of the parent join.
      2) these columns are from the child join.
    
    If such join predicates exist, apply the elimination rules:
     - full outer -> inner if both sides of the child join have such predicates
     - left outer -> inner if the right side of the child join has such predicates
     - right outer -> inner if the left side of the child join has such predicates
     - full outer -> left outer if only the left side of the child join has such predicates
     - full outer -> right outer if only the right side of the child join has such predicates
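
    As a rough illustration, the five rules above can be written as one function over the child join type and two flags that say whether the parent join condition has a null-filtering predicate on each side of the child join. This is a standalone sketch, not the code in this PR: `OuterJoinRules`, `eliminate`, `leftFiltered`, and `rightFiltered` are hypothetical names, and the two flags are assumed to have been computed elsewhere.

    ```scala
    object OuterJoinRules {
      sealed trait JoinType
      case object Inner extends JoinType
      case object LeftOuter extends JoinType
      case object RightOuter extends JoinType
      case object FullOuter extends JoinType

      // leftFiltered / rightFiltered: the parent join condition contains a
      // null-filtering predicate on a column from that side of the child join.
      def eliminate(child: JoinType, leftFiltered: Boolean, rightFiltered: Boolean): JoinType =
        (child, leftFiltered, rightFiltered) match {
          case (FullOuter, true, true)  => Inner
          case (FullOuter, true, false) => LeftOuter
          case (FullOuter, false, true) => RightOuter
          case (LeftOuter, _, true)     => Inner
          case (RightOuter, true, _)    => Inner
          // No rule applies: keep the child join type unchanged.
          case (other, _, _)            => other
        }
    }
    ```

    In this sketch a full outer child with a predicate only on its left side becomes a left outer join, and any child join type with no matching predicate is returned as-is.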
    
    When applicable, this can improve performance considerably, since an outer join is generally slower than an inner join, and a full outer join is generally slower than a left or right outer join.
    
    BTW, since this rule is different from the rule in https://github.com/apache/spark/pull/10542, I kept them as separate PRs to simplify the code review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark OuterJoinEliminationByParentJoinPredicate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10566
    
----
commit bde74f83e24c2dc9bd9fd9e5541362049594c972
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-03T17:45:38Z

    Merge remote-tracking branch 'upstream/master' into OuterJoinEliminationByParentJoinPredicate

commit e18ba758aa94cc75115cc689f49b75ccd5d0ce51
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-04T02:21:59Z

    Merge remote-tracking branch 'upstream/master' into OuterJoinEliminationByParentJoinPredicate

commit d6a6e9cc31b0f7547b35cf25884135ea65b03676
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-04T02:40:26Z

    outer join elimination by parent join.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-189425893
  
    The existing constraint propagation is bottom up. The join conditions of full-outer joins will not filter out NULL. 
    
    Here, it is top down. The join conditions of full-outer joins can filter out the NULL of the child outer joins. 
    
    Will open a separate PR for top-down constraint propagation. Thanks for your suggestions!


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10566#discussion_r52503590
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -769,6 +770,107 @@ object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper {
     }
     
     /**
    + * Elimination of Outer Join by Parent Join Condition
    + *
    + * Given an outer join is involved in another join (called parent join), when the join type of the
    + * parent join is inner, left-semi, left-outer and right-outer, checking if the join condition of
    + * the parent join satisfies the following two conditions:
    + *
    + * 1) there exist null filtering predicates against the columns in the null-supplying side of
    + *    parent join.
    + * 2) these columns are from the child join.
    + *
    + * If having such join predicates, execute the elimination rules:
    + * - full outer -> inner if both sides of the child join have such predicates
    + * - left outer -> inner if the right side of the child join has such predicates
    + * - right outer -> inner if the left side of the child join has such predicates
    + * - full outer -> left outer if only the left side of the child join has such predicates
    + * - full outer -> right outer if only the right side of the child join has such predicates
    + *
    + */
    +object OuterJoinElimination extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  private def containsAttr(plan: LogicalPlan, attr: Attribute): Boolean =
    +    plan.outputSet.exists(_.semanticEquals(attr))
    +
    +  private def hasNullFilteringPredicate(predicate: Expression, plan: LogicalPlan): Boolean = {
    --- End diff --
    
    @gatorsmile similar to https://github.com/apache/spark/pull/10566, I think we should now be able to apply this optimization rule more generally, along the lines of:
    
    ```scala
      def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        case f @ Filter(condition, j @ Join(_, _, RightOuter | LeftOuter | FullOuter, _)) =>
          Filter(condition, buildNewJoin(f, j))
    
        // Case 1: when parent join is Inner|LeftSemi|LeftOuter and the child join is on the right side
        case pj @ Join(pLeft, j @ Join(left, right, RightOuter|LeftOuter|FullOuter, condition), Inner|LeftSemi|LeftOuter, Some(pJoinCond)) =>
          Join(pLeft, buildNewJoin(pj, j), pj.joinType, Some(pJoinCond))
    
        // Case 2: when parent join is Inner|LeftSemi|RightOuter and the child join is on the left side
        case pj @ Join(j @ Join(left, right, RightOuter|LeftOuter|FullOuter, condition), pRight, Inner|LeftSemi|RightOuter, Some(pJoinCond)) =>
          Join(buildNewJoin(pj, j), pRight, pj.joinType, Some(pJoinCond))
      }
    ```
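
    The two cases in the snippet above both come down to one question: does the parent join drop rows of the child that fail the parent join condition? A minimal standalone sketch of that check follows. It is not part of the PR, and `ParentJoinSides`, `Side`, and `parentCanFilter` are hypothetical names introduced only for illustration.

    ```scala
    object ParentJoinSides {
      sealed trait JoinType
      case object Inner extends JoinType
      case object LeftSemi extends JoinType
      case object LeftOuter extends JoinType
      case object RightOuter extends JoinType
      case object FullOuter extends JoinType

      // Which input of the parent join the child outer join sits on.
      sealed trait Side
      case object LeftChild extends Side
      case object RightChild extends Side

      // True when child rows that fail the parent join condition cannot
      // appear in the parent's output, so the condition acts as a filter.
      def parentCanFilter(parent: JoinType, side: Side): Boolean = (parent, side) match {
        case (Inner | LeftSemi, _)   => true
        case (LeftOuter, RightChild) => true // Case 1: child on the right side
        case (RightOuter, LeftChild) => true // Case 2: child on the left side
        case _                       => false
      }
    }
    ```

    A full outer parent never qualifies in this sketch, because it preserves unmatched rows from both inputs.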
    
    Thanks!


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-188920585
  
    > The first way is based on the constraints. If the parent join is full outer, the parent join will not have any IsNotNull constraint. In the current constraint propagation, its constraints are Set.empty[Expression]. Thus, $"c.int" === $"a.int" is not eligible for using the first way.
    
    Why isn't the constraint present?  We should fix that instead of inventing another unrelated way to reason about nullability.


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-189566284
  
    The  will be based on 


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-188690764
  
    **[Test build #51943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51943/consoleFull)** for PR 10566 at commit [`1a9ebdf`](https://github.com/apache/spark/commit/1a9ebdff1da8b0661738db1c7cf466344261ed33).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-189434167
  
    : ) Basically, top-down constraint propagation has been done in the optimizer rules:
    
    -       PushPredicateThroughJoin,
    -       PushPredicateThroughProject,
    -       PushPredicateThroughGenerate,
    -       PushPredicateThroughAggregate,
    
    I plan to add a new rule to the optimizer for constraint pushdown.


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-188663070
  
    First, I will add test cases to `OuterJoinEliminationSuite` tomorrow.
    
    Second, the current fix does not cover all the possible cases. I need your input on the issues this PR is facing:
    
    ```scala
        val df = Seq((1, 2, "1"), (3, 4, "3")).toDF("int", "int2", "str").as("a")
        val df2 = Seq((1, 2, "1"), (5, 6, "5")).toDF("int", "int2", "str").as("b")
        val df3 = Seq((1, 3, "1"), (4, 6, "5")).toDF("int", "int2", "str").as("c")
    
        // Full -> Left
        val full2Left = df.join(df2, $"a.int" === $"b.int", "full")
          .join(df3, $"c.int" === $"a.int", "right").select($"a.*", $"b.*", $"c.*")
    ```
    In the above case, the parent join condition `$"c.int" === $"a.int"` does not qualify under either of the two ways we currently use to decide whether a predicate is null-filtering.
    
    1. The first way is based on the constraints. If the parent join is full outer, the parent join will not have any IsNotNull constraint. In the current constraint propagation, its constraints are `Set.empty[Expression]`. Thus, `$"c.int" === $"a.int"` is not eligible for the first way.
    
    2. The second way is based on the run-time evaluation, `canFilterOutNull`. This requires that all the attributes are from the same side. In the predicate `$"c.int" === $"a.int"`, `$"a.int"` is from the left side, but `$"c.int"` is not from the left side. (Actually, `$"c.int"` is from the other side in the parent join.) Thus, it is not eligible for the second way either.
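
    A minimal model of the first point, with constraints represented as plain strings. Only the full-outer case (`Set.empty`) is stated in this thread; the inner, left-outer, and right-outer cases below are assumptions about how bottom-up propagation behaves, and `BottomUpConstraints` and `joinConstraints` are hypothetical names.

    ```scala
    object BottomUpConstraints {
      sealed trait JoinType
      case object Inner extends JoinType
      case object LeftOuter extends JoinType
      case object RightOuter extends JoinType
      case object FullOuter extends JoinType

      // Constraints are modelled as strings such as "isnotnull(a.int)".
      // fromCondition holds what could be inferred from the join condition.
      def joinConstraints(
          joinType: JoinType,
          left: Set[String],
          right: Set[String],
          fromCondition: Set[String]): Set[String] = joinType match {
        case Inner      => left ++ right ++ fromCondition // assumption
        case LeftOuter  => left                           // assumption: preserved side only
        case RightOuter => right                          // assumption: preserved side only
        case FullOuter  => Set.empty[String]              // as described in this thread
      }
    }
    ```

    Under these assumptions, the right outer parent join in the example only inherits the constraints of `c`, and the full outer child contributes nothing, so an `IsNotNull` constraint on `a.int` never reaches the parent.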
    
    However, predicates like `$"c.int" === $"a.int"` are very common in join conditions, and we can definitely use them as null-filtering predicates. Maybe we can keep the original approach as a third way, as shown in the following link:
    https://github.com/gatorsmile/spark/blob/d6a6e9cc31b0f7547b35cf25884135ea65b03676/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L798-L799
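
    To see why `$"c.int" === $"a.int"` is null-filtering for the example above, note that equality is never true when `a.int` is NULL, and a right outer parent join drops unmatched rows from its left input. The rows that the full outer child join adds for unmatched `b` therefore cannot survive. The sketch below checks this on the example's `int` keys only, using plain Scala collections instead of Spark. It is a check on one data set, not a proof, and `FullToLeftCheck` and its members are hypothetical names.

    ```scala
    object FullToLeftCheck {
      // Each row keeps only its `int` join key; None stands for SQL NULL.
      type Key = Option[Int]

      // SQL equality is null-rejecting: it is never true when a side is NULL.
      def sqlEq(x: Key, y: Key): Boolean = x.isDefined && x == y

      val a: Seq[Int] = Seq(1, 3)
      val b: Seq[Int] = Seq(1, 5)
      val c: Seq[Int] = Seq(1, 4)

      // a LEFT OUTER JOIN b ON a.int = b.int, as (a.int, b.int) pairs.
      def leftOuter: Seq[(Key, Key)] = a.flatMap { x =>
        val matches: Seq[(Key, Key)] = b.filter(_ == x).map(y => (Some(x), Some(y)))
        if (matches.isEmpty) Seq[(Key, Key)]((Some(x), None)) else matches
      }

      // a FULL OUTER JOIN b: the left outer rows plus the unmatched rows of b.
      def fullOuter: Seq[(Key, Key)] =
        leftOuter ++ b.filterNot(y => a.contains(y)).map(y => (Option.empty[Int], Option(y)))

      // child RIGHT OUTER JOIN c ON c.int = a.int, as (a.int, b.int, c.int).
      def rightJoinC(child: Seq[(Key, Key)]): Seq[(Key, Key, Key)] = c.flatMap { z =>
        val matches: Seq[(Key, Key, Key)] =
          child.filter { case (x, _) => sqlEq(x, Some(z)) }.map { case (x, y) => (x, y, Option(z)) }
        if (matches.isEmpty) Seq[(Key, Key, Key)]((None, None, Some(z))) else matches
      }
    }
    ```

    For this data, `rightJoinC(fullOuter)` and `rightJoinC(leftOuter)` return the same rows, which matches the intended Full -> Left rewrite.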
    
    Does that look good to you? Thanks! : ) 


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-168577564
  
    **[Test build #48631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48631/consoleFull)** for PR 10566 at commit [`d6a6e9c`](https://github.com/apache/spark/commit/d6a6e9cc31b0f7547b35cf25884135ea65b03676).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
GitHub user gatorsmile reopened a pull request:

    https://github.com/apache/spark/pull/10566

    [SPARK-12613] [SQL] Outer Join Elimination by Parent Join Condition

    This PR is another enhancement to the Optimizer. It does not conflict with the other PRs (https://github.com/apache/spark/pull/10567 and https://github.com/apache/spark/pull/10551).
    
    When an outer join (**OJ**) is the child of another join (called the parent join, **PJ**), and the join type of **PJ** is `inner`, `left-semi`, `left-outer`, or `right-outer`, check whether the join condition of **PJ** satisfies the following two conditions:
      1) there exist null-filtering predicates against the columns in the null-supplying side of **PJ**.
      2) these columns are from **OJ**.
    
    If such join predicates exist, apply the elimination rules:
     - `full outer` -> `inner` if both sides of **OJ** have such predicates
     - `left outer` -> `inner` if the right side of **OJ** has such predicates
     - `right outer` -> `inner` if the left side of **OJ** has such predicates
     - `full outer` -> `left outer` if only the left side of **OJ** has such predicates
     - `full outer` -> `right outer` if only the right side of **OJ** has such predicates
    
    When applicable, this can improve performance considerably, since an `outer join` is generally slower than an `inner join`, and a `full outer` join is generally slower than a `left`/`right outer` join.
    
    BTW, since this rule is different from the rule in https://github.com/apache/spark/pull/10567, I kept them as separate PRs to simplify the code review.

----
commit 01e4cdfcfc4ac37644165923c6e8eb65fcfdf3ac
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-13T22:50:39Z

    Merge remote-tracking branch 'upstream/master'

commit 6835704c273abc13e8eda37f5a10715027e4d17b
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-14T02:50:51Z

    Merge remote-tracking branch 'upstream/master'

commit 9180687775649f97763bdbd7c004fe6fc392989c
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-14T17:01:59Z

    Merge remote-tracking branch 'upstream/master'

commit b38a21ef6146784e4b93ef4ce8c899f1eee14572
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-17T02:30:26Z

    SPARK-11633

commit d2b84af8cce7fc2c03c748a2d443c07bad3f0ed1
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-17T02:32:12Z

    Merge remote-tracking branch 'upstream/master' into joinMakeCopy

commit fda8025195a0f872c9b357e8d86a1ce37d4c8379
Author: xiaoli <li...@gmail.com>
Date:   2015-11-17T04:16:43Z

    Merge remote-tracking branch 'upstream/master'

commit ac0dccd0bc7149ec1e6a2e89d2c462388ad4a80c
Author: xiaoli <li...@gmail.com>
Date:   2015-11-17T04:18:59Z

    Merge branch 'master' of https://github.com/gatorsmile/spark

commit 6e0018b67faf769f11032a217417bd09bace7ed7
Author: Xiao Li <xi...@xiaos-macbook-pro.local>
Date:   2015-11-20T18:51:19Z

    Merge remote-tracking branch 'upstream/master'

commit 0546772f151f83d6d3cf4d000cbe341f52545007
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-20T18:56:45Z

    converge

commit b37a64f13956b6ddd0e38ddfd9fe1caee611f1a8
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-20T18:58:37Z

    converge

commit bde74f83e24c2dc9bd9fd9e5541362049594c972
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-03T17:45:38Z

    Merge remote-tracking branch 'upstream/master' into OuterJoinEliminationByParentJoinPredicate

commit e18ba758aa94cc75115cc689f49b75ccd5d0ce51
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-04T02:21:59Z

    Merge remote-tracking branch 'upstream/master' into OuterJoinEliminationByParentJoinPredicate

commit d6a6e9cc31b0f7547b35cf25884135ea65b03676
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-04T02:40:26Z

    outer join elimination by parent join.

commit c2a872c226dd3ca59c12ee74ff7b12372be639fe
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-06T04:09:27Z

    Merge remote-tracking branch 'upstream/master'

commit ab6dbd74421fde3690c53c80c9873510d70174e5
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-06T19:59:17Z

    Merge remote-tracking branch 'upstream/master'

commit 4276356305dc70585e0d76bc271da59cfd446867
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-06T22:06:13Z

    Merge remote-tracking branch 'upstream/master'

commit 2dab7087855aa7e3411a1399fb1535ab30f2aa14
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-07T15:28:57Z

    Merge remote-tracking branch 'upstream/master'

commit 0458770af543520f3bc7d45289d8aef07b0a36e6
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-08T22:20:32Z

    Merge remote-tracking branch 'upstream/master'

commit 1debdfa8d14a6330ab46ed4b62c71e4fb35ca286
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-09T05:31:06Z

    Merge remote-tracking branch 'upstream/master'

commit 763706d8c13d21faba182a30c885af1e1b7c9ee0
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-14T07:01:13Z

    Merge remote-tracking branch 'upstream/master'

commit 4de6ec1c78a726ea3bd85ec32c1038d4c1b2e713
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-18T22:13:58Z

    Merge remote-tracking branch 'upstream/master'

commit 9422a4fc32fec9b956ae1db5e5c6f3599a230fbc
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-19T21:13:35Z

    Merge remote-tracking branch 'upstream/master'

commit 52bdf48b2cae6322cdb6bcb8b543e20acbdb15fc
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-20T17:10:26Z

    Merge remote-tracking branch 'upstream/master'

commit 1e95df34bd7d0ca8a0fa389db53c4bd030e7fa10
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-23T05:51:41Z

    Merge remote-tracking branch 'upstream/master'

commit fab24cfa2e1a3ad8b1bde7aaa2dcba144df31e53
Author: gatorsmile <ga...@gmail.com>
Date:   2016-02-01T07:39:27Z

    Merge remote-tracking branch 'upstream/master'

commit 8b2e33be439722313c8129197f72ffbb5f5a0b48
Author: gatorsmile <ga...@gmail.com>
Date:   2016-02-05T07:01:36Z

    Merge remote-tracking branch 'upstream/master'

commit 2ee1876efe378de25011990fcaaa1bffc9aac5bc
Author: gatorsmile <ga...@gmail.com>
Date:   2016-02-11T01:53:14Z

    Merge remote-tracking branch 'upstream/master'

commit b9f0090fdba30fa0c0bd41f720af05c71a6ddfc0
Author: gatorsmile <ga...@gmail.com>
Date:   2016-02-12T21:19:32Z

    Merge remote-tracking branch 'upstream/master'

commit ade6f7e36eb8f2328af74f6a98a46a89b7d804d9
Author: gatorsmile <ga...@gmail.com>
Date:   2016-02-15T23:57:30Z

    Merge remote-tracking branch 'upstream/master'

commit 9fd63d20c8636ed8145526c784083612b993ed6f
Author: gatorsmile <ga...@gmail.com>
Date:   2016-02-19T03:45:37Z

    Merge remote-tracking branch 'upstream/master'

----


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10566#discussion_r49672786
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -769,6 +770,107 @@ object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper {
     }
     
     /**
    + * Elimination of Outer Join by Parent Join Condition
    + *
    + * Given an outer join is involved in another join (called parent join), when the join type of the
    + * parent join is inner, left-semi, left-outer and right-outer, checking if the join condition of
    + * the parent join satisfies the following two conditions:
    + *
    + * 1) there exist null filtering predicates against the columns in the null-supplying side of
    + *    parent join.
    + * 2) these columns are from the child join.
    + *
    + * If having such join predicates, execute the elimination rules:
    + * - full outer -> inner if both sides of the child join have such predicates
    + * - left outer -> inner if the right side of the child join has such predicates
    + * - right outer -> inner if the left side of the child join has such predicates
    + * - full outer -> left outer if only the left side of the child join has such predicates
    + * - full outer -> right outer if only the right side of the child join has such predicates
    + *
    + */
    +object OuterJoinElimination extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  private def containsAttr(plan: LogicalPlan, attr: Attribute): Boolean =
    +    plan.outputSet.exists(_.semanticEquals(attr))
    +
    +  private def hasNullFilteringPredicate(predicate: Expression, plan: LogicalPlan): Boolean = {
    --- End diff --
    
    As I commented on the other PR, I think we should have a more general way to infer null propagation / filtering.  Maybe you can discuss with @sameeragarwal and then update these PRs after his machinery is available.


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-168577703
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #10566: [SPARK-12613] [SQL] Outer Join Elimination by Par...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10566


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-168595589
  
    Sure, I can give it a try! Thank you!


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-188656355
  
    **[Test build #51943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51943/consoleFull)** for PR 10566 at commit [`1a9ebdf`](https://github.com/apache/spark/commit/1a9ebdff1da8b0661738db1c7cf466344261ed33).


[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10566#discussion_r52566542
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -769,6 +770,107 @@ object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper {
     }
     
     /**
    + * Elimination of Outer Join by Parent Join Condition
    + *
    + * Given an outer join is involved in another join (called parent join), when the join type of the
    + * parent join is inner, left-semi, left-outer and right-outer, checking if the join condition of
    + * the parent join satisfies the following two conditions:
    + *
    + * 1) there exist null filtering predicates against the columns in the null-supplying side of
    + *    parent join.
    + * 2) these columns are from the child join.
    + *
    + * If having such join predicates, execute the elimination rules:
    + * - full outer -> inner if both sides of the child join have such predicates
    + * - left outer -> inner if the right side of the child join has such predicates
    + * - right outer -> inner if the left side of the child join has such predicates
    + * - full outer -> left outer if only the left side of the child join has such predicates
    + * - full outer -> right outer if only the right side of the child join has such predicates
    + *
    + */
    +object OuterJoinElimination extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  private def containsAttr(plan: LogicalPlan, attr: Attribute): Boolean =
    +    plan.outputSet.exists(_.semanticEquals(attr))
    +
    +  private def hasNullFilteringPredicate(predicate: Expression, plan: LogicalPlan): Boolean = {
    --- End diff --
    
    Let me do the outer join elimination by `Filter` first. That one can directly use the existing constraint propagation infrastructure: https://github.com/apache/spark/pull/10567 Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Outer Join Elimination by Parent Join Conditio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-168569409
  
    **[Test build #48631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48631/consoleFull)** for PR 10566 at commit [`d6a6e9c`](https://github.com/apache/spark/commit/d6a6e9cc31b0f7547b35cf25884135ea65b03676).




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10566#discussion_r49676639
  
    Sure, will do. Thank you! 




[GitHub] spark issue #10566: [SPARK-12613] [SQL] Outer Join Elimination by Parent Joi...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/10566
  
    Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket.





[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-168577704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48631/




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10566#discussion_r52505007
  
    Sure, will do the changes. Thank you!




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-188416738
  
    Do you want to update this now?




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-188456156
  
    Will do it tonight. Thanks!




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10566#discussion_r52566286
  
    @sameeragarwal Unfortunately, they are unable to share the same `buildNewJoin` function. 
    
    For example, if the parent join is full outer, the parent join will not have any `IsNotNull` constraints; that is, its constraints are `Set.empty[Expression]`. However, the join condition of this parent join can still be used for outer join elimination of the child join.




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-189507604
  
    After more thought, I think the best way is to add an extra `Filter` between the two joins and let the existing Filter-condition-based rule do the outer join elimination. However, we would need another rule to remove any unnecessary `Filter` that contains only null constraints.
    
    Let me first create a PR to do Filter removal/cleaning. 
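
To make the idea concrete, here is a small in-memory sketch (plain Scala collections, not Spark) of why an extra not-null `Filter` between the two joins is harmless when the parent is an inner join on a column from the child's null-supplying side. The tables `a`, `b`, `c` and every helper name are assumptions made up for this illustration; `None` stands for SQL NULL.

```scala
object FilterBetweenJoinsSketch {
  // A child-join row: (a.id, b.id). b.id is None when the outer join padded it.
  type ChildRow = (Int, Option[Int])

  // Hypothetical single-column tables.
  val a: Seq[Int] = Seq(1, 2, 3)
  val b: Seq[Int] = Seq(2, 3, 4)
  val c: Seq[Int] = Seq(3, 4, 5)

  // a LEFT OUTER JOIN b ON a.id = b.id
  def leftOuter: Seq[ChildRow] = a.flatMap { x =>
    val matches = b.filter(_ == x)
    if (matches.isEmpty) Seq((x, Option.empty[Int]))
    else matches.map(y => (x, Option(y)))
  }

  // a INNER JOIN b ON a.id = b.id
  def inner: Seq[ChildRow] =
    a.flatMap(x => b.filter(_ == x).map(y => (x, Option(y))))

  // The extra Filter placed between the joins: b.id IS NOT NULL
  def notNullFilter(child: Seq[ChildRow]): Seq[ChildRow] =
    child.filter(_._2.isDefined)

  // child INNER JOIN c ON b.id = c.id; a NULL b.id never matches any row of c.
  def parentInner(child: Seq[ChildRow]): Seq[(Int, Int, Int)] =
    child.flatMap {
      case (x, Some(y)) => c.filter(_ == y).map(z => (x, y, z))
      case (_, None)    => Seq.empty
    }
}
```

In this model, `parentInner(leftOuter)`, `parentInner(notNullFilter(leftOuter))`, and `parentInner(inner)` all produce the same rows, and `notNullFilter(leftOuter)` equals `inner`. That is the property a Filter-based elimination rule would rely on; whether the inserted `Filter` can later be removed is the separate cleanup rule mentioned above.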
     




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/10566




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-188691566
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-168594949
  
    btw i created this: https://issues.apache.org/jira/browse/SPARK-12616 
    
    seems like something you can do?





[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10566#issuecomment-188691572
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51943/

