You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by wzhfy <gi...@git.apache.org> on 2017/03/25 07:36:18 UTC

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with subqu...

GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/17428

    [SPARK-20094][SQL] Don't put predicate with subquery into join condition in ReorderJoin because it fails RewritePredicateSubquery.rewriteExistentialExpr

    ## What changes were proposed in this pull request?
    
    `ReorderJoin` collects all predicates and try to put them into join condition when creating ordered join. If a predicate with a subquery is in a join condition instead of a filter condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the subquery to an `ExistenceJoin`, and thus result in error.
    
    ## How was this patch tested?
    
    Add a new test case in `JoinOptimizationSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark noSubqueryInJoinCond

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17428
    
----
commit bd91947aaaa49050da59fb6caf704e572f842822
Author: wangzhenhua <wa...@huawei.com>
Date:   2017-03-25T07:19:42Z

    don't put predicate with subquery into join condition in ReorderJoin

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75212/testReport)** for PR 17428 at commit [`6af6b95`](https://github.com/apache/spark/commit/6af6b951c22a6fcffa611e1652bf33cf7d0a8c3e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by nsyca <gi...@git.apache.org>.

Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108222730
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
    @@ -123,6 +136,31 @@ class JoinOptimizationSuite extends PlanTest {
         }
       }
     
    +  test("reorder inner joins - don't put predicate with IN subquery into join condition") {
    +    // ReorderJoin collects all predicates and try to put them into join condition when creating
    +    // ordered join. If a predicate with an IN subquery is in a join condition instead of a filter
    +    // condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the
    +    // subquery to an ExistenceJoin, and thus result in error.
    +    val x = testRelation.subquery('x)
    +    val y = testRelation1.subquery('y)
    +    val z = testRelation.subquery('z)
    +    val w = testRelation1.subquery('w)
    +    val exists: AttributeReference = AttributeReference("exists", BooleanType, nullable = false)()
    +
    +    val queryPlan = x.join(y).join(z)
    +      .where(("x.b".attr === "z.b".attr) && ("y.d".attr === "z.a".attr) &&
    +        ("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr)))))
    +
    +    val expectedPlan = x.join(z, Inner, Some("x.b".attr === "z.b".attr))
    +      .join(w, ExistenceJoin(exists), Some("z.c".attr === "w.d".attr))
    +      .where("x.a".attr > 1 || exists)
    +      .select("x.a".attr, "x.b".attr, "x.c".attr, "z.a".attr, "z.b".attr, "z.c".attr)
    +      .join(y, Inner, Some("y.d".attr === "z.a".attr))
    --- End diff --
    
    I guess your test case depends on the fact that currently we don't have an optimization to push down table y below the join of (x, z).
    
    Essentially, this is not a problem specifically to `ReorderJoin` as implied from the title of the PR. The problem is at the definition `canEvaluateWithinJoin`. I can demonstrate the same problem without the table y in this test case.
    
    A slight modification of your test case as shown below also reveals the problem.
    
    ````
        val queryPlan = x.join(z)
          .where(("x.b".attr === "z.b".attr) &&
            ("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr)))))
    
        val expectedPlan = x.join(z, Inner, Some("x.b".attr === "z.b".attr))
          .join(w, ExistenceJoin(exists), Some("z.c".attr === "w.d".attr))
          .where("x.a".attr > 1 || exists)
          .select("x.a".attr, "x.b".attr, "x.c".attr, "z.a".attr, "z.b".attr, "z.c".attr)
    ````
    
    which has a call to `canEvaluateWithinJoin` from the rule `PushPredicateThroughJoin`.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75265/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by nsyca <gi...@git.apache.org>.

Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108217534
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -289,6 +289,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
           case m: Map[_, _] => m
           case d: DataType => d // Avoid unpacking Structs
           case seq: Traversable[_] => seq.map(recursiveTransform)
    +      case ExistenceJoin(exists: Attribute) =>
    +        // `exists` must be an Attribute in ExistenceJoin
    +        ExistenceJoin(transformExpression(exists).asInstanceOf[Attribute])
    --- End diff --
    
    Can you give an example of `transformExpression` of an instance of `Attribute` not returning a value of type `Attribute` or its subtypes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75309/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108101023
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -90,6 +90,7 @@ trait PredicateHelper {
        * Returns true iff `expr` could be evaluated as a condition within join.
        */
       protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
    +    case l: ListQuery => false
    --- End diff --
    
    add some comments to explain why `ListQuery` should always return false even the children is empty.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by nsyca <gi...@git.apache.org>.

Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108185240
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -289,6 +289,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
           case m: Map[_, _] => m
           case d: DataType => d // Avoid unpacking Structs
           case seq: Traversable[_] => seq.map(recursiveTransform)
    +      case ExistenceJoin(exists: Attribute) =>
    +        // `exists` must be an Attribute in ExistenceJoin
    +        ExistenceJoin(transformExpression(exists).asInstanceOf[Attribute])
    --- End diff --
    
    Is there a particular reason we need to cast the return type of `transformExpression(<instance_of_Attribute>)` to type `Attribute`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Preventing push down of IN sub...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17428


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with subquery int...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    cc @nsyca @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by nsyca <gi...@git.apache.org>.

Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108218188
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
    @@ -123,6 +136,31 @@ class JoinOptimizationSuite extends PlanTest {
         }
       }
     
    +  test("reorder inner joins - don't put predicate with IN subquery into join condition") {
    +    // ReorderJoin collects all predicates and try to put them into join condition when creating
    +    // ordered join. If a predicate with an IN subquery is in a join condition instead of a filter
    +    // condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the
    +    // subquery to an ExistenceJoin, and thus result in error.
    +    val x = testRelation.subquery('x)
    +    val y = testRelation1.subquery('y)
    +    val z = testRelation.subquery('z)
    +    val w = testRelation1.subquery('w)
    +    val exists: AttributeReference = AttributeReference("exists", BooleanType, nullable = false)()
    +
    +    val queryPlan = x.join(y).join(z)
    +      .where(("x.b".attr === "z.b".attr) && ("y.d".attr === "z.a".attr) &&
    +        ("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr)))))
    +
    +    val expectedPlan = x.join(z, Inner, Some("x.b".attr === "z.b".attr))
    +      .join(w, ExistenceJoin(exists), Some("z.c".attr === "w.d".attr))
    +      .where("x.a".attr > 1 || exists)
    +      .select("x.a".attr, "x.b".attr, "x.c".attr, "z.a".attr, "z.b".attr, "z.c".attr)
    +      .join(y, Inner, Some("y.d".attr === "z.a".attr))
    --- End diff --
    
    My question was why Optimizer does not pick a plan of Join( Join(y, z), x). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108209215
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
    @@ -123,6 +136,31 @@ class JoinOptimizationSuite extends PlanTest {
         }
       }
     
    +  test("reorder inner joins - don't put predicate with IN subquery into join condition") {
    +    // ReorderJoin collects all predicates and try to put them into join condition when creating
    +    // ordered join. If a predicate with an IN subquery is in a join condition instead of a filter
    +    // condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the
    +    // subquery to an ExistenceJoin, and thus result in error.
    +    val x = testRelation.subquery('x)
    +    val y = testRelation1.subquery('y)
    +    val z = testRelation.subquery('z)
    +    val w = testRelation1.subquery('w)
    +    val exists: AttributeReference = AttributeReference("exists", BooleanType, nullable = false)()
    +
    +    val queryPlan = x.join(y).join(z)
    +      .where(("x.b".attr === "z.b".attr) && ("y.d".attr === "z.a".attr) &&
    +        ("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr)))))
    +
    +    val expectedPlan = x.join(z, Inner, Some("x.b".attr === "z.b".attr))
    +      .join(w, ExistenceJoin(exists), Some("z.c".attr === "w.d".attr))
    +      .where("x.a".attr > 1 || exists)
    +      .select("x.a".attr, "x.b".attr, "x.c".attr, "z.a".attr, "z.b".attr, "z.c".attr)
    +      .join(y, Inner, Some("y.d".attr === "z.a".attr))
    --- End diff --
    
    This test suite is for `JoinReorder` rule. This rule is not cost based and it's not related to estimation. It only searches for joinable tables and put cartesian product to the end of the plan.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75257/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108373968
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -289,6 +289,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
           case m: Map[_, _] => m
           case d: DataType => d // Avoid unpacking Structs
           case seq: Traversable[_] => seq.map(recursiveTransform)
    +      case ExistenceJoin(exists: Attribute) =>
    +        // `exists` must be an Attribute in ExistenceJoin
    +        ExistenceJoin(transformExpression(exists).asInstanceOf[Attribute])
    --- End diff --
    
    Yes I feel the same. But after modification as suggested by @nsyca , this change can also be removed. Because I move the test case to `FilterPushdownSuite`, we don't have to touch the `RewritePredicateSubquery` rule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75265/testReport)** for PR 17428 at commit [`c32599d`](https://github.com/apache/spark/commit/c32599de69b9246a19c2ac9794e8b1445d6d8966).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75265/testReport)** for PR 17428 at commit [`c32599d`](https://github.com/apache/spark/commit/c32599de69b9246a19c2ac9794e8b1445d6d8966).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108116714
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -90,6 +90,7 @@ trait PredicateHelper {
        * Returns true iff `expr` could be evaluated as a condition within join.
        */
       protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
    +    case l: ListQuery => false
    --- End diff --
    
    OK. The comment looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by nsyca <gi...@git.apache.org>.

Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108195342
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -90,6 +90,10 @@ trait PredicateHelper {
        * Returns true iff `expr` could be evaluated as a condition within join.
        */
       protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
    +    case l: ListQuery =>
    +      // Now pulling up IN predicates is deferred to `RewritePredicateSubquery`. As a result,
    +      // `ListQuery`'s children can be empty before that rule and falls into the next case.
    +      false
    --- End diff --
    
    Suggested text in the comment:
    
    [NOT] IN (<ListQuery>) expression cannot be evaluated as part of a Join operator. Currently the only way to evaluate a [NOT] IN subquery is to convert it to a LeftSemi or LeftAnti by `RewritePredicateSubquery` rule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75308/testReport)** for PR 17428 at commit [`bf7d202`](https://github.com/apache/spark/commit/bf7d202e15f892ead08e7efeb0b2332bed1082ba).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    LGTM - pending jenkins.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75210/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75308/testReport)** for PR 17428 at commit [`bf7d202`](https://github.com/apache/spark/commit/bf7d202e15f892ead08e7efeb0b2332bed1082ba).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108086458
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -90,6 +90,7 @@ trait PredicateHelper {
        * Returns true iff `expr` could be evaluated as a condition within join.
        */
       protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
    +    case l: ListQuery => false
    --- End diff --
    
    Doesn't the following `case e: SubqueryExpression` prevent it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75257/testReport)** for PR 17428 at commit [`9f7dac5`](https://github.com/apache/spark/commit/9f7dac5928bc70e240096798574b035490457f5f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75309/testReport)** for PR 17428 at commit [`0cc6137`](https://github.com/apache/spark/commit/0cc61376ea306fc9234da87f91c496fc7a7eba4a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75309/testReport)** for PR 17428 at commit [`0cc6137`](https://github.com/apache/spark/commit/0cc61376ea306fc9234da87f91c496fc7a7eba4a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108104559
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -90,6 +90,7 @@ trait PredicateHelper {
        * Returns true iff `expr` could be evaluated as a condition within join.
        */
       protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
    +    case l: ListQuery => false
    --- End diff --
    
    @viirya Now pulling up IN predicates is deferred to `RewritePredicateSubquery`. As a result, `ListQuery`'s children can be empty before that rule and falls into `case e: SubqueryExpression`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with subquery int...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75210/testReport)** for PR 17428 at commit [`bd91947`](https://github.com/apache/spark/commit/bd91947aaaa49050da59fb6caf704e572f842822).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75257/testReport)** for PR 17428 at commit [`9f7dac5`](https://github.com/apache/spark/commit/9f7dac5928bc70e240096798574b035490457f5f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by nsyca <gi...@git.apache.org>.

Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108224172
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
    @@ -123,6 +136,31 @@ class JoinOptimizationSuite extends PlanTest {
         }
       }
     
    +  test("reorder inner joins - don't put predicate with IN subquery into join condition") {
    +    // ReorderJoin collects all predicates and try to put them into join condition when creating
    +    // ordered join. If a predicate with an IN subquery is in a join condition instead of a filter
    +    // condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the
    +    // subquery to an ExistenceJoin, and thus result in error.
    +    val x = testRelation.subquery('x)
    +    val y = testRelation1.subquery('y)
    +    val z = testRelation.subquery('z)
    +    val w = testRelation1.subquery('w)
    +    val exists: AttributeReference = AttributeReference("exists", BooleanType, nullable = false)()
    +
    +    val queryPlan = x.join(y).join(z)
    +      .where(("x.b".attr === "z.b".attr) && ("y.d".attr === "z.a".attr) &&
    +        ("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr)))))
    +
    +    val expectedPlan = x.join(z, Inner, Some("x.b".attr === "z.b".attr))
    +      .join(w, ExistenceJoin(exists), Some("z.c".attr === "w.d".attr))
    +      .where("x.a".attr > 1 || exists)
    +      .select("x.a".attr, "x.b".attr, "x.c".attr, "z.a".attr, "z.b".attr, "z.c".attr)
    +      .join(y, Inner, Some("y.d".attr === "z.a".attr))
    --- End diff --
    
    May I suggest the following:
    - Change the title of the PR to "Preventing push down of IN subquery to Join operator"
    - Update the test case to its bare minimum by removing the join to table y
    - Change the title of the test case to reflect the new test
    
    On the last point, removing the "reorder inner joins - " should do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108208611
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -90,6 +90,10 @@ trait PredicateHelper {
        * Returns true iff `expr` could be evaluated as a condition within join.
        */
       protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
    +    case l: ListQuery =>
    +      // Now pulling up IN predicates is deferred to `RewritePredicateSubquery`. As a result,
    +      // `ListQuery`'s children can be empty before that rule and falls into the next case.
    +      false
    --- End diff --
    
    ok, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with subquery int...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75212/testReport)** for PR 17428 at commit [`6af6b95`](https://github.com/apache/spark/commit/6af6b951c22a6fcffa611e1652bf33cf7d0a8c3e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    **[Test build #75210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75210/testReport)** for PR 17428 at commit [`bd91947`](https://github.com/apache/spark/commit/bd91947aaaa49050da59fb6caf704e572f842822).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108373273
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
    @@ -123,6 +136,31 @@ class JoinOptimizationSuite extends PlanTest {
         }
       }
     
    +  test("reorder inner joins - don't put predicate with IN subquery into join condition") {
    +    // ReorderJoin collects all predicates and try to put them into join condition when creating
    +    // ordered join. If a predicate with an IN subquery is in a join condition instead of a filter
    +    // condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the
    +    // subquery to an ExistenceJoin, and thus result in error.
    +    val x = testRelation.subquery('x)
    +    val y = testRelation1.subquery('y)
    +    val z = testRelation.subquery('z)
    +    val w = testRelation1.subquery('w)
    +    val exists: AttributeReference = AttributeReference("exists", BooleanType, nullable = false)()
    +
    +    val queryPlan = x.join(y).join(z)
    +      .where(("x.b".attr === "z.b".attr) && ("y.d".attr === "z.a".attr) &&
    +        ("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr)))))
    +
    +    val expectedPlan = x.join(z, Inner, Some("x.b".attr === "z.b".attr))
    +      .join(w, ExistenceJoin(exists), Some("z.c".attr === "w.d".attr))
    +      .where("x.a".attr > 1 || exists)
    +      .select("x.a".attr, "x.b".attr, "x.c".attr, "z.a".attr, "z.b".attr, "z.c".attr)
    +      .join(y, Inner, Some("y.d".attr === "z.a".attr))
    --- End diff --
    
    Yea, thanks for the suggestion. Also I will remove the test case to `FilterPushdownSuite` since now it will not go into `ReorderJoin` rule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108103096
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -90,6 +90,7 @@ trait PredicateHelper {
        * Returns true iff `expr` could be evaluated as a condition within join.
        */
       protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
    +    case l: ListQuery => false
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    also cc @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108220965
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -289,6 +289,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
           case m: Map[_, _] => m
           case d: DataType => d // Avoid unpacking Structs
           case seq: Traversable[_] => seq.map(recursiveTransform)
    +      case ExistenceJoin(exists: Attribute) =>
    +        // `exists` must be an Attribute in ExistenceJoin
    +        ExistenceJoin(transformExpression(exists).asInstanceOf[Attribute])
    --- End diff --
    
    Why is this change necessary? This smells for two reasons:
    
    - `QueryPlan` shouldn't have know about such a detail. This means something is wrong with the abstraction.
    - Why do we want to change the attribute in an ExistenceJoin? It is produced by the join, and we really should not replace it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Preventing push down of IN subquery t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75308/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108208545
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -289,6 +289,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
           case m: Map[_, _] => m
           case d: DataType => d // Avoid unpacking Structs
           case seq: Traversable[_] => seq.map(recursiveTransform)
    +      case ExistenceJoin(exists: Attribute) =>
    +        // `exists` must be an Attribute in ExistenceJoin
    +        ExistenceJoin(transformExpression(exists).asInstanceOf[Attribute])
    --- End diff --
    
    Because `ExistenceJoin` only accepts `Attribute` as its parameter type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17428: [SPARK-20094][SQL] Don't put predicate with IN su...

Posted by nsyca <gi...@git.apache.org>.

Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17428#discussion_r108196677
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
    @@ -123,6 +136,31 @@ class JoinOptimizationSuite extends PlanTest {
         }
       }
     
    +  test("reorder inner joins - don't put predicate with IN subquery into join condition") {
    +    // ReorderJoin collects all predicates and try to put them into join condition when creating
    +    // ordered join. If a predicate with an IN subquery is in a join condition instead of a filter
    +    // condition, `RewritePredicateSubquery.rewriteExistentialExpr` would fail to convert the
    +    // subquery to an ExistenceJoin, and thus result in error.
    +    val x = testRelation.subquery('x)
    +    val y = testRelation1.subquery('y)
    +    val z = testRelation.subquery('z)
    +    val w = testRelation1.subquery('w)
    +    val exists: AttributeReference = AttributeReference("exists", BooleanType, nullable = false)()
    +
    +    val queryPlan = x.join(y).join(z)
    +      .where(("x.b".attr === "z.b".attr) && ("y.d".attr === "z.a".attr) &&
    +        ("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr)))))
    +
    +    val expectedPlan = x.join(z, Inner, Some("x.b".attr === "z.b".attr))
    +      .join(w, ExistenceJoin(exists), Some("z.c".attr === "w.d".attr))
    +      .where("x.a".attr > 1 || exists)
    +      .select("x.a".attr, "x.b".attr, "x.c".attr, "z.a".attr, "z.b".attr, "z.c".attr)
    +      .join(y, Inner, Some("y.d".attr === "z.a".attr))
    --- End diff --
    
    This is a side question. It's not a review comment.
    
    Is there any mechanism in this test suite that ensures that the plan Join( Join(x, z), y) is picked by the join reorder algorithm? Isn't a Join( Join(y, z), x) is also a viable plan if the join predicate (y.d = z.a) is estimated to be more selective than the predicate sets between x and z? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17428: [SPARK-20094][SQL] Don't put predicate with IN subquery ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17428
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75212/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org