You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2018/08/18 22:30:52 UTC

[GitHub] spark pull request #22141: [SPARK-25154] Support NOT IN sub-queries inside n...

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/22141

    [SPARK-25154] Support NOT IN sub-queries inside nested OR conditions.

    ## What changes were proposed in this pull request?
    Currently NOT IN subqueries (predicated null aware subquery) are not allowed inside OR expressions. We currently catch this condition in checkAnalysis and throw an error.
    
    This PR enhances the subquery rewrite to support this type of queries.
    
    Query
    ```SQL
    SELECT * FROM s1 WHERE a > 5 or b NOT IN (SELECT c FROM s2);
    ```
    Optimized Plan
    ```SQL
    a: int, b: int
    Project [a#16, b#17]
    +- Filter ((a#16 > 5) || NOT b#17 IN (list#13 []))
       :  +- Project [c#18]
       :     +- SubqueryAlias `default`.`s2`
       :        +- HiveTableRelation `default`.`s2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c#18, d#19]
       +- SubqueryAlias `default`.`s1`
          +- HiveTableRelation `default`.`s1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#16, b#17]
    ```
    ## How was this patch tested?
    Added new testsin SQLQueryTestSuite, RewriteSubquerySuite and SubquerySuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark SPARK-25154

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22141
    
----
commit 473bfb500b07626ff42a9e5ddc167970299bde21
Author: Dilip Biswal <db...@...>
Date:   2018-08-18T21:22:37Z

    [SPARK-25154] Support NOT IN sub-queries inside nested OR conditions.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    **[Test build #94932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94932/testReport)** for PR 22141 at commit [`926c7fc`](https://github.com/apache/spark/commit/926c7fc7a766b1e310798a87ae7a485f731ade99).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    @dilipbiswal Any update?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    @maropu I think, this will be considered post 2.4. So we have some time. I am now exploring options to plan better at physical level. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2303/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Any update?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211804802
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    @liwensun Just to confirm, here is a list of queries i was going to try -
    select * from t1 where not (c1 not in (select c2 from t2))
    select * from t1 where not (c1 > 5 or c1 in (select c2 from t2))
    select * from t1 where not (c1 > 5 or c1 not in (select c2 from t2))
    select * from t1 where not (c1 > 5 and c1 in (select c2 from t2))
    select * from t1 where not (c1 > 5 and c1 not in (select c2 from t2))
    
    Does this look ok to you ? Please let me know if i missed out on any cases ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211969009
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
    --- End diff --
    
    to be able to see Not(In) first before (In) ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211802296
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    @liwensun Thank you very much. Let me try the scenarios and get back.. I may have some questions on the semantics.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2304/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211986809
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
    --- End diff --
    
    thanks @dilipbiswal :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    **[Test build #95086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95086/testReport)** for PR 22141 at commit [`844a3ff`](https://github.com/apache/spark/commit/844a3ff82a688e7398bb130a44750aec78420698).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2431/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211961738
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
    --- End diff --
    
    why did you change this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211979620
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
    --- End diff --
    
    yes, thanks, but that doesn't test when the outer values are null, right? I think it would be good to have also cases with:
     - more than 2 attributes;
     - with the outer values being null;
     - complex data types involved (eg. structs)
    
    What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    **[Test build #94932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94932/testReport)** for PR 22141 at commit [`926c7fc`](https://github.com/apache/spark/commit/926c7fc7a766b1e310798a87ae7a485f731ade99).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by liwensun <gi...@git.apache.org>.
Github user liwensun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211806238
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    Thanks for the follow up! I think these should be enough to reveal the issue if I understand it correctly. Make sure c2 has null values. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    nvm ;) we still have much time until the next release.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    **[Test build #95091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95091/testReport)** for PR 22141 at commit [`844a3ff`](https://github.com/apache/spark/commit/844a3ff82a688e7398bb130a44750aec78420698).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95091/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r215333831
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql ---
    @@ -0,0 +1,198 @@
    +-- Tests NOT-IN subqueries nested inside OR expression(s).
    --- End diff --
    
    @maropu Oh.. thank you very much for trying out.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r212785260
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    --- End diff --
    
    yea, it's ok to keep the current one. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    @maropu Very sorry. I haven't had the time to come back to it. I have some stuff on my plate. I will get to this after i am done. Thanks !!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r214005837
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    --- End diff --
    
    nit: removes outermost parentheses


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211839392
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    --- End diff --
    
    @maropu The common method looks like this -
    ``` scala
     def getCommonParams(
            values: Seq[Expression],
            sub: LogicalPlan) : (AttributeReference, Seq[Expression]) = {
            (
              AttributeReference("exists", BooleanType, nullable = false)(),
              values.zip(sub.output).map(EqualTo.tupled)
            )
        }
    ```
    Not much in terms of lines saved since things don't fit in single line.. You still wanted to do it :-) ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    **[Test build #95086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95086/testReport)** for PR 22141 at commit [`844a3ff`](https://github.com/apache/spark/commit/844a3ff82a688e7398bb130a44750aec78420698).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    cc @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211801302
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    --- End diff --
    
    nit: There are duplicate codes in `case (Not(InSubqyer...` and `case InSubquery...`. Can we make a simple helper method to remove these?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by dmateusp <gi...@git.apache.org>.
Github user dmateusp commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    I reproduced the issue with the following code (was a bit surprised with the behavior)
    
    The tables:
    ```scala
    scala> spark.sql("SELECT * FROM users").show
    +---+-------+
    | id|country|
    +---+-------+
    |  0|     10|
    |  1|     20|
    +---+-------+
    
    
    scala> spark.sql("SELECT * FROM countries").show
    +---+--------+
    | id|    name|
    +---+--------+
    | 10|Portugal|
    +---+--------+
    ```
    
    Without the OR:
    ```scala
    scala> spark.sql("SELECT * FROM users u WHERE u.country NOT IN (SELECT id from countries)").show
    +---+-------+
    | id|country|
    +---+-------+
    |  1|     20|
    +---+-------+
    ```
    
    With an OR and IN:
    scala> spark.sql("SELECT * FROM users u WHERE 1=0 OR u.country IN (SELECT id from countries)").show
    +---+-------+
    | id|country|
    +---+-------+
    |  0|     10|
    +---+-------+
    
    With an OR and NOT IN:
    ```scala
    scala> spark.sql("SELECT * FROM users u WHERE 1=0 OR u.country NOT IN (SELECT id from countries)").show
    org.apache.spark.sql.AnalysisException: Null-aware predicate sub-queries cannot be used in nested conditions: ((1 = 0) || NOT country#9 IN (list#62 []));;
    ```
    
    +1 to get that fixed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r213988466
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql ---
    @@ -0,0 +1,198 @@
    +-- Tests NOT-IN subqueries nested inside OR expression(s).
    --- End diff --
    
    https://github.com/apache/spark/pull/22141/files#r211835129
    To make sure all the results are correct, I also checked that postgresql had the same output by the queries below. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94932/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    **[Test build #94930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94930/testReport)** for PR 22141 at commit [`473bfb5`](https://github.com/apache/spark/commit/473bfb500b07626ff42a9e5ddc167970299bde21).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2426/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    **[Test build #95091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95091/testReport)** for PR 22141 at commit [`844a3ff`](https://github.com/apache/spark/commit/844a3ff82a688e7398bb130a44750aec78420698).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211788303
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    This sounds a reasonable fix. cc @hvanhovell @liwensun 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211864482
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    shall we add this only when the condition is nullable?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by liwensun <gi...@git.apache.org>.
Github user liwensun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211798295
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    Thanks for working on this! 
    
    But I'm not sure if this can handle the expression like this correctly:
    ```Not(And/Or(InSubquery, otherExpressiions*))```
    or this
    ```Not(Not(InSubquery))``` 
    
    Based on my understanding I think fundamentally what we want is probably to change the handling for the InSubquery case here by making the ExistenceJoin null aware somehow instead of adding another `Not(InSubquery(..))` case, right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95086/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211835129
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    @liwensun I tried all the five queries and they work fine. I verified the results with another database just to make sure. I briefly looked at the plan and they look ok to me.
    Also i have added all the five tests in my last commit. Please take a look and let me know if anything amiss. Thanks a lot.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    @dilipbiswal Can we resume this for v3.0?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211961021
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    makes sense, thanks for your answer @dilipbiswal 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r214007983
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
    @@ -52,4 +52,21 @@ class RewriteSubquerySuite extends PlanTest {
         comparePlans(optimized, correctAnswer)
       }
     
    +  test("NOT-IN subquery nested inside OR") {
    +    val relation1 = LocalRelation('a.int, 'b.int)
    +    val relation2 = LocalRelation('c.int, 'd.int)
    +    val exists = 'exists.boolean.notNull
    +
    +    val query = relation1.where('b === 1 || Not('a.in(ListQuery(relation2.select('c))))).select('a)
    +
    +    val plan = relation1.select('a).where('b === 1 || Not('exists))
    +    val correctAnswer = relation1
    +      .join(relation2.select('c), ExistenceJoin(exists), Some('a === 'c || IsNull('a === 'c)))
    +      .where('b === 1 || Not(exists))
    +      .select('a)
    +      .analyze
    +    val optimized = Optimize.execute(query.analyze)
    +
    --- End diff --
    
    The end-to-end tests for conflicting attr case already exist though, IMO it'd be better to add deduplication tests here, too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211955605
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    @mgaido91 Thanks !! Actually i have been thinking about it for last few days :-). We probably need a new optimizer rule that simplifies the join conditions based on its child's constraints. So we should be able to simplify -
    
    ``` SQL
    select * from t1 join t2 on (t1c1 = t2c1 OR isnull(t1c1 = t2c1) where t1c1 is not null and t2c1 is not null
    ```
    to
    ```SQL
    select * from t1 join t2 on (t1c1 = t2c1) where  t1c1 is not null and t2c1 is not null
    ````
    I wanted to handle it as a follow-up. 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94930/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211985537
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
    --- End diff --
    
    @mgaido91 How can i say "no" to more testing :-) ? I will add the tests. Thanks !!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22141
  
    **[Test build #94930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94930/testReport)** for PR 22141 at commit [`473bfb5`](https://github.com/apache/spark/commit/473bfb500b07626ff42a9e5ddc167970299bde21).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211971929
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
    --- End diff --
    
    @mgaido91 
    >> I don't see any test (please correct me if I am wrong) where multiple attributes are used as output of the subquery. Can we add and compare with other RDBMS? Thanks.
    
    In [here](https://github.com/apache/spark/blob/844a3ff82a688e7398bb130a44750aec78420698/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql#L113-L134) ? Is this what you meant ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org