You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/10/11 21:43:21 UTC

[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/22702

    [SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplification

    ## What changes were proposed in this pull request?
    ```Scala
        val df1 = Seq(("abc", 1), (null, 2)).toDF("col1", "col2")
        df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
        val df2 = spark.read.parquet("/tmp/test1")
        df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()
    ```
    
    Before the PR, it returns both rows. After the fix, it returns `Row ("abc", 1))`. This is to fix the bug in NULL handling in BooleanSimplification. This is a bug introduced in Spark 1.6 release.
    
    ## How was this patch tested?
    Added test cases

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark fixBooleanSimplify2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22702
    
----
commit 5d1dde12b6cb1b3a61f32d678b952ac4ce5b6c0f
Author: gatorsmile <ga...@...>
Date:   2018-10-11T21:38:38Z

    fix

commit a9359abff62017f46f33ef18d7f56f97c885af3d
Author: gatorsmile <ga...@...>
Date:   2018-10-11T21:40:44Z

    style

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224745249
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
    +      case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
    +      case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
    +
    +      case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
    --- End diff --
    
    > when a is null, And(a, c) returns null
    
    This is not always the case, `null && false`  is `false`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    **[Test build #97328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97328/testReport)** for PR 22702 at commit [`ca3172f`](https://github.com/apache/spark/commit/ca3172f346e19dc2a6a84ae0a3855f967d129619).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3903/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    **[Test build #97288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97288/testReport)** for PR 22702 at commit [`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224950588
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,31 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      // The following optimization is applicable only when the operands are nullable,
    --- End diff --
    
    typo: `only when the operands are not nullable`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Thanks! Merged to master/2.4/2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224708102
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
    +      case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
    +      case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
    +
    +      case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
    --- End diff --
    
    I see now, sorry. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224655860
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    --- End diff --
    
    Since this is complicated, shall we put a comment to explain it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224709234
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
    +      case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
    +      case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
    +
    +      case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
    --- End diff --
    
    Sorry, it is the other case where the change is not needed, right?
    `a And (b Or c)` -> `And(a, c)` when `a` is `null`, `And(a, c)` returns `null` (I got a bit confused earlier, sorry).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224658881
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    --- End diff --
    
    after more thoughts, `a And (b Or c)` should be better than `If(IsNull(a), null, And(a, c))`, as it's more likely to get pushed down to data source, so the changes here LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97328/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97288/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    **[Test build #97284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97284/testReport)** for PR 22702 at commit [`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3907/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    **[Test build #97288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97288/testReport)** for PR 22702 at commit [`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    **[Test build #97284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97284/testReport)** for PR 22702 at commit [`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224950875
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,31 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      // The following optimization is applicable only when the operands are nullable,
    --- End diff --
    
    fixed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224706383
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
    +      case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
    +      case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
    +
    +      case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
    --- End diff --
    
    the problem is when a is null, c is true


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    **[Test build #97328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97328/testReport)** for PR 22702 at commit [`ca3172f`](https://github.com/apache/spark/commit/ca3172f346e19dc2a6a84ae0a3855f967d129619).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22702


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3936/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224655771
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    --- End diff --
    
    assuming a is null, then b is also null.
    If c is null: `a And (b Or c)` -> null, And(a, c) -> null
    If c is true: `a And (b Or c)` -> null, And(a, c) -> null
    if c is false: `a And (b Or c)` -> null, And(a, c) -> false
    
    So yes this is a bug, and we should rewrite it to `If(IsNull(a), a, And(a, c))`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224704266
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
    +      case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
    +      case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
    +
    +      case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
    --- End diff --
    
    these shouldn't be a problem, since if `a` is `true`, then `a Or b` is true, regardless of `b`'s value/nullability, isn't it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22702#discussion_r224748765
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case a And b if a.semanticEquals(b) => a
           case a Or b if a.semanticEquals(b) => a
     
    -      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    -      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    -      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    -      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    -
    -      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    -      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    -      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    -      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
    +      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
    +      case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
    +      case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
    +
    +      case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
    --- End diff --
    
    oh, yes you're right, this might be a problem indeed if the expression is inside a `not`. Sorry, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22702
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97284/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org