You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/10/11 21:43:21 UTC
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/22702
[SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplification
## What changes were proposed in this pull request?
```Scala
val df1 = Seq(("abc", 1), (null, 2)).toDF("col1", "col2")
df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
val df2 = spark.read.parquet("/tmp/test1")
df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()
```
Before the PR, it returns both rows. After the fix, it returns `Row ("abc", 1))`. This is to fix the bug in NULL handling in BooleanSimplification. This is a bug introduced in Spark 1.6 release.
## How was this patch tested?
Added test cases
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark fixBooleanSimplify2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22702.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22702
----
commit 5d1dde12b6cb1b3a61f32d678b952ac4ce5b6c0f
Author: gatorsmile <ga...@...>
Date: 2018-10-11T21:38:38Z
fix
commit a9359abff62017f46f33ef18d7f56f97c885af3d
Author: gatorsmile <ga...@...>
Date: 2018-10-11T21:40:44Z
style
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224745249
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
+ case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
+ case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
+
+ case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
--- End diff --
> when a is null, And(a, c) returns null
This is not always the case, `null && false` is `false`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22702
**[Test build #97328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97328/testReport)** for PR 22702 at commit [`ca3172f`](https://github.com/apache/spark/commit/ca3172f346e19dc2a6a84ae0a3855f967d129619).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3903/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22702
**[Test build #97288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97288/testReport)** for PR 22702 at commit [`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22702
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224950588
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,31 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ // The following optimization is applicable only when the operands are nullable,
--- End diff --
typo: `only when the operands are not nullable`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22702
Thanks! Merged to master/2.4/2.3
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224708102
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
+ case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
+ case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
+
+ case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
--- End diff --
I see now, sorry. Thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224655860
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
--- End diff --
Since this is complicated, shall we put a comment to explain it?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224709234
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
+ case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
+ case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
+
+ case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
--- End diff --
Sorry, it is the other case where the change is not needed, right?
`a And (b Or c)` -> `And(a, c)` when `a` is `null`, `And(a, c)` returns `null` (I got a bit confused earlier, sorry).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224658881
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
--- End diff --
after more thoughts, `a And (b Or c)` should be better than `If(IsNull(a), null, And(a, c))`, as it's more likely to get pushed down to data source, so the changes here LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97328/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97288/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22702
**[Test build #97284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97284/testReport)** for PR 22702 at commit [`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3907/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22702
**[Test build #97288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97288/testReport)** for PR 22702 at commit [`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22702
**[Test build #97284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97284/testReport)** for PR 22702 at commit [`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224950875
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,31 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ // The following optimization is applicable only when the operands are nullable,
--- End diff --
fixed
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224706383
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
+ case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
+ case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
+
+ case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
--- End diff --
the problem is when a is null, c is true
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22702
**[Test build #97328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97328/testReport)** for PR 22702 at commit [`ca3172f`](https://github.com/apache/spark/commit/ca3172f346e19dc2a6a84ae0a3855f967d129619).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22702
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3936/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224655771
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
--- End diff --
assuming a is null, then b is also null.
If c is null: `a And (b Or c)` -> null, And(a, c) -> null
If c is true: `a And (b Or c)` -> null, And(a, c) -> null
if c is false: `a And (b Or c)` -> null, And(a, c) -> false
So yes this is a bug, and we should rewrite it to `If(IsNull(a), a, And(a, c))`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224704266
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
+ case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
+ case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
+
+ case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
--- End diff --
these shouldn't be a problem, since if `a` is `true`, then `a Or b` is true, regardless of `b`'s value/nullability, isn't it?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22702#discussion_r224748765
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case a And b if a.semanticEquals(b) => a
case a Or b if a.semanticEquals(b) => a
- case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
- case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
- case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
- case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
- case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
- case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
- case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
- case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
+ case a And (b Or c) if !a.nullable && Not(a).semanticEquals(c) => And(a, b)
+ case (a Or b) And c if !a.nullable && a.semanticEquals(Not(c)) => And(b, c)
+ case (a Or b) And c if !b.nullable && b.semanticEquals(Not(c)) => And(a, c)
+
+ case a Or (b And c) if !a.nullable && Not(a).semanticEquals(b) => Or(a, c)
--- End diff --
oh, yes you're right, this might be a problem indeed if the expression is inside a `not`. Sorry, thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22702
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97284/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org