You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/08 00:23:57 UTC

[GitHub] [spark] kazuyukitanimura opened a new pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

kazuyukitanimura opened a new pull request #33930:
URL: https://github.com/apache/spark/pull/33930


   ### What changes were proposed in this pull request?
   This PR proposes to add more Not operator simplifications in `BooleanSimplification` by applying the following rules
     - Not(null) == null
       - e.g. IsNull(Not(...)) can be IsNull(...)
     - (Not(a) = b) == (a = Not(b))
       - e.g. Not(...) = true can be (...) = false
     - (a != b) == (a = Not(b))
       - e.g. (...) != true can be (...) = false
   
   
   ### Why are the changes needed?
   This PR simplifies SQL statements that includes Not operators.
   In addition, the following query does not push down the filter in the current implementation
   ```
   SELECT * FROM t WHERE (not boolean_col) <=> null
   ```
   although the following equivalent query pushes down the filter as expected.
   ```
   SELECT * FROM t WHERE boolean_col <=> null
   ```
   That is because the first query creates `IsNull(Not(boolean_col))` in the current implementation, which should be able to get simplified further to `IsNull(boolean_col)`
   This PR helps optimizing such cases.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added unit tests
   ```
   build/sbt "testOnly *BooleanSimplificationSuite  -- -z SPARK-36665"
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917120340


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47658/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-931817676


   @cloud-fan @sunchao It would be grateful if you could take another look when you have a chance


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928193939


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48165/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927118537


   **[Test build #143621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143621/testReport)** for PR 33930 at commit [`b32728e`](https://github.com/apache/spark/commit/b32728e5e12bb8104fd49b2bdbf63e660be829e8).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917408796


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143163/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706581720



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y

Review comment:
       Thanks updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915616457


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47595/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917114590


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47658/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915836342


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47612/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915664479


   **[Test build #143096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143096/testReport)** for PR 33930 at commit [`17e7bde`](https://github.com/apache/spark/commit/17e7bde5768695494bf3ba773565623c91ffc12c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918827010


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47744/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r708741878



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -797,6 +834,19 @@ object NullPropagation extends Rule[LogicalPlan] {
       // a null literal.
       case e: NullIntolerant if e.children.exists(isNullLiteral) =>
         Literal.create(null, e.dataType)
+
+      // [SPARK-36665] Unwrap inside of IsNull/IsNotNull if the inside is NullIntolerant
+      // E.g. IsNull(Not(null)) == IsNull(null)
+      // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+      // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+      // UnaryExpression only for now as applying this optimization to other expressions is too
+      // disruptive for some tests (e.g. [SPARK-32290].) TODO remove e.isInstanceOf[UnaryExpression]
+      case IsNull(e: NullIntolerant) if e.isInstanceOf[UnaryExpression] &&
+        !e.isInstanceOf[ExtractValue] && e.children.nonEmpty =>
+        e.children.map(IsNull(_): Expression).reduceLeft(Or)

Review comment:
       I was thinking about making it to future proof so that we can remove the `UnaryExpression` restriction. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915715345


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143096/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918747703


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143224/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r742352377



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated

Review comment:
       I re-organized the comment, hopefully it is more understandable now.
   In particular, this particular part of the comment **"E.g. `Cast` is `NullIntolerant`; however,  cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`"**




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r736031215



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.
+  // E.g. Cast is NullIntolerant; however, cast('Infinity' as integer) returns null.
+  // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+  // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] test cases
+  // e with multiple children requires the deterministic check because optimizing IsNull(a > b) to
+  // Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual

Review comment:
       nit: why `EqualTo` is not handled here?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -446,6 +446,53 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 
+/**
+ * Move/Push `Not` operator if it's beneficial.

Review comment:
       nit: maybe mention this is only for the case where leaf nodes are boolean or null. The name appears to be more general.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.

Review comment:
       could you rephrase this sentence? I got confused while reading it. Perhaps "Return true iff the expression returns non-null result for all non-null inputs"?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated

Review comment:
       nit: could you write some comments explaining why not all NullIntolerant can be propagated? this can help the future readers to better understand this code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-949987313


   Hi @sunchao It would be great if you could take one more look when you get a chance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707826619



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -288,8 +288,24 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 2. Eliminates / extracts common factors.
  * 3. Merge same expressions
  * 4. Removes `Not` operator.
+ * 5. Move/Push `Not` operator if it's beneficial.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. let x == Not(y), then canSimplifyNot(x) == true because Not(x) == Not(Not(y)) == y
+  // For the case of x = EqualTo(a, b), recursively check each child expression
+  // Extra nullable check is required for EqualNullSafe because
+  // Not(EqualNullSafe(x, null)) is different from EqualNullSafe(x, Not(null))
+  private def canSimplifyNot(x: Expression): Boolean = x match {

Review comment:
       That is correct!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915664479


   **[Test build #143096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143096/testReport)** for PR 33930 at commit [`17e7bde`](https://github.com/apache/spark/commit/17e7bde5768695494bf3ba773565623c91ffc12c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917253499


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143155/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917252045


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143154/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917375997


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47667/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917375228


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47667/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916643333


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47642/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915126639


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143068/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917123091


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47658/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916668682


   **[Test build #143140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143140/testReport)** for PR 33930 at commit [`33f76fd`](https://github.com/apache/spark/commit/33f76fd5f4c150724dc219586bd9d1088737eb6f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916696166


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47644/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917408132


   **[Test build #143163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143163/testReport)** for PR 33930 at commit [`bf4b0b8`](https://github.com/apache/spark/commit/bf4b0b8b4ad9ccf3c66b90257b4627a5e60777b8).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704180322



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using Not(null) == null rules
+      case IsNull(Not(e)) => IsNull(e)
+      case IsNotNull(Not(e)) => IsNotNull(e)

Review comment:
       shall we have a more general optimization in `NullPropagation`? It seems like we can remove many expressions under `IsNull`/`IsNotNull` if they are `NullIntolerant`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915921356


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143108/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917140009


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47659/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914852515


   **[Test build #143063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143063/testReport)** for PR 33930 at commit [`d425982`](https://github.com/apache/spark/commit/d425982410ecda5b8d26eb8178f33820caefd586).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
dbtsai commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706562387



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       Do we want to add the checks like the following? This will help that we apply all the `Not` optimizations first before we hit the following roles, and also, the checks explicitly ensure the roles will converge.
   ```
         case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))
         case EqualTo(a, Not(b)) if canSimplifyNot(a) && !canSimplifyNot(b) => EqualTo(Not(a), b)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916688043


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47644/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928193939






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r742345065



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.

Review comment:
       Thanks updated!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914760492


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47566/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915022505


   cc @cloud-fan @sunchao @dbtsai FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960149664






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r743038155



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.
+  // E.g. Cast is NullIntolerant; however, cast('Infinity' as integer) returns null.
+  // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+  // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] test cases
+  // e with multiple children requires the deterministic check because optimizing IsNull(a > b) to
+  // Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual

Review comment:
       I see, thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930825339


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143740/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-934007951


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48339/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915877127


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47620/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915688496






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930485880






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927112296


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48133/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930697728


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48251/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930744527


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48251/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917095182


   **[Test build #143155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143155/testReport)** for PR 33930 at commit [`e4ef486`](https://github.com/apache/spark/commit/e4ef4869b97211597aa59ee6bcf0ec1f494c5eb2).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917140633






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
dbtsai commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707796357



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Push `Not` to one side of `EqualTo`/`EqualNullSafe` if it's beneficial.
+      // E.g. Not(EqualTo(x, false)) => EqualTo(x, true)
+      case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case Not(EqualTo(a, b)) if canSimplifyNot(a) => EqualTo(Not(a), b)

Review comment:
       agree, it's a NP-hard problem :) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918669704


   Ping @viirya and @sunchao for another look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-934010024


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48339/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928095646


   **[Test build #143652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143652/testReport)** for PR 33930 at commit [`931c906`](https://github.com/apache/spark/commit/931c9061a99e6f63bd9fc3371a9fe7ac189c423c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930485880


   **[Test build #143729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143729/testReport)** for PR 33930 at commit [`35caa42`](https://github.com/apache/spark/commit/35caa427d9a03a0ff5202fd04f0f452ebc394c5e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960135090


   **[Test build #144886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144886/testReport)** for PR 33930 at commit [`f503822`](https://github.com/apache/spark/commit/f503822a8aed5173f0f2f18c16079c9c95371869).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960279258


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960296007






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960149664


   Thank you @sunchao updated the comments for final touch-up. @cloud-fan @viirya @dongjoon-hyun if it would be wonderful if you could help us merging if it looks good to you.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960296007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r744416341



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Return true iff the expression returns non-null result for all non-null inputs.
+  // Not all `NullIntolerant` can be propagated. E.g. `Cast` is `NullIntolerant`; however,
+  // cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`.
+  // `ExtractValue` is also not supported. E.g. the planner may resolve column `a` to `a#123`,
+  // then IsNull(a#123) cannot be optimized.
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] optimization, not supported for now.
+  // If e has multiple children, the deterministic check is required because optimizing
+  // IsNull(a > b) to Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual
+      if e.deterministic => true
+    case _ => false
+  }) && e.children.nonEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+    _.containsPattern(NULL_CHECK), ruleId) {
+    case q: LogicalPlan => q.transformExpressionsDownWithPruning(
+      _.containsPattern(NULL_CHECK), ruleId) {
+      case IsNull(e: NullIntolerant) if supportedNullIntolerant(e) =>

Review comment:
       does this exclude getting the inner field? I think `IsNull(a.b)` is better than `IsNull(a)`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915688496






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704940833



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Using (a =!= b) == (a === Not(b)), Not(a <=> b) == (a <=> Not(b)) rules
+      case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       The example of `EqualTo(Not(a), true)` will be further simplified to  `EqualTo(a, false)` in the next cycle. So in summary, `Not(EqualTo(a, false))` => `EqualTo(Not(a), true)` => `EqualTo(a, false)`
   So that we can completely remove `Not` (1 less operation)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915826810


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47612/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918636972


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47727/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928180430


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48165/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927100526


   **[Test build #143621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143621/testReport)** for PR 33930 at commit [`b32728e`](https://github.com/apache/spark/commit/b32728e5e12bb8104fd49b2bdbf63e660be829e8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927119134


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143621/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704940833



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Using (a =!= b) == (a === Not(b)), Not(a <=> b) == (a <=> Not(b)) rules
+      case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       For the example of `EqualTo(Not(a), true)` will be further simplified to  `EqualTo(a, false)` in the next cycle. So in summary, `Not(EqualTo(a, false))` => `EqualTo(Not(a), true)` => `EqualTo(a, false)`
   So that we can completely remove `Not`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917139985


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47659/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704181087



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using Not(null) == null rules
+      case IsNull(Not(e)) => IsNull(e)
+      case IsNotNull(Not(e)) => IsNotNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       why is `EqualTo(a, Not(b))` better than `EqualTo(Not(a), b)`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914726982


   **[Test build #143063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143063/testReport)** for PR 33930 at commit [`d425982`](https://github.com/apache/spark/commit/d425982410ecda5b8d26eb8178f33820caefd586).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915609255


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47595/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915844699


   **[Test build #143116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143116/testReport)** for PR 33930 at commit [`f5d3984`](https://github.com/apache/spark/commit/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918742858


   **[Test build #143224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143224/testReport)** for PR 33930 at commit [`d5eec82`](https://github.com/apache/spark/commit/d5eec8289b627c63f5d691297ce820ce0542c569).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r708768728



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -797,6 +834,19 @@ object NullPropagation extends Rule[LogicalPlan] {
       // a null literal.
       case e: NullIntolerant if e.children.exists(isNullLiteral) =>
         Literal.create(null, e.dataType)
+
+      // [SPARK-36665] Unwrap inside of IsNull/IsNotNull if the inside is NullIntolerant
+      // E.g. IsNull(Not(null)) == IsNull(null)
+      // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+      // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+      // UnaryExpression only for now as applying this optimization to other expressions is too
+      // disruptive for some tests (e.g. [SPARK-32290].) TODO remove e.isInstanceOf[UnaryExpression]
+      case IsNull(e: NullIntolerant) if e.isInstanceOf[UnaryExpression] &&
+        !e.isInstanceOf[ExtractValue] && e.children.nonEmpty =>
+        e.children.map(IsNull(_): Expression).reduceLeft(Or)

Review comment:
       I see. Make sense.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704940833



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Using (a =!= b) == (a === Not(b)), Not(a <=> b) == (a <=> Not(b)) rules
+      case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       Actually equivalent expression of `Not(EqualTo(a, false))` is `EqualTo(a, true)`
   So that we can completely remove `Not` (1 less operation)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706585126



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Push `Not` to one side of `EqualTo`/`EqualNullSafe` if it's beneficial.
+      // E.g. Not(EqualTo(x, false)) => EqualTo(x, true)
+      case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case Not(EqualTo(a, b)) if canSimplifyNot(a) => EqualTo(Not(a), b)

Review comment:
       We do not know which one to simplify first is better, but at least we know that it will become simpler form than the original. In that sense, I would say we can just simplify which ever we can in the greedy manner. These are all heuristics anyway. Achieving the global optimum is a NP-hard problem...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915685516


   **[Test build #143093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143093/testReport)** for PR 33930 at commit [`941aef4`](https://github.com/apache/spark/commit/941aef4ab5be167bacaa0466faac2ff94f03a98a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-919002907


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143240/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706584087



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       We can but the extra checks are a bit redundant because all `Not` optimizations are listed above this line of the case match. Do you feel we must need the extra checks? @dbtsai 
   It is actually does not matter `a` or `b` gets simplified because `canSimplifyNot()` promises it will be simpler than current form.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917368677


   **[Test build #143163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143163/testReport)** for PR 33930 at commit [`bf4b0b8`](https://github.com/apache/spark/commit/bf4b0b8b4ad9ccf3c66b90257b4627a5e60777b8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914983432


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47573/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705902435



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
##########
@@ -404,20 +404,25 @@ object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper with Pre
    * which is very time consuming because it's an O(M*N) calculation.
    * But if it's a single column case O(M*N) calculation could be optimized into O(M)
    * using hash lookup instead of loop lookup.
+   *
+   * [SPARK-36665] IsNull(EqualTo(a=b)) gets optimized to Or(IsNull(a), IsNull(b))

Review comment:
       The optimization over binary expressions are powerful enough to keep it.
   For example, `Or(IsNull(a), IsNull(b))` is always a better than `IsNull(EqualTo(a=b))` because `Or(IsNull(a), IsNull(b))` can go into the push down filter. In contrast, `IsNull(EqualTo(a=b))` cannot be pushed down.
   It is just this specific hand-optimization pattern is outlier.
   Given the benefit of optimization, small tweak like this is okay.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704940833



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Using (a =!= b) == (a === Not(b)), Not(a <=> b) == (a <=> Not(b)) rules
+      case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       The example of `EqualTo(Not(a), true)` will be further simplified to  `EqualTo(a, false)` in the next cycle. So in summary, `Not(EqualTo(a, true))` => `EqualTo(Not(a), true)` => `EqualTo(a, false)`
   So that we can completely remove `Not` (1 less operation)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917099314


   **[Test build #143156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143156/testReport)** for PR 33930 at commit [`938d824`](https://github.com/apache/spark/commit/938d8246bc4d73e1f9507846c00be4eb94e69add).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917087853


   **[Test build #143154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143154/testReport)** for PR 33930 at commit [`d4e9500`](https://github.com/apache/spark/commit/d4e9500120b5572fe5c3145abde78b6743a5e79b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917374230


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47667/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927100526


   **[Test build #143621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143621/testReport)** for PR 33930 at commit [`b32728e`](https://github.com/apache/spark/commit/b32728e5e12bb8104fd49b2bdbf63e660be829e8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927112296


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48133/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930567531


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48240/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930485880


   **[Test build #143729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143729/testReport)** for PR 33930 at commit [`35caa42`](https://github.com/apache/spark/commit/35caa427d9a03a0ff5202fd04f0f452ebc394c5e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930567531






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-934083992


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143825/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-962570066


   Thanks. Merging to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960387690


   **[Test build #144886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144886/testReport)** for PR 33930 at commit [`f503822`](https://github.com/apache/spark/commit/f503822a8aed5173f0f2f18c16079c9c95371869).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705598219



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y
+  // For the case of x = EqualTo(a, b), recursively check each child expression
+  // Extra nullable check is required for EqualNullSafe because
+  // Not(EqualNullSafe(x, null)) is different from EqualNullSafe(x, Not(null))
+  private def canSimplifyNot(x: Expression): Boolean = x match {
+    case Literal(_, BooleanType) | Literal(_, NullType) => true
+    case _: Not | _: IsNull | _: IsNotNull | _: And | _: Or => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual => true
+    case EqualTo(a, b) if canSimplifyNot(a) || canSimplifyNot(b) => true

Review comment:
       Actually the code does not work in the way you described.
   It works like this.
   `Not(EqualTo(a, EqualTo(b, EqualTo(c, d))))` where `d` is optimizable, first `canSimplifyNot(a)` or `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` are called depending on the case matches here
   https://github.com/apache/spark/pull/33930/files/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb#diff-d43484d56a4d9991066b5c00d12ec2465c75131e055fc02ee7fb6dfd45b5006fR472-R473
   If `a` happens to be optimizable, `canSimplifyNot(a)` returns `true` and it propagates `Not` to `a`.
   Otherwise, `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` will call `canSimplifyNot(b)` and `canSimplifyNot(EqualTo(c, d))`, then that will call `canSimplifyNot(c)` and `canSimplifyNot(d)`.
   Since `d` is optimizable in this example, `canSimplifyNot(d)` returns `true` that makes `canSimplifyNot(EqualTo(c, d))` return `true`. Eventually `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` returns `true`
   In conclusion, that will match the first one in the above two cases (not the second one), i.e. https://github.com/apache/spark/pull/33930/files/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb#diff-d43484d56a4d9991066b5c00d12ec2465c75131e055fc02ee7fb6dfd45b5006fR472
   After all it will optimize `Not(EqualTo(a, EqualTo(b, EqualTo(c, d))))` => `EqualTo(a, Not(EqualTo(b, EqualTo(c, d))))` => `EqualTo(a, EqualTo(b, Not(EqualTo(c, d))))` => `EqualTo(a, EqualTo(b, EqualTo(c, Not(d))))` => `EqualTo(a, EqualTo(b, EqualTo(c, e)))` where `e` is optimized `Not(d)`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917140609


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47660/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914777321


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47566/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915836342


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47612/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705122120



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y
+  // For the case of x = EqualTo(a, b), recursively check each child expression
+  // Extra nullable check is required for EqualNullSafe because
+  // Not(EqualNullSafe(x, null)) is different from EqualNullSafe(x, Not(null))
+  private def canSimplifyNot(x: Expression): Boolean = x match {
+    case Literal(_, BooleanType) | Literal(_, NullType) => true
+    case _: Not | _: IsNull | _: IsNotNull | _: And | _: Or => true

Review comment:
       why is it always beneficial to push down `Not` into `And` and `Or`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914939020


   **[Test build #143068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143068/testReport)** for PR 33930 at commit [`5600207`](https://github.com/apache/spark/commit/5600207665f0cf3b4deeaf7e4665a8eb10beb724).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916668682


   **[Test build #143140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143140/testReport)** for PR 33930 at commit [`33f76fd`](https://github.com/apache/spark/commit/33f76fd5f4c150724dc219586bd9d1088737eb6f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917408796


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143163/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914990036


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47573/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915777712


   **[Test build #143108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143108/testReport)** for PR 33930 at commit [`ffc2ac3`](https://github.com/apache/spark/commit/ffc2ac3859c39afa3259f7832ddb834d08f55377).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-934010024


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48339/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930825339


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143740/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930567531


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48240/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930647332


   **[Test build #143729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143729/testReport)** for PR 33930 at commit [`35caa42`](https://github.com/apache/spark/commit/35caa427d9a03a0ff5202fd04f0f452ebc394c5e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928095646






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915969834


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143116/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-934068281


   Thank you for the review @sunchao Addressed the issues! 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930822276


   **[Test build #143740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143740/testReport)** for PR 33930 at commit [`4ae314d`](https://github.com/apache/spark/commit/4ae314d0d9b094b6cbb7589bdaa483f25856eaff).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930744527


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48251/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930527287


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48240/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915616457


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47595/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960296007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960135090






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r744995201



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Return true iff the expression returns non-null result for all non-null inputs.
+  // Not all `NullIntolerant` can be propagated. E.g. `Cast` is `NullIntolerant`; however,
+  // cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`.
+  // `ExtractValue` is also not supported. E.g. the planner may resolve column `a` to `a#123`,
+  // then IsNull(a#123) cannot be optimized.
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] optimization, not supported for now.
+  // If e has multiple children, the deterministic check is required because optimizing
+  // IsNull(a > b) to Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual
+      if e.deterministic => true
+    case _ => false
+  }) && e.children.nonEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+    _.containsPattern(NULL_CHECK), ruleId) {
+    case q: LogicalPlan => q.transformExpressionsDownWithPruning(
+      _.containsPattern(NULL_CHECK), ruleId) {
+      case IsNull(e: NullIntolerant) if supportedNullIntolerant(e) =>

Review comment:
       Thanks. It gets into the inner field in pre-order if I understood your question correctly. Actually `supportedNullIntolerant(e)` checks whether `e` has children.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya closed pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
viirya closed pull request #33930:
URL: https://github.com/apache/spark/pull/33930


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r745277048



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Return true iff the expression returns non-null result for all non-null inputs.
+  // Not all `NullIntolerant` can be propagated. E.g. `Cast` is `NullIntolerant`; however,
+  // cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`.
+  // `ExtractValue` is also not supported. E.g. the planner may resolve column `a` to `a#123`,
+  // then IsNull(a#123) cannot be optimized.
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] optimization, not supported for now.
+  // If e has multiple children, the deterministic check is required because optimizing
+  // IsNull(a > b) to Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual
+      if e.deterministic => true
+    case _ => false
+  }) && e.children.nonEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+    _.containsPattern(NULL_CHECK), ruleId) {
+    case q: LogicalPlan => q.transformExpressionsDownWithPruning(
+      _.containsPattern(NULL_CHECK), ruleId) {
+      case IsNull(e: NullIntolerant) if supportedNullIntolerant(e) =>

Review comment:
       I see, `supportedNullIntolerant` defines an allow list, only `Not` and binary comparisons can apply this optimization.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917133920






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706359376



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
##########
@@ -404,20 +404,25 @@ object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper with Pre
    * which is very time consuming because it's an O(M*N) calculation.
    * But if it's a single column case O(M*N) calculation could be optimized into O(M)
    * using hash lookup instead of loop lookup.
+   *
+   * [SPARK-36665] IsNull(EqualTo(a=b)) gets optimized to Or(IsNull(a), IsNull(b))

Review comment:
       I changed my mind. I took your advice and made it unaryExpressions only for now. For extending it to other expressions I will leave it as TODO.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705891369



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules

Review comment:
       Let's simplify the comment
   ```
   // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707686460



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       Added the checks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918827010


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47744/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915715345


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143096/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918602348


   **[Test build #143224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143224/testReport)** for PR 33930 at commit [`d5eec82`](https://github.com/apache/spark/commit/d5eec8289b627c63f5d691297ce820ce0542c569).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918747703


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143224/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707686199



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.

Review comment:
       updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915549883


   **[Test build #143093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143093/testReport)** for PR 33930 at commit [`941aef4`](https://github.com/apache/spark/commit/941aef4ab5be167bacaa0466faac2ff94f03a98a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915969194


   **[Test build #143116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143116/testReport)** for PR 33930 at commit [`f5d3984`](https://github.com/apache/spark/commit/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704640769



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using Not(null) == null rules
+      case IsNull(Not(e)) => IsNull(e)
+      case IsNotNull(Not(e)) => IsNotNull(e)

Review comment:
       Thank you. Great point. I will spend some time to think trough how I can achieve that. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705891686



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Using (a =!= b) == (a === Not(b)), Not(a <=> b) == (a <=> Not(b)) rules
+      // E.g. Not(EqualTo(x, false)) => EqualTo(x, true)

Review comment:
       ditto
   ```
   // Push `Note` to one side of `EqualTo`/`EqualNullSafe` if it's beneficial.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704913622



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`

Review comment:
       then shall we simply match `EqualTo(Not(a), Not(b))` and turn it into `EqualTo(a, b)`? The code can be clearer




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r742345065



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.

Review comment:
       Thanks updated!

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated

Review comment:
       I re-organized the comment, hopefully it is more understandable now.
   In particular, this particular part of the comment **"E.g. `Cast` is `NullIntolerant`; however,  cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`"**

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated

Review comment:
       I re-organized the comment, hopefully it is more understandable now.
   In particular, this part of the comment **"E.g. `Cast` is `NullIntolerant`; however,  cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`"**

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -446,6 +446,53 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 
+/**
+ * Move/Push `Not` operator if it's beneficial.

Review comment:
       This optimization is actually more than just the case where leaf nodes are boolean or null. So in that sense, the generic name should be okay.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.
+  // E.g. Cast is NullIntolerant; however, cast('Infinity' as integer) returns null.
+  // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+  // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] test cases
+  // e with multiple children requires the deterministic check because optimizing IsNull(a > b) to
+  // Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual

Review comment:
       This is due to earlier feedback in this review. Addressed in this part of the comment;
   **"Applying to `EqualTo` is too disruptive for [SPARK-32290] optimization, not supported for now."**

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.

Review comment:
       Thanks updated!

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated

Review comment:
       I re-organized the comment, hopefully it is more understandable now.
   In particular, this particular part of the comment **"E.g. `Cast` is `NullIntolerant`; however,  cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`"**

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated

Review comment:
       I re-organized the comment, hopefully it is more understandable now.
   In particular, this part of the comment **"E.g. `Cast` is `NullIntolerant`; however,  cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`"**

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -446,6 +446,53 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 
+/**
+ * Move/Push `Not` operator if it's beneficial.

Review comment:
       This optimization is actually more than just the case where leaf nodes are boolean or null. So in that sense, the generic name should be okay.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.
+  // E.g. Cast is NullIntolerant; however, cast('Infinity' as integer) returns null.
+  // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+  // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] test cases
+  // e with multiple children requires the deterministic check because optimizing IsNull(a > b) to
+  // Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual

Review comment:
       This is due to earlier feedback in this review. Addressed in this part of the comment;
   **"Applying to `EqualTo` is too disruptive for [SPARK-32290] optimization, not supported for now."**




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705578779



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y
+  // For the case of x = EqualTo(a, b), recursively check each child expression
+  // Extra nullable check is required for EqualNullSafe because
+  // Not(EqualNullSafe(x, null)) is different from EqualNullSafe(x, Not(null))
+  private def canSimplifyNot(x: Expression): Boolean = x match {
+    case Literal(_, BooleanType) | Literal(_, NullType) => true
+    case _: Not | _: IsNull | _: IsNotNull | _: And | _: Or => true

Review comment:
       For `Not(And())` and `Not(Or())`, we have further optimization rule here
   https://github.com/apache/spark/blob/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L452
   So that we can keep pushing down `Not`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916624104


   **[Test build #143138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143138/testReport)** for PR 33930 at commit [`ef25595`](https://github.com/apache/spark/commit/ef255953ccb043f473c35b5fb1d756e1d6a976b9).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916760828


   **[Test build #143138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143138/testReport)** for PR 33930 at commit [`ef25595`](https://github.com/apache/spark/commit/ef255953ccb043f473c35b5fb1d756e1d6a976b9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705598219



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y
+  // For the case of x = EqualTo(a, b), recursively check each child expression
+  // Extra nullable check is required for EqualNullSafe because
+  // Not(EqualNullSafe(x, null)) is different from EqualNullSafe(x, Not(null))
+  private def canSimplifyNot(x: Expression): Boolean = x match {
+    case Literal(_, BooleanType) | Literal(_, NullType) => true
+    case _: Not | _: IsNull | _: IsNotNull | _: And | _: Or => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual => true
+    case EqualTo(a, b) if canSimplifyNot(a) || canSimplifyNot(b) => true

Review comment:
       Actually the code does not work in the way you described.
   It works like this.
   `Not(EqualTo(a, EqualTo(b, EqualTo(c, d))))` where `d` is optimizable, first `canSimplifyNot(a)` or `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` are called depending on the case matches here
   https://github.com/apache/spark/blob/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L472-L473
   If `a` happens to be optimizable, `canSimplifyNot(a)` returns `true` and it propagates `Not` to `a`.
   Otherwise, `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` will call `canSimplifyNot(b)` and `canSimplifyNot(EqualTo(c, d))`, then that will call `canSimplifyNot(c)` and `canSimplifyNot(d)`.
   Since `d` is optimizable in this example, `canSimplifyNot(d)` returns `true` that makes `canSimplifyNot(EqualTo(c, d))` return `true`. Eventually `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` returns `true`
   In conclusion, that will match the first one in the above two cases (not the second one), i.e. https://github.com/apache/spark/blob/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L472
   After all it will optimize `Not(EqualTo(a, EqualTo(b, EqualTo(c, d))))` => `EqualTo(a, Not(EqualTo(b, EqualTo(c, d))))` => `EqualTo(a, EqualTo(b, Not(EqualTo(c, d))))` => `EqualTo(a, EqualTo(b, EqualTo(c, Not(d))))` => `EqualTo(a, EqualTo(b, EqualTo(c, e)))` where `e` is optimized `Not(d)`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914770209


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47566/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915125031


   **[Test build #143068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143068/testReport)** for PR 33930 at commit [`5600207`](https://github.com/apache/spark/commit/5600207665f0cf3b4deeaf7e4665a8eb10beb724).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707912025



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
##########
@@ -330,4 +330,118 @@ class BooleanSimplificationSuite extends PlanTest with ExpressionEvalHelper with
       checkEvaluation(originalExpr, optimizedVal, inputRow)
     }
   }
+
+  test("SPARK-36665: Add more Not operator simplifications") {
+    // Using IsNull(e(inputs)) == IsNull(input1) or IsNull(input2) ... rules
+    checkCondition(IsNull(Not('e)), IsNull('e))
+    checkCondition(IsNotNull(Not('e)), IsNotNull('e))
+

Review comment:
       updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928359174


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143652/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930567498


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48240/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917253499


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143155/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r721796559



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,37 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.
+  // E.g. Cast is NullIntolerant; however, cast('Infinity' as integer) returns true
+  // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+  // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+  // Cannot apply to `EqualTo` as applying this optimization is too disruptive for some tests.
+  // E.g. [SPARK-32290]
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual => true

Review comment:
       Thank you. Good point. Updated!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-933965932


   **[Test build #143825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143825/testReport)** for PR 33930 at commit [`b3f6864`](https://github.com/apache/spark/commit/b3f68642191f6e9dcfebc6a2c1076f9cbc076aca).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916769207


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143138/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917368677


   **[Test build #143163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143163/testReport)** for PR 33930 at commit [`bf4b0b8`](https://github.com/apache/spark/commit/bf4b0b8b4ad9ccf3c66b90257b4627a5e60777b8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704937543



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Using (a =!= b) == (a === Not(b)), Not(a <=> b) == (a <=> Not(b)) rules
+      case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       Is `EqualTo(Not(a), true)` better than `Not(EqualTo(a, false))`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r745276061



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Return true iff the expression returns non-null result for all non-null inputs.
+  // Not all `NullIntolerant` can be propagated. E.g. `Cast` is `NullIntolerant`; however,
+  // cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`.
+  // `ExtractValue` is also not supported. E.g. the planner may resolve column `a` to `a#123`,
+  // then IsNull(a#123) cannot be optimized.
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] optimization, not supported for now.
+  // If e has multiple children, the deterministic check is required because optimizing
+  // IsNull(a > b) to Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual
+      if e.deterministic => true
+    case _ => false
+  }) && e.children.nonEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+    _.containsPattern(NULL_CHECK), ruleId) {
+    case q: LogicalPlan => q.transformExpressionsDownWithPruning(
+      _.containsPattern(NULL_CHECK), ruleId) {
+      case IsNull(e: NullIntolerant) if supportedNullIntolerant(e) =>

Review comment:
       Then it's wrong, right? we turn `IsNull(a.b)` into `IsNull(a)`, which is a regression, not optimization.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927109508


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48133/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928095646


   **[Test build #143652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143652/testReport)** for PR 33930 at commit [`931c906`](https://github.com/apache/spark/commit/931c9061a99e6f63bd9fc3371a9fe7ac189c423c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
dbtsai commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706562387



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       Do we want to add the checks like the following? This will help that we apply all the `Not` optimization first before we hit the following roles, and also, the checks explicitly ensure the roles will converge.
   ```
         case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))
         case EqualTo(a, Not(b)) if canSimplifyNot(a) && !canSimplifyNot(b) => EqualTo(Not(a), b)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915969834


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143116/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915844699


   **[Test build #143116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143116/testReport)** for PR 33930 at commit [`f5d3984`](https://github.com/apache/spark/commit/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914726982


   **[Test build #143063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143063/testReport)** for PR 33930 at commit [`d425982`](https://github.com/apache/spark/commit/d425982410ecda5b8d26eb8178f33820caefd586).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930668328


   **[Test build #143740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143740/testReport)** for PR 33930 at commit [`4ae314d`](https://github.com/apache/spark/commit/4ae314d0d9b094b6cbb7589bdaa483f25856eaff).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-933986086


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48339/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927119134


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143621/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r719870007



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       I addressed this by using `q.transformExpressionsDownWithPruning()` that optimizes from top to bottom. That is helpful for the case like in the test `'e =!= ('f === ('g === Not('h)))` It will be optimized as
   ```
   'e =!= ('f === ('g === Not('h))) => 'e === ('f =!= ('g === Not('h))) => 'e === ('f === ('g =!= Not('h))) => 'e === ('f === ('g === Not(Not('h)))) => 'e === ('f === ('g === 'h)) 
   ```
   Only the last `Not(Not('h))` will need to re-visit `BooleanSimplification` one more time. Otherwise it will stay the fist `Not` propagation is done within `NotPropagation`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917099314


   **[Test build #143156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143156/testReport)** for PR 33930 at commit [`938d824`](https://github.com/apache/spark/commit/938d8246bc4d73e1f9507846c00be4eb94e69add).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914852940


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143063/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917140633


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47660/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-933965932


   **[Test build #143825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143825/testReport)** for PR 33930 at commit [`b3f6864`](https://github.com/apache/spark/commit/b3f68642191f6e9dcfebc6a2c1076f9cbc076aca).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915882424


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47620/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916639409


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47642/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928356669


   **[Test build #143652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143652/testReport)** for PR 33930 at commit [`931c906`](https://github.com/apache/spark/commit/931c9061a99e6f63bd9fc3371a9fe7ac189c423c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
dbtsai commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706562712



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+      case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+      case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+      // Push `Not` to one side of `EqualTo`/`EqualNullSafe` if it's beneficial.
+      // E.g. Not(EqualTo(x, false)) => EqualTo(x, true)
+      case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))
+      case Not(EqualTo(a, b)) if canSimplifyNot(a) => EqualTo(Not(a), b)

Review comment:
       Let's say if both `a` and `b` can be simplified, does it mean we always try to simplify `Not(b)` first? It becomes ordering dependent, and how do we know which one to simplify first is better?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916831829


   **[Test build #143140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143140/testReport)** for PR 33930 at commit [`33f76fd`](https://github.com/apache/spark/commit/33f76fd5f4c150724dc219586bd9d1088737eb6f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917252293


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143156/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917252865


   **[Test build #143155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143155/testReport)** for PR 33930 at commit [`e4ef486`](https://github.com/apache/spark/commit/e4ef4869b97211597aa59ee6bcf0ec1f494c5eb2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705892117



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -797,6 +831,14 @@ object NullPropagation extends Rule[LogicalPlan] {
       // a null literal.
       case e: NullIntolerant if e.children.exists(isNullLiteral) =>
         Literal.create(null, e.dataType)
+
+      // Using IsNull(e(inputs)) == IsNull(input1) or IsNull(input2) ... rules
+      // E.g. IsNull(Not(null)) == IsNull(null)
+      // E.g. IsNotNull(a === b) == And(IsNotNull(a), IsNotNull(b))
+      case IsNull(e: NullIntolerant) if !e.isInstanceOf[ExtractValue] && e.children.nonEmpty =>

Review comment:
       Why is `ExtractValue` excluded?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705130603



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y
+  // For the case of x = EqualTo(a, b), recursively check each child expression
+  // Extra nullable check is required for EqualNullSafe because
+  // Not(EqualNullSafe(x, null)) is different from EqualNullSafe(x, Not(null))
+  private def canSimplifyNot(x: Expression): Boolean = x match {
+    case Literal(_, BooleanType) | Literal(_, NullType) => true
+    case _: Not | _: IsNull | _: IsNotNull | _: And | _: Or => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual => true
+    case EqualTo(a, b) if canSimplifyNot(a) || canSimplifyNot(b) => true

Review comment:
       I'm not sure we should recursive into `EqualTo`. For example:
   `Not(EqualTo(a, EqualTo(b, EqualTo(c, d))))`, if `d` is optimizable, we will turn it into `EqualTo(Not(a), EqualTo(Not(b), EqualTo(Not(c), optimized_d)))`. We need to evaluate `Not` more times which is a perf degradation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915882372


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47620/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917252293


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143156/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r708742231



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       I will think about how I can achieve this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707829006



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
##########
@@ -330,4 +330,118 @@ class BooleanSimplificationSuite extends PlanTest with ExpressionEvalHelper with
       checkEvaluation(originalExpr, optimizedVal, inputRow)
     }
   }
+
+  test("SPARK-36665: Add more Not operator simplifications") {
+    // Using IsNull(e(inputs)) == IsNull(input1) or IsNull(input2) ... rules
+    checkCondition(IsNull(Not('e)), IsNull('e))
+    checkCondition(IsNotNull(Not('e)), IsNotNull('e))
+

Review comment:
       This test doesn't show the `Or` generated. Could you add another test there are more than one child expression?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706937535



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.

Review comment:
       BTW, should we add some docs here too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-919001346


   **[Test build #143240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143240/testReport)** for PR 33930 at commit [`d177da7`](https://github.com/apache/spark/commit/d177da7cadc9e616f8005a27f5e168d58976ed81).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707843248



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
##########
@@ -330,4 +330,118 @@ class BooleanSimplificationSuite extends PlanTest with ExpressionEvalHelper with
       checkEvaluation(originalExpr, optimizedVal, inputRow)
     }
   }
+
+  test("SPARK-36665: Add more Not operator simplifications") {
+    // Using IsNull(e(inputs)) == IsNull(input1) or IsNull(input2) ... rules
+    checkCondition(IsNull(Not('e)), IsNull('e))
+    checkCondition(IsNotNull(Not('e)), IsNotNull('e))
+

Review comment:
       Ah, the comment is obsolete. Based on earlier reviews, we are not restricting that rules for UnaryExpression that does not create `Or`. I will update it. Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960393882


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144886/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r742356732



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.
+  // E.g. Cast is NullIntolerant; however, cast('Infinity' as integer) returns null.
+  // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+  // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] test cases
+  // e with multiple children requires the deterministic check because optimizing IsNull(a > b) to
+  // Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual

Review comment:
       This is due to earlier feedback in this review. Addressed in this part of the comment;
   **"Applying to `EqualTo` is too disruptive for [SPARK-32290] optimization, not supported for now."**




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r742352377



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated

Review comment:
       I re-organized the comment, hopefully it is more understandable now.
   In particular, this part of the comment **"E.g. `Cast` is `NullIntolerant`; however,  cast('Infinity' as integer) is null. Hence, `Cast` is not supported `NullIntolerant`"**




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960296007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915126639


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143068/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915714979


   **[Test build #143096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143096/testReport)** for PR 33930 at commit [`17e7bde`](https://github.com/apache/spark/commit/17e7bde5768695494bf3ba773565623c91ffc12c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
dbtsai commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706561411



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y

Review comment:
       Maybe it's easier to read,
   `  // E.g. let x == Not(y), then canSimplifyNot(x) == true because Not(x) == Not(Not(y)) == y`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915683272


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47599/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-934083992


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143825/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707916583



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -797,6 +834,19 @@ object NullPropagation extends Rule[LogicalPlan] {
       // a null literal.
       case e: NullIntolerant if e.children.exists(isNullLiteral) =>
         Literal.create(null, e.dataType)
+
+      // [SPARK-36665] Unwrap inside of IsNull/IsNotNull if the inside is NullIntolerant
+      // E.g. IsNull(Not(null)) == IsNull(null)
+      // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+      // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+      // UnaryExpression only for now as applying this optimization to other expressions is too
+      // disruptive for some tests (e.g. [SPARK-32290].) TODO remove e.isInstanceOf[UnaryExpression]
+      case IsNull(e: NullIntolerant) if e.isInstanceOf[UnaryExpression] &&

Review comment:
       I wonder if we should check whether `e` is deterministic - are we allowed to skip evaluating it if it is not?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -797,6 +834,19 @@ object NullPropagation extends Rule[LogicalPlan] {
       // a null literal.
       case e: NullIntolerant if e.children.exists(isNullLiteral) =>
         Literal.create(null, e.dataType)
+
+      // [SPARK-36665] Unwrap inside of IsNull/IsNotNull if the inside is NullIntolerant
+      // E.g. IsNull(Not(null)) == IsNull(null)
+      // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+      // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+      // UnaryExpression only for now as applying this optimization to other expressions is too
+      // disruptive for some tests (e.g. [SPARK-32290].) TODO remove e.isInstanceOf[UnaryExpression]
+      case IsNull(e: NullIntolerant) if e.isInstanceOf[UnaryExpression] &&
+        !e.isInstanceOf[ExtractValue] && e.children.nonEmpty =>
+        e.children.map(IsNull(_): Expression).reduceLeft(Or)

Review comment:
       can we simplify this? shouldn't `UnaryExpression` only have one child?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       I wonder how much extra costs will this incur. Since now this transforms the original expression to `EqualTo(a, Not(b))`, we'll need to apply `BooleanSimplification` again to simplify the `Not(b)` (since it is a child of `EqualTo` and the rule runs bottom-up. Does this mean we'd need to run more iterations on the `operatorOptimizationBatch` (see `Optimizer`) until it can reach a fix point?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928359174


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143652/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r721710731



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       thanks for the update! shall we put `NotPropagation` before `BooleanSimplification` so that we don't need the extra pass?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,37 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.
+  // E.g. Cast is NullIntolerant; however, cast('Infinity' as integer) returns true
+  // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+  // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+  // Cannot apply to `EqualTo` as applying this optimization is too disruptive for some tests.
+  // E.g. [SPARK-32290]
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual => true

Review comment:
       suppose we're converting `IsNull(a > b)` to `Or(IsNull(a), IsNull(b))`, what if `b` is non-deterministic? since its evaluation could be skipped.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918804936


   **[Test build #143240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143240/testReport)** for PR 33930 at commit [`d177da7`](https://github.com/apache/spark/commit/d177da7cadc9e616f8005a27f5e168d58976ed81).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r742355019



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -446,6 +446,53 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 
+/**
+ * Move/Push `Not` operator if it's beneficial.

Review comment:
       This optimization is actually more than just the case where leaf nodes are boolean or null. So in that sense, the generic name should be okay.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960149664


   Thank you @sunchao updated the comments for final touch-up. @cloud-fan @viirya @dongjoon-hyun if it would be wonderful if you could help us merging if it looks good to you.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960135090


   **[Test build #144886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144886/testReport)** for PR 33930 at commit [`f503822`](https://github.com/apache/spark/commit/f503822a8aed5173f0f2f18c16079c9c95371869).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960393882


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144886/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r743038155



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -813,6 +860,39 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 
 
+/**
+ * Unwrap the input of IsNull/IsNotNull if the input is NullIntolerant
+ * E.g. IsNull(Not(null)) == IsNull(null)
+ */
+object NullDownPropagation extends Rule[LogicalPlan] {
+  // Not all NullIntolerant can be propagated
+  // Return false if the expression may return null without non-null inputs.
+  // E.g. Cast is NullIntolerant; however, cast('Infinity' as integer) returns null.
+  // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+  // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+  // Applying to `EqualTo` is too disruptive for [SPARK-32290] test cases
+  // e with multiple children requires the deterministic check because optimizing IsNull(a > b) to
+  // Or(IsNull(a), IsNull(b)), for example, may cause skipping the evaluation of b
+  private def supportedNullIntolerant(e: NullIntolerant): Boolean = (e match {
+    case _: Not => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual

Review comment:
       I see, thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918636947


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47727/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918602348


   **[Test build #143224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143224/testReport)** for PR 33930 at commit [`d5eec82`](https://github.com/apache/spark/commit/d5eec8289b627c63f5d691297ce820ce0542c569).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915795815


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47612/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915686606


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47599/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704860599



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using Not(null) == null rules
+      case IsNull(Not(e)) => IsNull(e)
+      case IsNotNull(Not(e)) => IsNotNull(e)

Review comment:
       Updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914990036


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47573/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917140009


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47659/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917251643


   **[Test build #143156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143156/testReport)** for PR 33930 at commit [`938d824`](https://github.com/apache/spark/commit/938d8246bc4d73e1f9507846c00be4eb94e69add).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
dbtsai commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706561411



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y

Review comment:
       Nit, maybe it's easier to read,
   `  // E.g. let x == Not(y), then canSimplifyNot(x) == true because Not(x) == Not(Not(y)) == y`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914988192


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47573/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705893477



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
##########
@@ -404,20 +404,25 @@ object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper with Pre
    * which is very time consuming because it's an O(M*N) calculation.
    * But if it's a single column case O(M*N) calculation could be optimized into O(M)
    * using hash lookup instead of loop lookup.
+   *
+   * [SPARK-36665] IsNull(EqualTo(a=b)) gets optimized to Or(IsNull(a), IsNull(b))

Review comment:
       I'm confused, we should only optimize `IsNull(EqualTo(a=b))` if `a` or `b` is optimizable, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705905114



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
##########
@@ -404,20 +404,25 @@ object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper with Pre
    * which is very time consuming because it's an O(M*N) calculation.
    * But if it's a single column case O(M*N) calculation could be optimized into O(M)
    * using hash lookup instead of loop lookup.
+   *
+   * [SPARK-36665] IsNull(EqualTo(a=b)) gets optimized to Or(IsNull(a), IsNull(b))
+   * In addition, it is possible that Or(IsNull(a), IsNull(b)) gets optimized to IsNull(a)
+   * when a is nullable and b is not, vice versa
    */
   def unapply(join: Join): Option[ReturnType] = join match {
     case Join(left, right, LeftAnti,
-      Some(Or(e @ EqualTo(leftAttr: Expression, rightAttr: Expression),
-        IsNull(e2 @ EqualTo(_, _)))), _)
+      Some(Or(e @ EqualTo(leftAttr: Expression, rightAttr: Expression), IsNull(e2))), _)

Review comment:
       This means that `ExtractSingleColumnNullAwareAntiJoin ` may match more patterns. I'm really worried about correctness here, or we should have a more explicit way to mark null aware anti-join.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916643333


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47642/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916769207


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143138/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915921356


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143108/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705893786



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
##########
@@ -404,20 +404,25 @@ object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper with Pre
    * which is very time consuming because it's an O(M*N) calculation.
    * But if it's a single column case O(M*N) calculation could be optimized into O(M)
    * using hash lookup instead of loop lookup.
+   *
+   * [SPARK-36665] IsNull(EqualTo(a=b)) gets optimized to Or(IsNull(a), IsNull(b))

Review comment:
       shall we limit the optimization to unary expressions only? then we don't need to change here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916835349


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143140/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916624104


   **[Test build #143138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143138/testReport)** for PR 33930 at commit [`ef25595`](https://github.com/apache/spark/commit/ef255953ccb043f473c35b5fb1d756e1d6a976b9).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704639466



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using Not(null) == null rules
+      case IsNull(Not(e)) => IsNull(e)
+      case IsNotNull(Not(e)) => IsNotNull(e)
+
+      // Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules
+      case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       For example of the `EqualTo(Not(a), b)` where `b = Not(c)`, it will become `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
   In addition, `if canSimplifyNot(b)` checks if the optimization can converge that avoids the situation two conditions are returning to each other.
   I plan to add those explanations to the comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-917095182


   **[Test build #143155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143155/testReport)** for PR 33930 at commit [`e4ef486`](https://github.com/apache/spark/commit/e4ef4869b97211597aa59ee6bcf0ec1f494c5eb2).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705904648



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -797,6 +831,14 @@ object NullPropagation extends Rule[LogicalPlan] {
       // a null literal.
       case e: NullIntolerant if e.children.exists(isNullLiteral) =>
         Literal.create(null, e.dataType)
+
+      // Using IsNull(e(inputs)) == IsNull(input1) or IsNull(input2) ... rules
+      // E.g. IsNull(Not(null)) == IsNull(null)
+      // E.g. IsNotNull(a === b) == And(IsNotNull(a), IsNotNull(b))
+      case IsNull(e: NullIntolerant) if !e.isInstanceOf[ExtractValue] && e.children.nonEmpty =>

Review comment:
       `ExtractValue` is used for the query planner to extract the field. E.g. the planner resolves column `a` to `a#123`
   We cannot optimize `IsNull(a#123)` case.
   I will add this explanation to the comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705598219



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -290,6 +290,21 @@ object OptimizeIn extends Rule[LogicalPlan] {
  * 4. Removes `Not` operator.
  */
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. when x == Not(y), Not(x) == Not(Not(y)) == y
+  // For the case of x = EqualTo(a, b), recursively check each child expression
+  // Extra nullable check is required for EqualNullSafe because
+  // Not(EqualNullSafe(x, null)) is different from EqualNullSafe(x, Not(null))
+  private def canSimplifyNot(x: Expression): Boolean = x match {
+    case Literal(_, BooleanType) | Literal(_, NullType) => true
+    case _: Not | _: IsNull | _: IsNotNull | _: And | _: Or => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual => true
+    case EqualTo(a, b) if canSimplifyNot(a) || canSimplifyNot(b) => true

Review comment:
       Actually the code does not work in the way you described.
   It works like this.
   `Not(EqualTo(a, EqualTo(b, EqualTo(c, d))))` where `d` is optimizable, first `canSimplifyNot(a)` or `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` are called depending on the case matches here
   https://github.com/apache/spark/blob/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L472-L473
   If `a` happens to be optimizable, `canSimplifyNot(a)` returns `true` and it propagates `Not` to `a`. That is the second case in the above two case match. That is fine as `canSimplifyNot(a)=true` promises that `Not(a)` will be optimized for sure.
   
   Otherwise, `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` will call `canSimplifyNot(b)` and `canSimplifyNot(EqualTo(c, d))`, then that will call `canSimplifyNot(c)` and `canSimplifyNot(d)`.
   Since `d` is optimizable in this example, `canSimplifyNot(d)` returns `true` that makes `canSimplifyNot(EqualTo(c, d))` return `true`. Eventually `canSimplifyNot(EqualTo(b, EqualTo(c, d)))` returns `true`
   In conclusion, that will match the first one out of the above two cases (not the second one), i.e. https://github.com/apache/spark/blob/f5d3984c04029bdb7f4f00ef4de25a9e9fbc75cb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L472
   After all it will optimize `Not(EqualTo(a, EqualTo(b, EqualTo(c, d))))` => `EqualTo(a, Not(EqualTo(b, EqualTo(c, d))))` => `EqualTo(a, EqualTo(b, Not(EqualTo(c, d))))` => `EqualTo(a, EqualTo(b, EqualTo(c, Not(d))))` => `EqualTo(a, EqualTo(b, EqualTo(c, e)))` where `e` is optimized `Not(d)`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916641589


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47642/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r705931987



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
##########
@@ -404,20 +404,25 @@ object ExtractSingleColumnNullAwareAntiJoin extends JoinSelectionHelper with Pre
    * which is very time consuming because it's an O(M*N) calculation.
    * But if it's a single column case O(M*N) calculation could be optimized into O(M)
    * using hash lookup instead of loop lookup.
+   *
+   * [SPARK-36665] IsNull(EqualTo(a=b)) gets optimized to Or(IsNull(a), IsNull(b))
+   * In addition, it is possible that Or(IsNull(a), IsNull(b)) gets optimized to IsNull(a)
+   * when a is nullable and b is not, vice versa
    */
   def unapply(join: Join): Option[ReturnType] = join match {
     case Join(left, right, LeftAnti,
-      Some(Or(e @ EqualTo(leftAttr: Expression, rightAttr: Expression),
-        IsNull(e2 @ EqualTo(_, _)))), _)
+      Some(Or(e @ EqualTo(leftAttr: Expression, rightAttr: Expression), IsNull(e2))), _)

Review comment:
       I understand why you are worried. I reverted back the first case match in the next commit. I hope that makes you feel a bit more comfortable...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-914777321


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47566/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-916696166


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47644/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918632670


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47727/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928193939


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48165/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707843248



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
##########
@@ -330,4 +330,118 @@ class BooleanSimplificationSuite extends PlanTest with ExpressionEvalHelper with
       checkEvaluation(originalExpr, optimizedVal, inputRow)
     }
   }
+
+  test("SPARK-36665: Add more Not operator simplifications") {
+    // Using IsNull(e(inputs)) == IsNull(input1) or IsNull(input2) ... rules
+    checkCondition(IsNull(Not('e)), IsNull('e))
+    checkCondition(IsNotNull(Not('e)), IsNotNull('e))
+

Review comment:
       Ah, the comment is obsolete. Based on earlier reviews, we are now restricting that rules for UnaryExpression that does not create `Or`. I will update it. Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707997004



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       +1, can we write a recursive method to optimize the entire expression tree with one pas?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       +1, can we write a recursive method to optimize the entire expression tree with one pass?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-919002907


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143240/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-915777712


   **[Test build #143108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143108/testReport)** for PR 33930 at commit [`ffc2ac3`](https://github.com/apache/spark/commit/ffc2ac3859c39afa3259f7832ddb834d08f55377).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-928147674


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48165/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930729857


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48251/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960242507


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960135090






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960296007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-960296007


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r721796482



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       Updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-934083263


   **[Test build #143825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143825/testReport)** for PR 33930 at commit [`b3f6864`](https://github.com/apache/spark/commit/b3f68642191f6e9dcfebc6a2c1076f9cbc076aca).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930668328


   **[Test build #143740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143740/testReport)** for PR 33930 at commit [`4ae314d`](https://github.com/apache/spark/commit/4ae314d0d9b094b6cbb7589bdaa483f25856eaff).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918804936


   **[Test build #143240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143240/testReport)** for PR 33930 at commit [`d177da7`](https://github.com/apache/spark/commit/d177da7cadc9e616f8005a27f5e168d58976ed81).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-930567531






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r719870007



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       I addressed this by using `q.transformExpressionsDownWithPruning()` that optimizes from top to bottom. That is helpful for the case like in the test `'e =!= ('f === ('g === Not('h)))` It will be optimized as
   ```
   'e =!= ('f === ('g === Not('h))) => 'e === ('f =!= ('g === Not('h))) => 'e === ('f === ('g =!= Not('h))) => 'e === ('f === ('g === Not(Not('h)))) => 'e === ('f === ('g === 'h)) 
   ```
   Only the last `Not(Not('h))` will need to re-visit `SimplifyBinaryComparison` one more time




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918636972


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47727/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-918826977


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47744/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33930:
URL: https://github.com/apache/spark/pull/33930#issuecomment-927104688


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48133/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

Posted by GitBox <gi...@apache.org>.
dbtsai commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r704621317



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,24 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Using Not(null) == null rules
+      case IsNull(Not(e)) => IsNull(e)
+      case IsNotNull(Not(e)) => IsNotNull(e)

Review comment:
       +1 to generalize them in `NullPropagation`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org