You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "wangyum (via GitHub)" <gi...@apache.org> on 2023/04/11 14:04:06 UTC

[GitHub] [spark] wangyum opened a new pull request, #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

wangyum opened a new pull request, #40742:
URL: https://github.com/apache/spark/pull/40742

   ### What changes were proposed in this pull request?
   
   This PR filters out `EqualNullSafe` generated by `UnaryNode.getAllValidConstraints` in `InferFiltersFromConstraints`.
   
   ### Why are the changes needed?
   Avoid Once strategy's idempotence is broken for batch: `Infer Filters`:
   ```sql
   export SPARK_TESTING=1
   
   CREATE TABLE t1 (i INT, j INT, k STRING) USING parquet;
   CREATE TABLE t2 (i INT, j INT, k STRING) USING parquet;
   CREATE TABLE t3 (i INT, j INT, k STRING) USING parquet;
   
   SELECT *
   FROM   (SELECT t1.i, t1.i as t1i
           FROM t1 JOIN t3 ON t1.i = t3.i) t
          JOIN t2 ON t.i = t2.i;
   ```
   
   Before this PR:
   ```
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
    Join Inner, (i#72 = i#78)                                               Join Inner, (i#72 = i#78)
   !:- Project [i#72, i#72 AS t1i#71]                                       :- Filter ((i#72 <=> i#72) AND (t1i#71 <=> t1i#71))
   !:  +- Join Inner, (i#72 = i#75)                                         :  +- Project [i#72, i#72 AS t1i#71]
   !:     :- Project [i#72]                                                 :     +- Join Inner, (i#72 = i#75)
   !:     :  +- Relation spark_catalog.default.t1[i#72,j#73,k#74] parquet   :        :- Filter isnotnull(i#72)
   !:     +- Project [i#75]                                                 :        :  +- Project [i#72]
   !:        +- Relation spark_catalog.default.t3[i#75,j#76,k#77] parquet   :        :     +- Relation spark_catalog.default.t1[i#72,j#73,k#74] parquet
   !+- Relation spark_catalog.default.t2[i#78,j#79,k#80] parquet            :        +- Filter isnotnull(i#75)
   !                                                                        :           +- Project [i#75]
   !                                                                        :              +- Relation spark_catalog.default.t3[i#75,j#76,k#77] parquet
   !                                                                        +- Filter isnotnull(i#78)
   !                                                                           +- Relation spark_catalog.default.t2[i#78,j#79,k#80] parquet
   
   
   org.apache.spark.SparkRuntimeException: Once strategy's idempotence is broken for batch Infer Filters
    Join Inner, (i#72 = i#78)                                                     Join Inner, (i#72 = i#78)
    :- Filter ((i#72 <=> i#72) AND (t1i#71 <=> t1i#71))                           :- Filter ((i#72 <=> i#72) AND (t1i#71 <=> t1i#71))
    :  +- Project [i#72, i#72 AS t1i#71]                                          :  +- Project [i#72, i#72 AS t1i#71]
    :     +- Join Inner, (i#72 = i#75)                                            :     +- Join Inner, (i#72 = i#75)
    :        :- Filter isnotnull(i#72)                                            :        :- Filter isnotnull(i#72)
    :        :  +- Project [i#72]                                                 :        :  +- Project [i#72]
    :        :     +- Relation spark_catalog.default.t1[i#72,j#73,k#74] parquet   :        :     +- Relation spark_catalog.default.t1[i#72,j#73,k#74] parquet
    :        +- Filter isnotnull(i#75)                                            :        +- Filter isnotnull(i#75)
    :           +- Project [i#75]                                                 :           +- Project [i#75]
    :              +- Relation spark_catalog.default.t3[i#75,j#76,k#77] parquet   :              +- Relation spark_catalog.default.t3[i#75,j#76,k#77] parquet
   !+- Filter isnotnull(i#78)                                                     +- Filter (i#78 <=> i#78)
   !   +- Relation spark_catalog.default.t2[i#78,j#79,k#80] parquet                  +- Filter isnotnull(i#78)
   !                                                                                    +- Relation spark_catalog.default.t2[i#78,j#79,k#80] parquet.
   ```
   
   After this PR:
   ```
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
    Join Inner, (i#72 = i#78)                                               Join Inner, (i#72 = i#78)
    :- Project [i#72, i#72 AS t1i#71]                                       :- Project [i#72, i#72 AS t1i#71]
    :  +- Join Inner, (i#72 = i#75)                                         :  +- Join Inner, (i#72 = i#75)
   !:     :- Project [i#72]                                                 :     :- Filter isnotnull(i#72)
   !:     :  +- Relation spark_catalog.default.t1[i#72,j#73,k#74] parquet   :     :  +- Project [i#72]
   !:     +- Project [i#75]                                                 :     :     +- Relation spark_catalog.default.t1[i#72,j#73,k#74] parquet
   !:        +- Relation spark_catalog.default.t3[i#75,j#76,k#77] parquet   :     +- Filter isnotnull(i#75)
   !+- Relation spark_catalog.default.t2[i#78,j#79,k#80] parquet            :        +- Project [i#75]
   !                                                                        :           +- Relation spark_catalog.default.t3[i#75,j#76,k#77] parquet
   !                                                                        +- Filter isnotnull(i#78)
   !                                                                           +- Relation spark_catalog.default.t2[i#78,j#79,k#80] parquet
    
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum closed pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum closed pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`
URL: https://github.com/apache/spark/pull/40742


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165099660


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -1430,6 +1430,10 @@ object InferFiltersFromConstraints extends Rule[LogicalPlan]
       .union(constructIsNotNullConstraints(constraints, plan.output))
       .filter { c =>
         c.references.nonEmpty && c.references.subsetOf(plan.outputSet) && c.deterministic
+      }.filterNot {
+        // Avoid once strategy idempotence is broken.
+        case a EqualNullSafe b => a.semanticEquals(b)

Review Comment:
   Yes. Moved it to `ConstraintHelper.inferAdditionalConstraints`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165107619


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -75,7 +75,10 @@ trait ConstraintHelper {
         inferredConstraints ++= replaceConstraints(predicates - eq, l, r)
       case _ => // No inference
     }
-    inferredConstraints -- constraints
+    (inferredConstraints -- constraints).filterNot {

Review Comment:
   shall we update the pattern match before to add `if !l.semanticEquals(r)`?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -75,7 +75,10 @@ trait ConstraintHelper {
         inferredConstraints ++= replaceConstraints(predicates - eq, l, r)
       case _ => // No inference
     }
-    inferredConstraints -- constraints
+    (inferredConstraints -- constraints).filterNot {

Review Comment:
   shall we update the pattern match above to add `if !l.semanticEquals(r)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1166320395


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,15 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = predicates - eq
+        // Also remove EqualNullSafe with the same l and r to avoid Once strategy's idempotence
+        // is broken.

Review Comment:
   let's mention `l === r and l <=> r` can infer `l <=> l and r <=> r` which is useless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1166314061


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>

Review Comment:
   Yes. It seems ok.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1163509684


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -1430,6 +1430,10 @@ object InferFiltersFromConstraints extends Rule[LogicalPlan]
       .union(constructIsNotNullConstraints(constraints, plan.output))
       .filter { c =>
         c.references.nonEmpty && c.references.subsetOf(plan.outputSet) && c.deterministic
+      }.filterNot {
+        // Avoid once strategy idempotence is broken.
+        case a EqualNullSafe b => a.semanticEquals(b)

Review Comment:
   can we fix the place that generates this useless predicate?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165246208


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = predicates - eq
+        val candidateConstraints = predicates - eq - EqualNullSafe(l, r)

Review Comment:
   Yes. But the current situation is like this: `l <=> l and r <=> r` is inferred from `l === r and l <=> r`. At this time, `l.semanticEquals(r)` is false.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165246208


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = predicates - eq
+        val candidateConstraints = predicates - eq - EqualNullSafe(l, r)

Review Comment:
   Yes. But the current situation is like this: `a <=> a and b <=> b` is inferred from `a === b and a <=> b`. At this time, l.semanticEquals(r) is false.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on PR #40742:
URL: https://github.com/apache/spark/pull/40742#issuecomment-1504278565

   cc @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1166312550


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = predicates - eq
+        val candidateConstraints = predicates - eq - EqualNullSafe(l, r)

Review Comment:
   For example:
   ```
   Join Inner, (a#0 = a#7)
   :- Project [a#0, a#0 AS xa#3]
   :  +- Join Inner, (a#0 = a#4)
   :     :- LocalRelation <empty>, [a#0, b#1, c#2]
   :     +- LocalRelation <empty>, [a#4, b#5, c#6]
   +- LocalRelation <empty>, [a#7, b#8, c#9]
   ```
   After `InferFiltersFromConstraints`:
   ```
   Join Inner, (a#0 = a#7)
   :- Filter ((a#0 <=> a#0) AND (xa#3 <=> xa#3))
   :  +- Project [a#0, a#0 AS xa#3]
   :     +- Join Inner, (a#0 = a#4)
   :        :- Filter isnotnull(a#0)
   :        :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   :        +- Filter isnotnull(a#4)
   :           +- LocalRelation <empty>, [a#4, b#5, c#6]
   +- Filter isnotnull(a#7)
      +- LocalRelation <empty>, [a#7, b#8, c#9]
   ```
   
   And run `InferFiltersFromConstraints` again. The left side constraints:
   ```
   (a#0 <=> xa#3)
   (xa#3 = a#0)
   (a#0 <=> a#0)
   (xa#3 <=> xa#3)
   ```
   The right side constraints:
   ```
   (isnotnull(a#7))
   ```
   Join condition is:
   ```
   (a#0 = a#7)
   ```
   
   Based on these constraints, the inferred result is:
   ```
   (a#7 <=> xa#3)
   (xa#3 = a#7)
   (a#7 <=> a#7)
   ```
   
   `(a#7 <=> a#7)` is a valid constraints for right side. The result:
   
   ```
   Join Inner, (a#0 = a#7)
   :- Filter ((a#0 <=> a#0) AND (xa#3 <=> xa#3))
   :  +- Project [a#0, a#0 AS xa#3]
   :     +- Join Inner, (a#0 = a#4)
   :        :- Filter isnotnull(a#0)
   :        :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   :        +- Filter isnotnull(a#4)
   :           +- LocalRelation <empty>, [a#4, b#5, c#6]
   +- Filter (a#7 <=> a#7)
      +- Filter isnotnull(a#7)
         +- LocalRelation <empty>, [a#7, b#8, c#9]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165106532


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -75,7 +75,10 @@ trait ConstraintHelper {
         inferredConstraints ++= replaceConstraints(predicates - eq, l, r)
       case _ => // No inference
     }
-    inferredConstraints -- constraints
+    (inferredConstraints -- constraints).filterNot {
+      case a EqualNullSafe b => a.semanticEquals(b)

Review Comment:
   Before this PR. the generated `EqualNullSafe` will be optimized by [`SimplifyBinaryComparison`](https://github.com/apache/spark/blob/2de0d45887509fac8d5fc9448764a0e71f618797/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L508).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165437332


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>

Review Comment:
   not related to this PR, but shall we match `Equality` here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165219720


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = predicates - eq
+        val candidateConstraints = predicates - eq - EqualNullSafe(l, r)

Review Comment:
   if `l.semanticEquals(r)`, we don't need to infer predicates at all, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on PR #40742:
URL: https://github.com/apache/spark/pull/40742#issuecomment-1509447313

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165439256


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = predicates - eq
+        val candidateConstraints = predicates - eq - EqualNullSafe(l, r)

Review Comment:
   can you explain it a bit more about why it makes the optimizer not idempot?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = predicates - eq
+        val candidateConstraints = predicates - eq - EqualNullSafe(l, r)

Review Comment:
   can you explain it a bit more about why it makes the optimizer not idempotent?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #40742: [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40742:
URL: https://github.com/apache/spark/pull/40742#discussion_r1165437812


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala:
##########
@@ -66,13 +66,13 @@ trait ConstraintHelper {
     val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
     predicates.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = predicates - eq
+        val candidateConstraints = predicates - eq - EqualNullSafe(l, r)

Review Comment:
   ah got it. Let's add a code comment to explain it with this example.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org