Posted to reviews@spark.apache.org by "wangyum (via GitHub)" <gi...@apache.org> on 2023/07/24 14:35:13 UTC

[GitHub] [spark] wangyum opened a new pull request, #42129: [SPARK-44527][SQL] Simplify predicate if its children contain ScalarSubquery with empty output

wangyum opened a new pull request, #42129:
URL: https://github.com/apache/spark/pull/42129

   ### What changes were proposed in this pull request?
   
   This PR enhances `SimplifyBinaryComparison` to simplify a predicate if its children contain a `ScalarSubquery` with empty output.
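
As a rough illustration of the semantics this rewrite relies on (a sketch only, not code from the PR): a scalar subquery that returns no rows evaluates to NULL, so a null-intolerant comparison against it is NULL, while `<=>` between two such subqueries is true. All names below are hypothetical, and `Option` stands in for SQL NULL.

```scala
object EmptySubquerySketch {
  // A SQL value is either present or NULL; Option models that here.
  type SqlInt = Option[Int]
  type SqlBool = Option[Boolean]

  // Hypothetical stand-in for evaluating a scalar subquery: zero rows yields NULL.
  // (The multi-row error case of real scalar subqueries is ignored in this sketch.)
  def scalarSubquery(rows: Seq[Int]): SqlInt = rows.headOption

  // Null-intolerant equality: the result is NULL if either side is NULL.
  def equalTo(a: SqlInt, b: SqlInt): SqlBool =
    for (x <- a; y <- b) yield x == y

  // Null-safe equality (<=>): never NULL, and two NULLs compare as true.
  def equalNullSafe(a: SqlInt, b: SqlInt): SqlBool = Some(a == b)
}
```

Under these assumptions, `equalTo(scalarSubquery(Seq.empty), Some(1))` is `None` (NULL) and `equalNullSafe` of two empty subqueries is `Some(true)`, which is the outcome the optimizer can substitute without running the subquery.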
   
   ### Why are the changes needed?
   
   Simplifying such expressions at optimization time improves query performance.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #42129: [SPARK-44527][SQL] Simplify predicate if its children contain ScalarSubquery with empty output

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #42129:
URL: https://github.com/apache/spark/pull/42129#discussion_r1273383459


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala:
##########
@@ -48,10 +48,7 @@ abstract class PropagateEmptyRelationBase extends Rule[LogicalPlan] with CastSup
   // This tag is used to mark a repartition as a root repartition which is user-specified
   private[sql] val ROOT_REPARTITION = TreeNodeTag[Unit]("ROOT_REPARTITION")
 
-  protected def isEmpty(plan: LogicalPlan): Boolean = plan match {
-    case p: LocalRelation => p.data.isEmpty
-    case _ => false
-  }
+  protected def isEmpty(plan: LogicalPlan): Boolean = SimplifyBinaryComparison.isEmpty(plan)

Review Comment:
   Referencing `SimplifyBinaryComparison` here looks a little strange.





Re: [PR] [SPARK-44527][SQL] Replace ScalarSubquery with null if its maxRows is 0 [spark]

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #42129:
URL: https://github.com/apache/spark/pull/42129#discussion_r1359102724


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala:
##########
@@ -88,6 +88,10 @@ object ConstantFolding extends Rule[LogicalPlan] {
           e
       }
 
+    // Replace ScalarSubquery with null if its maxRows is 0
+    case s: ScalarSubquery if s.plan.maxRows.contains(0) =>

Review Comment:
   @jchen5 Do we need to consider `mayHaveCountBug`?





[GitHub] [spark] wangyum commented on a diff in pull request #42129: [SPARK-44527][SQL] Replace ScalarSubquery with null if its maxRows is 0

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #42129:
URL: https://github.com/apache/spark/pull/42129#discussion_r1275885210


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala:
##########
@@ -529,6 +530,15 @@ object SimplifyBinaryComparison
         case TrueLiteral EqualNullSafe b if !b.nullable => b
         case a EqualNullSafe FalseLiteral if !a.nullable => Not(a)
         case FalseLiteral EqualNullSafe b if !b.nullable => Not(b)
+
+        case EqualNullSafe(a: ScalarSubquery, b: ScalarSubquery)
+            if a.plan.maxRows.contains(0) && b.plan.maxRows.contains(0) =>
+          TrueLiteral
+        case e: EqualNullSafe => e
+        case BinaryComparison(a: ScalarSubquery, _) if a.plan.maxRows.contains(0) =>
+          Literal(null, BooleanType)

Review Comment:
   This should be moved to `ConstantFolding`.





Re: [PR] [SPARK-44527][SQL] Replace ScalarSubquery with null if its maxRows is 0 [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #42129: [SPARK-44527][SQL] Replace ScalarSubquery with null if its maxRows is 0
URL: https://github.com/apache/spark/pull/42129




[GitHub] [spark] beliefer commented on a diff in pull request #42129: [SPARK-44527][SQL] Replace ScalarSubquery with null if its maxRows is 0

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #42129:
URL: https://github.com/apache/spark/pull/42129#discussion_r1276155768


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala:
##########
@@ -88,6 +88,10 @@ object ConstantFolding extends Rule[LogicalPlan] {
           e
       }
 
+    // Replace ScalarSubquery with null if its maxRows is 0
+    case s: ScalarSubquery if s.plan.maxRows.contains(0) =>
+      Literal(null, s.dataType)

Review Comment:
   +1. Moving the case here looks clearer. @wangyum Thank you.
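
For readers unfamiliar with the shape of such a rule, here is a minimal, self-contained sketch of the idea in the hunk above: an expression-tree rewrite that replaces a subquery node with a NULL literal when its row bound is known to be zero. The `Expr`, `Lit`, `Subquery`, `Eq`, and `fold` names are hypothetical toy types, not Catalyst classes, and the sketch ignores the `mayHaveCountBug` question raised elsewhere in this thread.

```scala
object ConstantFoldingSketch {
  sealed trait Expr
  case class Lit(value: Option[Int]) extends Expr          // None models a NULL literal
  case class Subquery(maxRows: Option[Long]) extends Expr  // upper bound on rows, if known
  case class Eq(left: Expr, right: Expr) extends Expr

  // Recursively replace every subquery that cannot return a row with NULL.
  def fold(e: Expr): Expr = e match {
    case Subquery(maxRows) if maxRows.contains(0L) => Lit(None)
    case Eq(l, r) => Eq(fold(l), fold(r))
    case other => other
  }
}
```

A subquery whose bound is unknown (`maxRows == None`) or non-zero is left untouched, so the rewrite only fires when emptiness is certain.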





[GitHub] [spark] beliefer commented on a diff in pull request #42129: [SPARK-44527][SQL] Simplify predicate if its children contain ScalarSubquery with empty output

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #42129:
URL: https://github.com/apache/spark/pull/42129#discussion_r1274375153


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala:
##########
@@ -529,6 +530,15 @@ object SimplifyBinaryComparison
         case TrueLiteral EqualNullSafe b if !b.nullable => b
         case a EqualNullSafe FalseLiteral if !a.nullable => Not(a)
         case FalseLiteral EqualNullSafe b if !b.nullable => Not(b)
+
+        case EqualNullSafe(a: ScalarSubquery, b: ScalarSubquery)
+            if a.plan.maxRows.contains(0) && b.plan.maxRows.contains(0) =>
+          TrueLiteral
+        case e: EqualNullSafe => e
+        case BinaryComparison(a: ScalarSubquery, _) if a.plan.maxRows.contains(0) =>
+          Literal(null, BooleanType)

Review Comment:
   All the current subclasses of `BinaryComparison` are also `NullIntolerant`.
   Still, to avoid introducing an unexpected bug, shall we check `isInstanceOf[NullIntolerant]`?
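
The suggested guard could look roughly like the sketch below, which only folds a comparison to NULL when the node is marked null-intolerant. The types here are hypothetical toy stand-ins (only the `NullIntolerant` name and the left-side-only match are taken from the thread), so this shows the shape of the check rather than the actual change.

```scala
object NullIntolerantGuardSketch {
  // Marker trait, playing the role NullIntolerant plays in the comment above.
  trait NullIntolerant

  sealed trait Expr
  case object NullLit extends Expr        // stands in for Literal(null, BooleanType)
  case object EmptySubquery extends Expr  // a scalar subquery known to return no rows
  case class Col(name: String) extends Expr

  // Hypothetical base for two-child comparisons.
  sealed trait Comparison extends Expr { def left: Expr; def right: Expr }
  case class LessThan(left: Expr, right: Expr) extends Comparison with NullIntolerant
  case class NullSafeEq(left: Expr, right: Expr) extends Comparison

  // Fold to NULL only when the comparison is explicitly null-intolerant
  // and its left child is an empty subquery (mirroring the hunk above).
  def simplify(e: Expr): Expr = e match {
    case c: Comparison if c.isInstanceOf[NullIntolerant] && c.left == EmptySubquery => NullLit
    case other => other
  }
}
```

With this guard, a comparison type added later without the marker would be left alone instead of being folded to NULL by accident.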




