Posted to reviews@spark.apache.org by "jchen5 (via GitHub)" <gi...@apache.org> on 2023/09/28 19:05:47 UTC

[GitHub] [spark] jchen5 commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1340545294


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -1397,6 +1400,10 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
           failOnInvalidOuterReference(l)
           checkPlan(input, aggregated, canContainOuter)
 
+        case o @ Offset(_, input) =>

Review Comment:
   How does this change relate to the rest of this PR, or is it a separate change to enable Offset?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("3.4.0")

Review Comment:
   I think we're on 4.0.0 now.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -1134,8 +1134,11 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
       isLateral: Boolean = false): Unit = {
     // Some query shapes are only supported with the DecorrelateInnerQuery framework.
     // Currently we only use this new framework for scalar and lateral subqueries.

Review Comment:
   Delete this line: after this change the framework is no longer used only for scalar and lateral subqueries, so the comment is outdated.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -710,6 +711,12 @@ object DecorrelateInnerQuery extends PredicateHelper {
           case a @ Aggregate(groupingExpressions, aggregateExpressions, child) =>
             val outerReferences = collectOuterReferences(a.expressions)
             val newOuterReferences = parentOuterReferences ++ outerReferences
+            // Find all the aggregate expressions that are subject to the "COUNT bug",
+            // i.e. those that have non-None default result.
+            val countBugSusceptibleAggs = aggregateExpressions.flatMap(_.collect {
+              case a@AggregateExpression(function, _, _, _, _)
+                if function.defaultResult.nonEmpty => a

Review Comment:
   Just checking the function's default result doesn't work for some more complicated cases, such as when there are nested subqueries:
   ```
   select (
     select sum(cnt)
     from (select count(*) cnt from t2 where t1.c1 = t2.c1)
   ) from t1
   ```
   or:
   ```
   select (
     select sum(a) from (
       select a from t2 where t1.c1 = t2.c1 UNION ALL select 1 as a
     )
   ) from t1
   ```
   The subquery is subject to the count bug even though the sum expression at the top defaults to NULL.
   
   We have logic for this in `evalSubqueryOnZeroTups` and `evalAggExprOnZeroTups`, which are used below.
   
   But I'm not sure how that fits into the context here - what case motivated this change?
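   
   For illustration (hypothetical data, not from the PR): with one row in t1 and no matching row in t2, the first query above is subject to the count bug because the inner count(*) evaluates to 0 over the empty group, so the correct scalar subquery result is 0, while a rewrite that only consults sum's default result would produce NULL.
   ```
   -- Hypothetical setup, for illustration only.
   create temporary view t1 as select 1 as c1;
   create temporary view t2 as select 2 as c1;  -- no row satisfies t1.c1 = t2.c1
   
   select (
     select sum(cnt)
     from (select count(*) cnt from t2 where t1.c1 = t2.c1)
   ) from t1;
   -- Correct answer: 0 (count(*) over zero matching rows is 0, and sum over {0} is 0).
   -- A rewrite that only checks the top aggregate's default result would return NULL.
   ```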



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

