Posted to reviews@spark.apache.org by "agubichev (via GitHub)" <gi...@apache.org> on 2023/09/25 23:26:04 UTC

[GitHub] [spark] agubichev opened a new pull request, #43111: Support correlated exists subqueries using DecorrelateInnerQuery framework

agubichev opened a new pull request, #43111:
URL: https://github.com/apache/spark/pull/43111

   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1345923815


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala:
##########
@@ -283,6 +305,15 @@ object PullupCorrelatedPredicates extends Rule[LogicalPlan] with PredicateHelper
         } else {
           a
         }
+
+      case l @ Limit(_, _) if predicateMap.nonEmpty =>

Review Comment:
   No, we don't.
   In fact, CheckAnalysis now allows LIMIT in correlated subqueries, because we support it in lateral, scalar, EXISTS and IN subqueries (EXISTS and IN support is added in this PR).
   
   This check only makes sure that the legacy path (i.e. PullupCorrelatedPredicates) does not allow LIMIT.





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1342795235


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)

Review Comment:
   Some queries that were previously unsupported are now supported with this flag; that is the only change in results.





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1342801676


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala:
##########
@@ -63,6 +64,18 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
     Join(outerPlan, dedupSubplan, joinType, condition, JoinHint(None, subHint))
   }
 
+  private def removeDomainJoins(

Review Comment:
   `rewriteDomainJoinsIfPresent`





[GitHub] [spark] agubichev commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1341562525


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -1397,6 +1400,10 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
           failOnInvalidOuterReference(l)
           checkPlan(input, aggregated, canContainOuter)
 
+        case o @ Offset(_, input) =>

Review Comment:
   Some tests with EXISTS used OFFSET, so I decided to include OFFSET handling.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -1134,8 +1134,11 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
       isLateral: Boolean = false): Unit = {
     // Some query shapes are only supported with the DecorrelateInnerQuery framework.
     // Currently we only use this new framework for scalar and lateral subqueries.

Review Comment:
   done





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1343058747


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")

Review Comment:
   Looks like for IN/EXISTS to be decorrelated with DecorrelateInnerQuery you need both flags enabled, which makes sense.





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1345395781


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala:
##########
@@ -283,6 +305,15 @@ object PullupCorrelatedPredicates extends Rule[LogicalPlan] with PredicateHelper
         } else {
           a
         }
+
+      case l @ Limit(_, _) if predicateMap.nonEmpty =>

Review Comment:
   Don't we fail earlier in `CheckAnalysis`?





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1347540658


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -461,6 +462,22 @@ object DecorrelateInnerQuery extends PredicateHelper {
       p.mapChildren(rewriteDomainJoins(outerPlan, _, conditions))
   }
 
+  private def isCountBugFree(aggregateExpressions: Seq[NamedExpression]): Boolean = {

Review Comment:
   Since this is only used in the new code path, it's fine to improve it when we consolidate the code later.





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1342986860


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)

Review Comment:
   I think there might be some correctness fixes here for COUNT bug in IN/EXISTS too, enabled by moving those to DecorrelateInnerQuery, right? I know not every query is fixed but there are some that are fixed, such as the example query you have added to one of your tests `select * from t1 where exists (select count(*) from t2 where t2.c1 = t1.c1);`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)

Review Comment:
   I think there are some correctness fixes here for COUNT bug in IN/EXISTS too, enabled by moving those to DecorrelateInnerQuery. I know not every query is fixed but there are some that are fixed, such as the example query you have added to one of your tests `select * from t1 where exists (select count(*) from t2 where t2.c1 = t1.c1);`





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1342985201


##########
sql/core/src/test/resources/sql-tests/inputs/subquery/exists-subquery/exists-count-bug.sql:
##########
@@ -0,0 +1,21 @@
+create temporary view t1(c1, c2) as values (0, 1), (1, 2);
+create temporary view t2(c1, c2) as values (0, 2), (0, 3);
+create temporary view t3(c1, c2) as values (0, 3), (1, 4), (2, 5);
+
+select * from t1 where exists (select count(*) from t2 where t2.c1 = t1.c1);

Review Comment:
   These queries were runnable before - I just checked on master. They returned wrong results due to the COUNT bug.
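
A minimal sketch of why the query above is affected, using plain Scala collections rather than Spark. The object name `ExistsCountBug` and the "naive decorrelation" strategy are illustrative assumptions; this models the general shape of the COUNT bug, not the exact plan that the legacy rewrite produces.

```scala
object ExistsCountBug {
  // Rows of the temporary views from exists-count-bug.sql, as (c1, c2) pairs.
  val t1: Seq[(Int, Int)] = Seq((0, 1), (1, 2))
  val t2: Seq[(Int, Int)] = Seq((0, 2), (0, 3))

  // Reference semantics: evaluate the subquery once per outer row.
  // An aggregate without GROUP BY always yields exactly one row (here a count,
  // possibly 0), so the subquery is never empty and EXISTS is always true.
  def correlated: Seq[(Int, Int)] = t1.filter { outer =>
    val subqueryResult: Seq[Long] = Seq(t2.count(_._1 == outer._1).toLong)
    subqueryResult.nonEmpty
  }

  // Naive decorrelation (assumed for illustration): pull t2.c1 = t1.c1 above the
  // aggregate, group by t2.c1, then semi-join on the key. Keys that are absent
  // from t2 produce no group at all, so their outer rows are silently dropped.
  def naiveDecorrelated: Seq[(Int, Int)] = {
    val countsByKey: Map[Int, Long] =
      t2.groupBy(_._1).map { case (key, rows) => key -> rows.size.toLong }
    t1.filter(outer => countsByKey.contains(outer._1))
  }

  def main(args: Array[String]): Unit = {
    println(s"correlated:         $correlated")
    println(s"naive decorrelated: $naiveDecorrelated")
  }
}
```

In this model the per-row evaluation keeps both rows of `t1`, while the naive rewrite keeps only the row with `c1 = 0`, because `c1 = 1` has no matching group in `t2`.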





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1345926575


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -461,6 +462,23 @@ object DecorrelateInnerQuery extends PredicateHelper {
       p.mapChildren(rewriteDomainJoins(outerPlan, _, conditions))
   }
 
+  private def isCountBugFree(aggregateExpressions: Seq[NamedExpression]): Boolean = {
+    // The COUNT bug only appears if an aggregate expression returns a non-NULL result on an empty
+    // input.
+    // Typical example (hence the name) is COUNT(*) that returns 0 from an empty result.
+    // However, SUM(x) IS NULL is another case that returns 0, and in general any IS/NOT IS and CASE
+    // expressions are suspect (and the combination of those).
+    // For now we conservatively accept only those expressions that are guaranteed to be safe.
+    val exprsRejectEmptyInput = aggregateExpressions.map {

Review Comment:
   For EXISTS and IN we did not detect the COUNT bug before, hence the incorrect results.
   For scalar subqueries, the COUNT bug is detected in a rather convoluted way, as a post-processing step on the scalar subquery. I will refactor that code to use this function in the future, as it seems simpler and more straightforward.





[GitHub] [spark] agubichev commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1341561643


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -710,6 +711,12 @@ object DecorrelateInnerQuery extends PredicateHelper {
           case a @ Aggregate(groupingExpressions, aggregateExpressions, child) =>
             val outerReferences = collectOuterReferences(a.expressions)
             val newOuterReferences = parentOuterReferences ++ outerReferences
+            // Find all the aggregate expressions that are subject to the "COUNT bug",
+            // i.e. those that have non-None default result.
+            val countBugSusceptibleAggs = aggregateExpressions.flatMap(_.collect {
+              case a@AggregateExpression(function, _, _, _, _)
+                if function.defaultResult.nonEmpty => a

Review Comment:
   Discussed offline.
   These are scalar subqueries, so they are outside the scope of this PR. Filed https://issues.apache.org/jira/browse/SPARK-45381 to track the OSS vs. DBR difference in one of your examples.





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1343058747


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")

Review Comment:
   Looks like for IN/EXISTS to be decorrelated with DecorrelateInnerQuery you need both flags enabled. Either way makes sense to me.





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #43111: [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework
URL: https://github.com/apache/spark/pull/43111




Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1344491645


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)

Review Comment:
   Changed the flag to reflect that there is some legacy behavior, and added tests for that behavior.



##########
sql/core/src/test/resources/sql-tests/inputs/subquery/exists-subquery/exists-count-bug.sql:
##########
@@ -0,0 +1,21 @@
+create temporary view t1(c1, c2) as values (0, 1), (1, 2);
+create temporary view t2(c1, c2) as values (0, 2), (0, 3);
+create temporary view t3(c1, c2) as values (0, 3), (1, 4), (2, 5);
+
+select * from t1 where exists (select count(*) from t2 where t2.c1 = t1.c1);

Review Comment:
   Added tests for the wrong results





[GitHub] [spark] allisonwang-db commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1342014577


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala:
##########
@@ -63,6 +64,18 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
     Join(outerPlan, dedupSubplan, joinType, condition, JoinHint(None, subHint))
   }
 
+  private def removeDomainJoins(

Review Comment:
   nit: `maybeRewriteDomainJoins`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")

Review Comment:
   Does this flag depend on this `DECORRELATE_INNER_QUERY_ENABLED` flag?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)

Review Comment:
   Will this have any query result changes if we enable this by default?



##########
sql/core/src/test/resources/sql-tests/inputs/subquery/exists-subquery/exists-count-bug.sql:
##########
@@ -0,0 +1,21 @@
+create temporary view t1(c1, c2) as values (0, 1), (1, 2);
+create temporary view t2(c1, c2) as values (0, 2), (0, 3);
+create temporary view t3(c1, c2) as values (0, 3), (1, 4), (2, 5);
+
+select * from t1 where exists (select count(*) from t2 where t2.c1 = t1.c1);

Review Comment:
   Are the results before and after this PR the same for these queries?





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1342801406


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")

Review Comment:
   No, those are independent. The DECORRELATE_INNER_QUERY_ENABLED flag is for scalar and lateral subqueries, and this one is for IN and EXISTS.





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1344491251


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")

Review Comment:
   ack!





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1345392903


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -461,6 +462,23 @@ object DecorrelateInnerQuery extends PredicateHelper {
       p.mapChildren(rewriteDomainJoins(outerPlan, _, conditions))
   }
 
+  private def isCountBugFree(aggregateExpressions: Seq[NamedExpression]): Boolean = {
+    // The COUNT bug only appears if an aggregate expression returns a non-NULL result on an empty
+    // input.
+    // Typical example (hence the name) is COUNT(*) that returns 0 from an empty result.
+    // However, SUM(x) IS NULL is another case that returns 0, and in general any IS/NOT IS and CASE
+    // expressions are suspect (and the combination of those).
+    // For now we conservatively accept only those expressions that are guaranteed to be safe.
+    val exprsRejectEmptyInput = aggregateExpressions.map {
+      case _ : AttributeReference => true
+      case Alias(_: AttributeReference, _) => true
+      case Alias(_: Literal, _) => true
+      case Alias(a: AggregateExpression, _) if a.aggregateFunction.defaultResult == None => true
+      case _ => false
+    }
+    exprsRejectEmptyInput.forall(x => x == true)

Review Comment:
   nit:
   ```
   aggregateExpressions.forall ...
   ```
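
A minimal sketch of what this nit amounts to, using small hypothetical stand-in classes (`Attr`, `Lit`, `Agg`, `Alias`) instead of the real Catalyst expression types so that it runs on its own. Both functions should give the same answer; the second simply skips the intermediate `Seq[Boolean]`.

```scala
object ForallNitSketch {
  // Hypothetical stand-ins for Catalyst expressions (not Spark's real classes).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Any) extends Expr
  // defaultResult models the value an aggregate yields on empty input, if any.
  final case class Agg(name: String, defaultResult: Option[Any]) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  // Shape used in the diff: build a Seq[Boolean], then check every element.
  def viaMap(exprs: Seq[Expr]): Boolean = {
    val flags = exprs.map {
      case _: Attr => true
      case Alias(_: Attr, _) => true
      case Alias(_: Lit, _) => true
      case Alias(a: Agg, _) if a.defaultResult.isEmpty => true
      case _ => false
    }
    flags.forall(x => x == true)
  }

  // Shape suggested by the nit: hand the pattern-matching function to forall.
  def viaForall(exprs: Seq[Expr]): Boolean = exprs.forall {
    case _: Attr => true
    case Alias(_: Attr, _) => true
    case Alias(_: Lit, _) => true
    case Alias(a: Agg, _) if a.defaultResult.isEmpty => true
    case _ => false
  }
}
```

`forall` also stops at the first expression that fails the check, whereas the `map` version always visits every element.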





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1345392237


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -461,6 +462,23 @@ object DecorrelateInnerQuery extends PredicateHelper {
       p.mapChildren(rewriteDomainJoins(outerPlan, _, conditions))
   }
 
+  private def isCountBugFree(aggregateExpressions: Seq[NamedExpression]): Boolean = {
+    // The COUNT bug only appears if an aggregate expression returns a non-NULL result on an empty
+    // input.
+    // Typical example (hence the name) is COUNT(*) that returns 0 from an empty result.
+    // However, SUM(x) IS NULL is another case that returns 0, and in general any IS/NOT IS and CASE
+    // expressions are suspect (and the combination of those).
+    // For now we conservatively accept only those expressions that are guaranteed to be safe.
+    val exprsRejectEmptyInput = aggregateExpressions.map {

Review Comment:
   Is this new code? How did we detect the count bug before?





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "andylam-db (via GitHub)" <gi...@apache.org>.
andylam-db commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1358985912


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -5272,6 +5281,9 @@ class SQLConf extends Serializable with Logging with SqlApiConf {
 
   def decorrelateInnerQueryEnabled: Boolean = getConf(SQLConf.DECORRELATE_INNER_QUERY_ENABLED)
 
+  def decorrelateInnerQueryEnabledForExistsIn: Boolean =
+    !getConf(SQLConf.DECORRELATE_EXISTS_IN_SUBQUERY_LEGACY_INCORRECT_COUNT_HANDLING_ENABLED)

Review Comment:
   Should we check whether `decorrelateInnerQueryEnabled` is true here, too?





[GitHub] [spark] agubichev commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1341562101


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("3.4.0")

Review Comment:
   done





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1343058747


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")

Review Comment:
   Looks like for IN/EXISTS to be decorrelated you need both flags enabled, which makes sense.





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1345929714


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -461,6 +462,23 @@ object DecorrelateInnerQuery extends PredicateHelper {
       p.mapChildren(rewriteDomainJoins(outerPlan, _, conditions))
   }
 
+  private def isCountBugFree(aggregateExpressions: Seq[NamedExpression]): Boolean = {
+    // The COUNT bug only appears if an aggregate expression returns a non-NULL result on an empty
+    // input.
+    // Typical example (hence the name) is COUNT(*) that returns 0 from an empty result.
+    // However, SUM(x) IS NULL is another case that returns 0, and in general any IS/NOT IS and CASE
+    // expressions are suspect (and the combination of those).
+    // For now we conservatively accept only those expressions that are guaranteed to be safe.
+    val exprsRejectEmptyInput = aggregateExpressions.map {
+      case _ : AttributeReference => true
+      case Alias(_: AttributeReference, _) => true
+      case Alias(_: Literal, _) => true
+      case Alias(a: AggregateExpression, _) if a.aggregateFunction.defaultResult == None => true
+      case _ => false
+    }
+    exprsRejectEmptyInput.forall(x => x == true)

Review Comment:
   neat, thank you!





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1347469235


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -461,6 +462,22 @@ object DecorrelateInnerQuery extends PredicateHelper {
       p.mapChildren(rewriteDomainJoins(outerPlan, _, conditions))
   }
 
+  private def isCountBugFree(aggregateExpressions: Seq[NamedExpression]): Boolean = {

Review Comment:
   I think the existing way to detect the count bug is better. It evaluates the `Aggregate` operator with empty input and checks whether the result is null. That is more accurate than static analysis.
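
To illustrate the difference between the two approaches, here is a toy sketch. It is not Spark's implementation: the expression classes, `evalOnEmpty`, and both check functions are hypothetical names introduced for this example, and `None` stands in for SQL NULL.

```scala
object EmptyInputEvalSketch {
  // Hypothetical mini expression language (not Catalyst).
  sealed trait Expr
  case object CountStar extends Expr                          // COUNT(*)
  final case class Sum(column: String) extends Expr           // SUM(column)
  final case class IsNull(child: Expr) extends Expr           // child IS NULL
  final case class Plus(child: Expr, n: Long) extends Expr    // child + n

  // Value of the expression when the aggregate sees zero input rows.
  def evalOnEmpty(e: Expr): Option[Any] = e match {
    case CountStar => Some(0L)
    case Sum(_) => None
    case IsNull(c) => Some(evalOnEmpty(c).isEmpty)
    case Plus(c, n) => evalOnEmpty(c) match {
      case Some(v: Long) => Some(v + n)
      case _ => None // NULL + n is NULL
    }
  }

  // Dynamic check: susceptible when the result on empty input is non-NULL.
  def countBugByEvaluation(e: Expr): Boolean = evalOnEmpty(e).nonEmpty

  // Conservative static check: only a bare SUM is accepted as safe.
  def countBugByStaticCheck(e: Expr): Boolean = e match {
    case Sum(_) => false
    case _ => true
  }
}
```

In this sketch `Plus(Sum("x"), 1L)` is flagged by the static check but evaluates to NULL on empty input, so the evaluation-based check can treat it as safe. `IsNull(Sum("x"))` is flagged by both.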





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #43111:
URL: https://github.com/apache/spark/pull/43111#issuecomment-1749051836

   thanks, merging to master!




[GitHub] [spark] jchen5 commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1341449925


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -710,6 +711,12 @@ object DecorrelateInnerQuery extends PredicateHelper {
           case a @ Aggregate(groupingExpressions, aggregateExpressions, child) =>
             val outerReferences = collectOuterReferences(a.expressions)
             val newOuterReferences = parentOuterReferences ++ outerReferences
+            // Find all the aggregate expressions that are subject to the "COUNT bug",
+            // i.e. those that have non-None default result.
+            val countBugSusceptibleAggs = aggregateExpressions.flatMap(_.collect {
+              case a@AggregateExpression(function, _, _, _, _)
+                if function.defaultResult.nonEmpty => a

Review Comment:
   Discussed, the logic should work because it'll add count bug handling in the lower subqueries.





[GitHub] [spark] jchen5 commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1340545294


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -1397,6 +1400,10 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
           failOnInvalidOuterReference(l)
           checkPlan(input, aggregated, canContainOuter)
 
+        case o @ Offset(_, input) =>

Review Comment:
   How does this change relate, or is it a separate change to enable offset?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("3.4.0")

Review Comment:
   I think we're on 4.0.0 now.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -1134,8 +1134,11 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
       isLateral: Boolean = false): Unit = {
     // Some query shapes are only supported with the DecorrelateInnerQuery framework.
     // Currently we only use this new framework for scalar and lateral subqueries.

Review Comment:
   Delete this line



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -710,6 +711,12 @@ object DecorrelateInnerQuery extends PredicateHelper {
           case a @ Aggregate(groupingExpressions, aggregateExpressions, child) =>
             val outerReferences = collectOuterReferences(a.expressions)
             val newOuterReferences = parentOuterReferences ++ outerReferences
+            // Find all the aggregate expressions that are subject to the "COUNT bug",
+            // i.e. those that have non-None default result.
+            val countBugSusceptibleAggs = aggregateExpressions.flatMap(_.collect {
+              case a@AggregateExpression(function, _, _, _, _)
+                if function.defaultResult.nonEmpty => a

Review Comment:
   Just checking the function's default result doesn't work for some more complicated cases, such as where there are nested subqueries:
   ```
   select (
     select sum(cnt)
     from (select count(*) cnt from t2 where t1.c1 = t2.c1)
   ) from t1
   ```
   or:
   ```
   select (
     select sum(a) from (
       select a from t2 where t1.c1 = t2.c1 UNION ALL select 1 as a
     )
   ) from t1
   ```
   The subquery is subject to the count bug even though the sum expression at the top defaults to NULL.
   
   We have logic for this at `evalSubqueryOnZeroTups` and `evalAggExprOnZeroTups` used below.
   
   But I'm not sure how that fits into the context here - what case motivated this change?
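
The two queries above can be simulated with plain Scala collections to see why the top-level `sum` is not NULL for an outer row with no match. This is only a sketch: `sqlSum`, `sqlCountStar`, and the sample data in the usage note are assumptions made for illustration, and `None` models SQL NULL.

```scala
object NestedCountBugSketch {
  // SQL-style SUM: NULL (None) on empty input, otherwise the total.
  def sqlSum(values: Seq[Long]): Option[Long] =
    if (values.isEmpty) None else Some(values.sum)

  // SQL-style COUNT(*) without GROUP BY: always exactly one output row.
  def sqlCountStar[A](rows: Seq[A]): Seq[Long] = Seq(rows.size.toLong)

  // select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1)
  // t2 is reduced to its c1 column here.
  def nestedCount(t2c1: Seq[Int], outerC1: Int): Option[Long] =
    sqlSum(sqlCountStar(t2c1.filter(_ == outerC1)))

  // select sum(a) from (select a from t2 where t1.c1 = t2.c1 UNION ALL select 1 as a)
  // t2 is modelled as (c1, a) pairs here.
  def unionAll(t2: Seq[(Int, Long)], outerC1: Int): Option[Long] =
    sqlSum(t2.collect { case (c1, a) if c1 == outerC1 => a } :+ 1L)
}
```

With `t2c1 = Seq(0, 0)` and an outer `c1` of 1, `nestedCount` gives `Some(0L)` rather than `None`, because the inner `count(*)` still produces one row. Likewise `unionAll` always sees at least the literal row, so it never returns `None`.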





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1342795994


##########
sql/core/src/test/resources/sql-tests/inputs/subquery/exists-subquery/exists-count-bug.sql:
##########
@@ -0,0 +1,21 @@
+create temporary view t1(c1, c2) as values (0, 1), (1, 2);
+create temporary view t2(c1, c2) as values (0, 2), (0, 3);
+create temporary view t3(c1, c2) as values (0, 3), (1, 4), (2, 5);
+
+select * from t1 where exists (select count(*) from t2 where t2.c1 = t1.c1);

Review Comment:
   Before this PR, these queries failed because aggregations were not allowed in correlated EXISTS/IN subqueries.
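
For reference, a small simulation of the first test query on the `t1` and `t2` data above, using plain Scala collections rather than Spark. The `naive` function is an assumed illustration of a rewrite that ignores the count bug (group by the correlated column, then semi-join); it is not the plan Spark produces.

```scala
object ExistsCountSketch {
  val t1: Seq[(Int, Int)] = Seq((0, 1), (1, 2))
  val t2: Seq[(Int, Int)] = Seq((0, 2), (0, 3))

  // select * from t1 where exists (select count(*) from t2 where t2.c1 = t1.c1)
  // count(*) without GROUP BY yields exactly one row per outer row, so EXISTS holds.
  def correct: Seq[(Int, Int)] = t1.filter { case (c1, _) =>
    val subqueryRows = Seq(t2.count(_._1 == c1)) // one row, even when the count is 0
    subqueryRows.nonEmpty
  }

  // Naive decorrelation: aggregate t2 grouped by c1, then semi-join on c1.
  // Outer rows without a match find no group and are dropped.
  def naive: Seq[(Int, Int)] = {
    val grouped: Map[Int, Int] =
      t2.groupBy(_._1).map { case (c1, rows) => c1 -> rows.size }
    t1.filter { case (c1, _) => grouped.contains(c1) }
  }
}
```

Here `correct` keeps both rows of `t1`, while `naive` loses `(1, 2)` because no `t2` row has `c1 = 1`.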





Re: [PR] [SPARK-36112] [SQL] Support correlated EXISTS and IN subqueries using DecorrelateInnerQuery framework [spark]

Posted by "agubichev (via GitHub)" <gi...@apache.org>.
agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1358990343


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -5272,6 +5281,9 @@ class SQLConf extends Serializable with Logging with SqlApiConf {
 
   def decorrelateInnerQueryEnabled: Boolean = getConf(SQLConf.DECORRELATE_INNER_QUERY_ENABLED)
 
+  def decorrelateInnerQueryEnabledForExistsIn: Boolean =
+    !getConf(SQLConf.DECORRELATE_EXISTS_IN_SUBQUERY_LEGACY_INCORRECT_COUNT_HANDLING_ENABLED)

Review Comment:
   The caller checks it:
   https://github.com/search?q=repo%3Aapache%2Fspark%20decorrelateInnerQueryEnabledForExistsIn&type=code
   
   (the first check in the `decorrelate` function, plus an explicit check in CheckAnalysis)
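
A minimal sketch of the gating described in this thread, assuming the caller combines the two accessors with a logical AND. `Conf`, `legacyIncorrectCountHandling`, and `useNewFrameworkForExistsIn` are hypothetical names for this example; only the two accessor names come from the diff.

```scala
object FlagGatingSketch {
  // Hypothetical, simplified stand-in for the relevant part of SQLConf.
  final case class Conf(
      decorrelateInnerQueryEnabled: Boolean,
      legacyIncorrectCountHandling: Boolean) {
    // Mirrors the accessor in the diff: on unless the legacy flag is set.
    def decorrelateInnerQueryEnabledForExistsIn: Boolean =
      !legacyIncorrectCountHandling
  }

  // Assumed caller-side check: both conditions must hold.
  def useNewFrameworkForExistsIn(conf: Conf): Boolean =
    conf.decorrelateInnerQueryEnabled &&
      conf.decorrelateInnerQueryEnabledForExistsIn
}
```

Keeping the accessor itself independent of `decorrelateInnerQueryEnabled` leaves the combination to the caller, which matches the explanation given in the reply.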


