You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/29 23:45:33 UTC

[GitHub] [spark] dtenedor opened a new pull request, #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

dtenedor opened a new pull request, #38052:
URL: https://github.com/apache/spark/pull/38052

   ### What changes were proposed in this pull request?
   
   There is a bug in the `MergeScalarSubqueries` rule for queries with subquery expressions nested inside each other, wherein the rule attempts to merge the nested subquery with its enclosing parent subquery. The result is not a valid plan and raises an exception in the optimizer. Here is a minimal reproducing case:
   
   ```
   sql("create table test(col int) using csv")
   checkAnswer(sql("select(select sum((select sum(col) from test)) from test)"), Row(null))
   ```
   
   To fix, we disable the optimization for subqueries with nested subqueries inside them for now.
   
   ### Why are the changes needed?
   
   This fixes a bug.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Updated existing unit tests and added the reproducing case as a new test case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
peter-toth commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984342208


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -534,20 +534,10 @@ class MergeScalarSubqueriesSuite extends PlanTest {
 
   test("Merging subqueries from different places") {
     val subquery1 = ScalarSubquery(testRelation.select(('a + 1).as("a_plus1")))
-    val subquery2 = ScalarSubquery(testRelation.select(('a + 2).as("a_plus2")))

Review Comment:
   Hi @dtenedor, thanks for reporting this bug and pinging me.
   
   I think the test is a valid usecase so I'm thinking about if we could come up with a better fix. I think the root cause of the issue is in `tryMergePlans()`. When we try merging `Project`s or `Aggregate`s and don't take into accout that `np` can contain `ScalarSubqueryReference`. So probably filtering out `ScalarSubqueryReference`s there should fix the bug but keep this usecase working. Also, I think `EXISTS_SUBQUERY`, `IN_SUBQUERY` should not matter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dtenedor commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
dtenedor commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984829104


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -560,19 +550,20 @@ class MergeScalarSubqueriesSuite extends PlanTest {
           Literal("a_plus2"), 'a_plus2,
           Literal("b"), 'b
         )).as("mergedValue"))
-    val analyzedMergedSubquery = mergedSubquery.analyze
-    val correctAnswer = WithCTE(
-      testRelation
-        .select(
-          extractorExpression(0, analyzedMergedSubquery.output, 0),
-          extractorExpression(0, analyzedMergedSubquery.output, 1),
-          extractorExpression(0, analyzedMergedSubquery.output, 2))
-        .where(
-          extractorExpression(0, analyzedMergedSubquery.output, 0) +
-          extractorExpression(0, analyzedMergedSubquery.output, 1) +
-          extractorExpression(0, analyzedMergedSubquery.output, 2) === 0),
-      Seq(definitionNode(analyzedMergedSubquery, 0)))
-
-    comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer.analyze)
+    val correctAnswer =

Review Comment:
   This unit test is reverted now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
peter-toth commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r985078417


##########
sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala:
##########
@@ -2157,7 +2157,7 @@ class SubquerySuite extends QueryTest
     }
   }
 
-  test("Merge non-correlated scalar subqueries from different parent plans") {

Review Comment:
   Unfortunately, this is also a valid test and so it shouldn't be altered. The 2 leaf level subqueries are independent and can be merted to compute once. Similarly the 2 parents can be also merged. Admittedly, this test case is not the best, I should have used a different tables than `testData` in the parent queries to make it more simple.
   
   But I don't want to block this fix so if we need it urgently I'm ok with mergint this PR but probably we can come up with a better alternative.



##########
sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala:
##########
@@ -2157,7 +2157,7 @@ class SubquerySuite extends QueryTest
     }
   }
 
-  test("Merge non-correlated scalar subqueries from different parent plans") {

Review Comment:
   I think the issue is that currently we allow merging a subquery to another one that is needed to compute the subquery itself. So before merging we should collect the (transitive) subquery references and don't try merging the new plan to those.
   @dtenedor, let me know if you want to update this PR or I can open one with the fix tomorrow.



##########
sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala:
##########
@@ -2157,7 +2157,7 @@ class SubquerySuite extends QueryTest
     }
   }
 
-  test("Merge non-correlated scalar subqueries from different parent plans") {

Review Comment:
   Unfortunately, this is also a valid test and so it shouldn't be altered. The 2 leaf level subqueries are independent and can be merted to compute once. Similarly the 2 parents can be also merged. Admittedly, this test case is not the best, I should have used a different table than `testData` in the parent queries to make the test more simple.
   
   But I don't want to block this fix so if we need it urgently I'm ok with mergint this PR but probably we can come up with a better alternative.



##########
sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala:
##########
@@ -2157,7 +2157,7 @@ class SubquerySuite extends QueryTest
     }
   }
 
-  test("Merge non-correlated scalar subqueries from different parent plans") {

Review Comment:
   I think the issue is that currently we allow merging a subquery to another one that is needed to compute the subquery itself. So we should collect the (transitive) subquery references in the new plan and don't try merging the new plan to those.
   @dtenedor, let me know if you want to update this PR or I can open one with the fix tomorrow.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38052:
URL: https://github.com/apache/spark/pull/38052#issuecomment-1264375031

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
peter-toth commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984342208


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -534,20 +534,10 @@ class MergeScalarSubqueriesSuite extends PlanTest {
 
   test("Merging subqueries from different places") {
     val subquery1 = ScalarSubquery(testRelation.select(('a + 1).as("a_plus1")))
-    val subquery2 = ScalarSubquery(testRelation.select(('a + 2).as("a_plus2")))

Review Comment:
   Hi @dtenedor, thanks for reporting this bug and pinging me.
   
   I think the test is a valid usecase so I'm thinking about if we could come up with a better fix. I think the root cause of the issue is in `tryMergePlans()`. When we try merging `Project`s or `Aggregate` and don't take into accout that `np` can contain `ScalarSubqueryReference`. So probably filtering out `ScalarSubqueryReference`s there should fix the bug but keep this usecase working. Also, I think `EXISTS_SUBQUERY`, `IN_SUBQUERY` should not matter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dtenedor commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
dtenedor commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984830906


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -534,20 +534,10 @@ class MergeScalarSubqueriesSuite extends PlanTest {
 
   test("Merging subqueries from different places") {
     val subquery1 = ScalarSubquery(testRelation.select(('a + 1).as("a_plus1")))
-    val subquery2 = ScalarSubquery(testRelation.select(('a + 2).as("a_plus2")))

Review Comment:
   @peter-toth Thanks for the review!
   
   I was able to make the fix you suggested instead, to fix the bug while preserving the functionality of this unit test case. Please look again?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
peter-toth commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r985078417


##########
sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala:
##########
@@ -2157,7 +2157,7 @@ class SubquerySuite extends QueryTest
     }
   }
 
-  test("Merge non-correlated scalar subqueries from different parent plans") {

Review Comment:
   Unfortunately, this is also a valid test and so it shouldn't be altered.
   I don't want to block this fix so if we need it urgently I'm ok with mergint this PR but probably we can come up with a better alternative.
   
   I think the issue is that we try merging a new plan that contains a `ScalarSubqueryReference`  that references a cached plan with the same cached plan. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dtenedor commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
dtenedor commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984851320


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -534,20 +534,10 @@ class MergeScalarSubqueriesSuite extends PlanTest {
 
   test("Merging subqueries from different places") {
     val subquery1 = ScalarSubquery(testRelation.select(('a + 1).as("a_plus1")))
-    val subquery2 = ScalarSubquery(testRelation.select(('a + 2).as("a_plus2")))

Review Comment:
   🙏 thanks for your attention!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r985131060


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4513,6 +4513,16 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       }
     }
   }
+
+  test("SPARK-40618: Regression test for merging subquery bug with nested subqueries") {
+    // This test contains a subquery expression with another subquery expression nested inside.
+    // It acts as a regression test to ensure that the MergeScalarSubqueries rule does not attempt
+    // to merge them together.
+    withTable("t") {
+      sql("create table t(col int) using csv")
+      checkAnswer(sql("select(select sum((select sum(col) from t)) from t)"), Row(null))

Review Comment:
   Nit: The following `SubquerySuite` might be a better place to host this test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dtenedor commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
dtenedor commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r985959293


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4513,6 +4513,16 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       }
     }
   }
+
+  test("SPARK-40618: Regression test for merging subquery bug with nested subqueries") {
+    // This test contains a subquery expression with another subquery expression nested inside.
+    // It acts as a regression test to ensure that the MergeScalarSubqueries rule does not attempt
+    // to merge them together.
+    withTable("t") {
+      sql("create table t(col int) using csv")
+      checkAnswer(sql("select(select sum((select sum(col) from t)) from t)"), Row(null))

Review Comment:
   Sounds good, done.



##########
sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala:
##########
@@ -2157,7 +2157,7 @@ class SubquerySuite extends QueryTest
     }
   }
 
-  test("Merge non-correlated scalar subqueries from different parent plans") {

Review Comment:
   @peter-toth Yeah, the bug is in `tryMergePlans` where it tries to merge a subquery plan with another plan that contains the original subquery plan inside, e.g. the regression test `select(select sum((select sum(col) from t)) from t`. I tried doing that by keeping sets of visited plans and checking them before merging, but it got complex since some `ScalarSubqueryReference`s were already converted.
   
   We should probably merge this PR to fix planning errors while keeping most of the optimizations from this rule in the short term. Then we can follow-up to restore the remaining optimization. I put a TODO for that; with the regression test present, it should be safe to proceed from there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dtenedor commented on pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
dtenedor commented on PR #38052:
URL: https://github.com/apache/spark/pull/38052#issuecomment-1262944165

   Hi @peter-toth, @sigmod, @gengliangwang could you please help to review this bug fix? 🙏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984111489


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -560,19 +550,20 @@ class MergeScalarSubqueriesSuite extends PlanTest {
           Literal("a_plus2"), 'a_plus2,
           Literal("b"), 'b
         )).as("mergedValue"))
-    val analyzedMergedSubquery = mergedSubquery.analyze
-    val correctAnswer = WithCTE(
-      testRelation
-        .select(
-          extractorExpression(0, analyzedMergedSubquery.output, 0),
-          extractorExpression(0, analyzedMergedSubquery.output, 1),
-          extractorExpression(0, analyzedMergedSubquery.output, 2))
-        .where(
-          extractorExpression(0, analyzedMergedSubquery.output, 0) +
-          extractorExpression(0, analyzedMergedSubquery.output, 1) +
-          extractorExpression(0, analyzedMergedSubquery.output, 2) === 0),
-      Seq(definitionNode(analyzedMergedSubquery, 0)))
-
-    comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer.analyze)
+    val correctAnswer =

Review Comment:
   Nit: It is better to have the `correctAnswer` as a logical plan instead of a string representation. Otherwise, later on if the `toString` representation is changed, we will need to change the test case here as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #38052:
URL: https://github.com/apache/spark/pull/38052#issuecomment-1263993233

   @dtenedor just reminder, could you check the test failure?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r985131060


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4513,6 +4513,16 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       }
     }
   }
+
+  test("SPARK-40618: Regression test for merging subquery bug with nested subqueries") {
+    // This test contains a subquery expression with another subquery expression nested inside.
+    // It acts as a regression test to ensure that the MergeScalarSubqueries rule does not attempt
+    // to merge them together.
+    withTable("t") {
+      sql("create table t(col int) using csv")
+      checkAnswer(sql("select(select sum((select sum(col) from t)) from t)"), Row(null))

Review Comment:
   Nit: there is a `SubquerySuite` which might be a better place to host this test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dtenedor commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
dtenedor commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984830906


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -534,20 +534,10 @@ class MergeScalarSubqueriesSuite extends PlanTest {
 
   test("Merging subqueries from different places") {
     val subquery1 = ScalarSubquery(testRelation.select(('a + 1).as("a_plus1")))
-    val subquery2 = ScalarSubquery(testRelation.select(('a + 2).as("a_plus2")))

Review Comment:
   Thanks for the review!
   
   I was able to make the fix you suggested instead, to fix the bug while preserving the functionality of this unit test case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a diff in pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
peter-toth commented on code in PR #38052:
URL: https://github.com/apache/spark/pull/38052#discussion_r984851451


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueriesSuite.scala:
##########
@@ -534,20 +534,10 @@ class MergeScalarSubqueriesSuite extends PlanTest {
 
   test("Merging subqueries from different places") {
     val subquery1 = ScalarSubquery(testRelation.select(('a + 1).as("a_plus1")))
-    val subquery2 = ScalarSubquery(testRelation.select(('a + 2).as("a_plus2")))

Review Comment:
   Looks good, thanks for the fix!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dtenedor commented on pull request #38052: [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries

Posted by GitBox <gi...@apache.org>.
dtenedor commented on PR #38052:
URL: https://github.com/apache/spark/pull/38052#issuecomment-1264041696

   > @dtenedor just reminder, could you check the test failure?
   @gengliangwang my mistake 😦 I have updated the unit test file. That test should now show that we avoid merging nested subqueries in that case. I updated it accordingly.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org