You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "jchen5 (via GitHub)" <gi...@apache.org> on 2023/05/08 21:22:23 UTC

[GitHub] [spark] jchen5 opened a new pull request, #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

jchen5 opened a new pull request, #41094:
URL: https://github.com/apache/spark/pull/41094

   ### What changes were proposed in this pull request?
   Before this PR, IN subquery expressions are incorrectly always marked as non-nullable, even when they are actually nullable. They correctly check the nullability of the left-hand-side, but the right-hand-side of a IN subquery, the ListQuery, is currently defined with nullability = false always. This is incorrect and can lead to incorrect query transformations.
   
   Example: (non_nullable_col IN (select nullable_col)) <=> TRUE . Here the IN expression returns NULL when the nullable_col is null, but our code marks it as non-nullable, and therefore SimplifyBinaryComparison transforms away the <=> TRUE, transforming the expression to non_nullable_col IN (select nullable_col) , which is an incorrect transformation because NULL values of nullable_col now cause the expression to yield NULL instead of FALSE.
   
   This bug can potentially lead to wrong results, but in most cases this doesn't directly cause wrong results end-to-end, because IN subqueries are almost always transformed to semi/anti/existence joins in RewritePredicateSubquery, and this rewrite can also incorrectly discard NULLs, which is another bug. But we can observe it causing wrong behavior in unit tests, and it could easily lead to incorrect query results if there are changes to the surrounding context, so it should be fixed regardless.
   
   This is a long-standing bug that has existed at least since 2016, as long as the ListQuery class has existed.
   
   ### Why are the changes needed?
   Fix correctness bug.
   
   ### Does this PR introduce _any_ user-facing change?
   May change query results to fix correctness bug.
   
   ### How was this patch tested?
   Unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jchen5 commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1189236634


##########
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-nulls.sql:
##########
@@ -0,0 +1,15 @@
+create table t0(t0a int not null, t0b int) using parquet;

Review Comment:
   How do I define a column as `not null` in a temp view? Is it possible?
   
   If not, I can drop the tables at the end of the test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jchen5 commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1189997292


##########
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-nulls.sql:
##########
@@ -0,0 +1,15 @@
+create table t0(t0a int not null, t0b int) using parquet;

Review Comment:
   Thanks. I had been trying to create temp view with `values` and it was always nullable, but this works.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jchen5 commented on pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on PR #41094:
URL: https://github.com/apache/spark/pull/41094#issuecomment-1539075096

   @cloud-fan @allisonwang-db 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1189272305


##########
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-nulls.sql:
##########
@@ -0,0 +1,15 @@
+create table t0(t0a int not null, t0b int) using parquet;

Review Comment:
   temp view infers nullability from the input query
   ```
   scala> sql("create temp view v as select 1 as non_nullable, cast(null as int) as nullable")
   res2: org.apache.spark.sql.DataFrame = []
   
   scala> spark.table("v").printSchema
   root
    |-- non_nullable: integer (nullable = false)
    |-- nullable: integer (nullable = true)
   ```
   
   BTW, parquet table can't have non-nullable columns in Spark, so this test doesn't actually work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1187977974


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala:
##########
@@ -228,4 +228,31 @@ class BinaryComparisonSimplificationSuite extends PlanTest {
 
     checkCondition(nullableRelation, $"a" <=> nullLit, $"a".isNull)
   }
+
+  test("IN subquery nullability") {
+    // Should not optimize, since the IN subquery is nullable
+    {

Review Comment:
   Please don't use this verbose redundant `{}` pattern.
   
   You can use `Seq(...).foreach` patterns.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1187977974


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala:
##########
@@ -228,4 +228,31 @@ class BinaryComparisonSimplificationSuite extends PlanTest {
 
     checkCondition(nullableRelation, $"a" <=> nullLit, $"a".isNull)
   }
+
+  test("IN subquery nullability") {
+    // Should not optimize, since the IN subquery is nullable
+    {

Review Comment:
   Let's avoid these verbose redundant `{}` copy&paste patterns.
   
   You can use `Seq(...).foreach` patterns.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jchen5 commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1188621808


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala:
##########
@@ -367,7 +367,7 @@ case class ListQuery(
     plan.output.head.dataType
   }
   override lazy val resolved: Boolean = childrenResolved && plan.resolved && numCols != -1
-  override def nullable: Boolean = false
+  override def nullable: Boolean = childOutputs.exists(_.nullable)

Review Comment:
   InSubquery.nullable derives from the nullability of its children, which are the values (lhs) and ListQuery (rhs). ListQuery.nullable does affect the results of the unit test I added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1187975843


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala:
##########
@@ -228,4 +228,31 @@ class BinaryComparisonSimplificationSuite extends PlanTest {
 
     checkCondition(nullableRelation, $"a" <=> nullLit, $"a".isNull)
   }
+
+  test("IN subquery nullability") {

Review Comment:
   Could you add a test prefix please, @jchen5 ?
   ```scala
   - test("IN subquery nullability") {
   + test("SPARK-43413: IN subquery nullability") {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jchen5 commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1188683767


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala:
##########
@@ -367,7 +367,7 @@ case class ListQuery(
     plan.output.head.dataType
   }
   override lazy val resolved: Boolean = childrenResolved && plan.resolved && numCols != -1
-  override def nullable: Boolean = false
+  override def nullable: Boolean = childOutputs.exists(_.nullable)

Review Comment:
   Ok, I can try changing it to throw an exception in ListQuery.nullable and change InSubquery.nullable to check ListQuery.childOutputs directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1195438179


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4199,6 +4199,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_IN_SUBQUERY_NULLABILITY =
+    buildConf("spark.sql.legacy.inSubqueryNullability")
+      .internal()
+      .doc(s"When set to false, IN subquery nullability is correctly calculated based on " +
+        s"both the left and right sides of the IN. When set to true, restores the legacy " +
+        "behavior that does not check the right side's nullability.")
+      .version("3.5.0")
+      .booleanConf
+      .createWithDefault(false)

Review Comment:
   Since this is `false`, we need to add `spark.sql.legacy.inSubqueryNullability` to our SQL migration guide. Could you make a followup, @jchen5 ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #41094:
URL: https://github.com/apache/spark/pull/41094#issuecomment-1550014858

   Thank you, @jchen5 and @cloud-fan .
   
   Do you think we can have a backport for branch-3.4 at least?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability
URL: https://github.com/apache/spark/pull/41094


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1188378325


##########
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-nulls.sql:
##########
@@ -0,0 +1,15 @@
+create table t0(t0a int not null, t0b int) using parquet;

Review Comment:
   we usually create temp views in the test, as tables are persistent and may affect other test cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1188377181


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala:
##########
@@ -367,7 +367,7 @@ case class ListQuery(
     plan.output.head.dataType
   }
   override lazy val resolved: Boolean = childrenResolved && plan.resolved && numCols != -1
-  override def nullable: Boolean = false
+  override def nullable: Boolean = childOutputs.exists(_.nullable)

Review Comment:
   One problem is `ListQuery` can't be executed alone so its nullability is useless. We can even throw exception here to make sure no one call `def nullable`.
   
   Shall we fix `In#nullable` instead? If the list is a single-element Seq of `ListQuery`, the nullability should be defined differently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1188669618


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala:
##########
@@ -367,7 +367,7 @@ case class ListQuery(
     plan.output.head.dataType
   }
   override lazy val resolved: Boolean = childrenResolved && plan.resolved && numCols != -1
-  override def nullable: Boolean = false
+  override def nullable: Boolean = childOutputs.exists(_.nullable)

Review Comment:
   Yea this fixes the problem, but it confuses people about what `ListQuery.nullable` means. If we can evaluate `ListQuery` alone, it should return a non-nullable struct containing the query output columns. Since we never evaluate `ListQuery` alone, its nullability is nonsense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #41094:
URL: https://github.com/apache/spark/pull/41094#issuecomment-1548842243

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jchen5 commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1188621808


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala:
##########
@@ -367,7 +367,7 @@ case class ListQuery(
     plan.output.head.dataType
   }
   override lazy val resolved: Boolean = childrenResolved && plan.resolved && numCols != -1
-  override def nullable: Boolean = false
+  override def nullable: Boolean = childOutputs.exists(_.nullable)

Review Comment:
   InSubquery.nullable derives from the nullability of its children, which are the values (lhs) and ListQuery (rhs). Fixing ListQuery.nullable therefore fixes InSubquery.nullable and fixes the behavior in the unit test I added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1195846275


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4199,6 +4199,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_IN_SUBQUERY_NULLABILITY =
+    buildConf("spark.sql.legacy.inSubqueryNullability")
+      .internal()
+      .doc(s"When set to false, IN subquery nullability is correctly calculated based on " +
+        s"both the left and right sides of the IN. When set to true, restores the legacy " +
+        "behavior that does not check the right side's nullability.")
+      .version("3.5.0")
+      .booleanConf
+      .createWithDefault(false)

Review Comment:
   I think the purpose of this flag is not to let users go back to the wrong result, but just in case the newly added assertion fails. I think it's better to mention this flag in the assert error message, than mentioning it in the migration guide.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1195937039


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4199,6 +4199,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_IN_SUBQUERY_NULLABILITY =
+    buildConf("spark.sql.legacy.inSubqueryNullability")
+      .internal()
+      .doc(s"When set to false, IN subquery nullability is correctly calculated based on " +
+        s"both the left and right sides of the IN. When set to true, restores the legacy " +
+        "behavior that does not check the right side's nullability.")
+      .version("3.5.0")
+      .booleanConf
+      .createWithDefault(false)

Review Comment:
   +100 for @cloud-fan 's holistic and complete solution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jchen5 commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

Posted by "jchen5 (via GitHub)" <gi...@apache.org>.
jchen5 commented on code in PR #41094:
URL: https://github.com/apache/spark/pull/41094#discussion_r1196750077


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4199,6 +4199,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_IN_SUBQUERY_NULLABILITY =
+    buildConf("spark.sql.legacy.inSubqueryNullability")
+      .internal()
+      .doc(s"When set to false, IN subquery nullability is correctly calculated based on " +
+        s"both the left and right sides of the IN. When set to true, restores the legacy " +
+        "behavior that does not check the right side's nullability.")
+      .version("3.5.0")
+      .booleanConf
+      .createWithDefault(false)

Review Comment:
   Agreed, created followup here: https://github.com/apache/spark/pull/41202



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org