You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "milastdbx (via GitHub)" <gi...@apache.org> on 2023/12/21 14:55:37 UTC

[PR] Execute immediate VariableReference foldable [spark]

milastdbx opened a new pull request, #44450:
URL: https://github.com/apache/spark/pull/44450

   ### What changes were proposed in this pull request?
   As part of EXECUTE IMMEDIATE statement, we are doing variable resolution, and [previous PR ](https://github.com/databricks/runtime/pull/79845) introduced copy/paste issue from SET variable `canFold = false`. 
   
   This if fine for SET command, but for parameters should be foldable to match regular query behaviour with same pattern.
   
   
   ### Why are the changes needed?
   To align parameterized and non parameterized queries
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Manually
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44450:
URL: https://github.com/apache/spark/pull/44450#discussion_r1435170715


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala:
##########
@@ -177,7 +177,7 @@ class SubstituteExecuteImmediate(val catalogManager: CatalogManager)
 
   private def getVariableReference(expr: Expression, nameParts: Seq[String]): VariableReference = {
     lookupVariable(nameParts) match {
-      case Some(variable) => variable.copy(canFold = false)
+      case Some(variable) => variable.copy()

Review Comment:
   I'm a bit confused. `VariableReference` is immutable, why do we need to copy to avoid sharing the instance?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #44450:
URL: https://github.com/apache/spark/pull/44450#discussion_r1434323554


##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -27,6 +27,27 @@ import org.apache.spark.sql.test.SharedSparkSession
 
 class ParametersSuite extends QueryTest with SharedSparkSession {
 
+  test("SPARK-46481: Test variable folding") {
+    sql("DECLARE a INT = 1").collect()
+    sql("SET VAR a = 1").collect()
+    val expected = sql("SELECT 42 WHERE 1 = 1").queryExecution.executedPlan
+    val variableDirectly = sql("SELECT 42 WHERE 1 = a").queryExecution.executedPlan
+    val parameterizedSpark =
+      spark.sql("SELECT 42 WHERE 1 = ?", Array(1)).queryExecution.executedPlan
+    val parameterizedSql =
+      spark.sql("EXECUTE IMMEDIATE 'SELECT 42 WHERE 1 = ?' USING a").queryExecution.executedPlan
+
+    assert(
+      expected.semanticHash() == variableDirectly.semanticHash(),
+      "Select with variable directly query failed")
+    assert(
+      expected.semanticHash() == parameterizedSpark.semanticHash(),

Review Comment:
   instead of comparing the `executedPlan` and looking at the semantic hashes, can we use `comparePlans` like this example [1] on the logical plans?
   
   [1] https://src.dev.databricks.com/databricks/runtime@master/-/blob/sql/core/src/test/scala/com/databricks/sql/execution/command/DDLParserSuite.scala?L311-331&subtree=true



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "hmagrahi (via GitHub)" <gi...@apache.org>.
hmagrahi commented on code in PR #44450:
URL: https://github.com/apache/spark/pull/44450#discussion_r1435049013


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala:
##########
@@ -177,7 +177,7 @@ class SubstituteExecuteImmediate(val catalogManager: CatalogManager)
 
   private def getVariableReference(expr: Expression, nameParts: Seq[String]): VariableReference = {
     lookupVariable(nameParts) match {
-      case Some(variable) => variable.copy(canFold = false)
+      case Some(variable) => variable.copy()

Review Comment:
   @milastdbx that's clear, thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #44450: [SPARK-46481] Execute immediate VariableReference foldable
URL: https://github.com/apache/spark/pull/44450


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "milastdbx (via GitHub)" <gi...@apache.org>.
milastdbx commented on code in PR #44450:
URL: https://github.com/apache/spark/pull/44450#discussion_r1434860929


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala:
##########
@@ -177,7 +177,7 @@ class SubstituteExecuteImmediate(val catalogManager: CatalogManager)
 
   private def getVariableReference(expr: Expression, nameParts: Seq[String]): VariableReference = {
     lookupVariable(nameParts) match {
-      case Some(variable) => variable.copy(canFold = false)
+      case Some(variable) => variable.copy()

Review Comment:
   lookupVariable creates new VariableReference object which encapsulates VariableDefinition object which is stored in catalog.
   
   I think its better to have it copied



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "dbatomic (via GitHub)" <gi...@apache.org>.
dbatomic commented on code in PR #44450:
URL: https://github.com/apache/spark/pull/44450#discussion_r1434319665


##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -27,6 +27,27 @@ import org.apache.spark.sql.test.SharedSparkSession
 
 class ParametersSuite extends QueryTest with SharedSparkSession {
 
+  test("SPARK-46481: Test variable folding") {
+    sql("DECLARE a INT = 1").collect()
+    sql("SET VAR a = 1").collect()
+    val expected = sql("SELECT 42 WHERE 1 = 1").queryExecution.executedPlan
+    val variableDirectly = sql("SELECT 42 WHERE 1 = a").queryExecution.executedPlan
+    val parameterizedSpark =
+      spark.sql("SELECT 42 WHERE 1 = ?", Array(1)).queryExecution.executedPlan
+    val parameterizedSql =
+      spark.sql("EXECUTE IMMEDIATE 'SELECT 42 WHERE 1 = ?' USING a").queryExecution.executedPlan
+
+    assert(
+      expected.semanticHash() == variableDirectly.semanticHash(),
+      "Select with variable directly query failed")

Review Comment:
   Maybe better test error message is "...produced unexpected plan", since nothing failed here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #44450:
URL: https://github.com/apache/spark/pull/44450#issuecomment-1869007051

   the failed tests are unrelated, merging to master, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #44450:
URL: https://github.com/apache/spark/pull/44450#discussion_r1434278900


##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -27,6 +27,27 @@ import org.apache.spark.sql.test.SharedSparkSession
 
 class ParametersSuite extends QueryTest with SharedSparkSession {
 
+  test("SPARK-46481: Test variable folding") {
+    sql("DECLARE a INT = 1").collect()
+    sql("SET VAR a = 1").collect()

Review Comment:
   Is `collect()` here and above really needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #44450:
URL: https://github.com/apache/spark/pull/44450#discussion_r1435172264


##########
sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala:
##########
@@ -18,14 +18,29 @@
 package org.apache.spark.sql
 
 import java.time.{Instant, LocalDate, LocalDateTime, ZoneId}
-
 import org.apache.spark.sql.catalyst.expressions.Literal
 import org.apache.spark.sql.catalyst.parser.ParseException
+import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.functions.{array, call_function, lit, map, map_from_arrays, map_from_entries, str_to_map, struct}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 
-class ParametersSuite extends QueryTest with SharedSparkSession {
+class ParametersSuite extends QueryTest with SharedSparkSession with PlanTest {
+
+  test("SPARK-46481: Test variable folding") {

Review Comment:
   nit: we usually put new test at the end.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46481] Execute immediate VariableReference foldable [spark]

Posted by "hmagrahi (via GitHub)" <gi...@apache.org>.
hmagrahi commented on code in PR #44450:
URL: https://github.com/apache/spark/pull/44450#discussion_r1434603017


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala:
##########
@@ -177,7 +177,7 @@ class SubstituteExecuteImmediate(val catalogManager: CatalogManager)
 
   private def getVariableReference(expr: Expression, nameParts: Seq[String]): VariableReference = {
     lookupVariable(nameParts) match {
-      case Some(variable) => variable.copy(canFold = false)
+      case Some(variable) => variable.copy()

Review Comment:
   I think no need for .copy()



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org