Posted to reviews@spark.apache.org by "dtenedor (via GitHub)" <gi...@apache.org> on 2023/03/20 18:20:45 UTC

[GitHub] [spark] dtenedor opened a new pull request, #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

dtenedor opened a new pull request, #40496:
URL: https://github.com/apache/spark/pull/40496

   ### What changes were proposed in this pull request?
   
   This PR enables the new golden file test framework for analysis for all input files.
   
   Background:
* In https://github.com/apache/spark/pull/40449 we added the ability to exercise the analyzer on the SQL queries in the existing golden files in the `sql/core/src/test/resources/sql-tests/inputs` directory, writing separate output test files to the new `sql/core/src/test/resources/sql-tests/analyzer-results` directory in addition to the original output directory for full end-to-end query execution results.
   * That PR also added an allowlist of input files to include in this new dual-run mode.
* In this PR, we remove that allowlist and exercise the new dual-run mode for all input files. We also extend the analyzer testing to support separate test cases for ANSI mode, TimestampNTZ, and UDFs.
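
   A sketch of the golden-file layout and the regeneration workflow (the `results/` directory name and the `SPARK_GENERATE_GOLDEN_FILES` variable follow the usual `SQLQueryTestSuite` convention; they are assumptions here rather than part of this PR's diff):
   
   ```
   sql/core/src/test/resources/sql-tests/inputs/            -- SQL input files
   sql/core/src/test/resources/sql-tests/results/           -- end-to-end execution golden files
   sql/core/src/test/resources/sql-tests/analyzer-results/  -- analyzer golden files (new)
   
   # Regenerate both kinds of golden files after changing inputs:
   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
   ```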
   
   ### Why are the changes needed?
   
   This improves test coverage and helps prevent accidental regressions as the code evolves.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   This PR adds testing only.




[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144023834


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala:
##########
@@ -59,10 +61,39 @@ trait SQLQueryTestHelper extends Logging {
    */
   protected def getNormalizedQueryAnalysisResult(
       session: SparkSession, sql: String): (String, Seq[String]) = {
+    // Note that creating the following DataFrame includes eager execution for commands that create
+    // objects such as views. Therefore any following queries that reference these objects should
+    // find them in the catalog.
     val df = session.sql(sql)
     val schema = df.schema.catalogString
-    // Get the answer, but also get rid of the #1234 expression IDs that show up in analyzer plans.
-    (schema, Seq(replaceNotIncludedMsg(df.queryExecution.analyzed.toString)))
+    val analyzed = df.queryExecution.analyzed
+    // Determine if the analyzed plan contains any nondeterministic expressions.
+    var deterministic = true
+    analyzed.transformAllExpressionsWithSubqueries {
+      case expr: CurrentDate =>
+        deterministic = false
+        expr
+      case expr: CurrentTimestampLike =>
+        deterministic = false
+        expr
+      case expr: CurrentUser =>
+        deterministic = false
+        expr
+      case expr: Literal if expr.dataType == DateType || expr.dataType == TimestampType =>
+        deterministic = false
+        expr
+      case expr if !expr.deterministic =>
+        deterministic = false
+        expr
+    }
+    if (deterministic) {
+      // Perform query analysis, but also get rid of the #1234 expression IDs that show up in the
+      // resolved plans.
+      (schema, Seq(replaceNotIncludedMsg(analyzed.toString)))
+    } else {
+      // The analyzed plan is nondeterministic so elide it from the result to keep tests reliable.
+      (schema, Seq("[Analyzer test output redacted due to nondeterminism]"))
+    }

Review Comment:
   Hi @amaliujia, I am a bit confused by your question; would you mind clarifying? This redaction applies only to the generated analyzer test result files, to prevent the tests from becoming flaky due to things like `current_date()` changing each day.
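
   For illustration, a query referencing `current_date()` produces a redacted entry in the generated analyzer-results golden file, roughly like this (a sketch following the `-- !query` golden-file format; the exact surrounding text is illustrative):
   
   ```
   -- !query
   select current_date() as d
   -- !query analysis
   [Analyzer test output redacted due to nondeterminism]
   ```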





[GitHub] [spark] LuciferYang commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1483683148

   ```
   [info] - timestampNTZ/datetime-special.sql_analyzer_test *** FAILED *** (11 milliseconds)
   [info]   timestampNTZ/datetime-special.sql_analyzer_test
   [info]   Expected "...date(999999, 3, 18, [false) AS make_date(999999, 3, 18)#x, make_date(-1, 1, 28, fals]e) AS make_date(-1, ...", but got "...date(999999, 3, 18, [true) AS make_date(999999, 3, 18)#x, make_date(-1, 1, 28, tru]e) AS make_date(-1, ..." Result did not match for query #1
   [info]   select make_date(999999, 3, 18), make_date(-1, 1, 28) (SQLQueryTestSuite.scala:777)
   [info]   org.scalatest.exceptions.TestFailedException:
   ```
   
   @dtenedor
   
   This seems not to be fixed yet; running
   
   `SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"`
   
   reproduces the failure.
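
   For what it's worth, the differing trailing boolean in `make_date(..., false)` vs. `make_date(..., true)` looks like the expression's ANSI/`failOnError` flag, which is captured from the session config at analysis time, so golden files generated without ANSI do not match an ANSI run. A sketch of the assumed shape of the expression:
   
   ```scala
   // Assumed shape (paraphrased, not copied from the Spark source): the default
   // for failOnError tracks spark.sql.ansi.enabled, so the analyzed plan prints
   // "make_date(..., false)" without ANSI and "make_date(..., true)" with it.
   case class MakeDate(
       year: Expression,
       month: Expression,
       day: Expression,
       failOnError: Boolean = SQLConf.get.ansiEnabled)
   ```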




[GitHub] [spark] HyukjinKwon closed pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files
URL: https://github.com/apache/spark/pull/40496




[GitHub] [spark] HyukjinKwon commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1479017911

   Merged to master.




[GitHub] [spark] LuciferYang commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1142891030


##########
sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out:
##########
@@ -0,0 +1,881 @@
+-- Automatically generated by SQLQueryTestSuite
+-- !query
+SELECT CAST('1.23' AS int)
+-- !query analysis
+Project [cast(1.23 as int) AS CAST(1.23 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('1.23' AS long)
+-- !query analysis
+Project [cast(1.23 as bigint) AS CAST(1.23 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-4.56' AS int)
+-- !query analysis
+Project [cast(-4.56 as int) AS CAST(-4.56 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-4.56' AS long)
+-- !query analysis
+Project [cast(-4.56 as bigint) AS CAST(-4.56 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS int)
+-- !query analysis
+Project [cast(abc as int) AS CAST(abc AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS long)
+-- !query analysis
+Project [cast(abc as bigint) AS CAST(abc AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS float)
+-- !query analysis
+Project [cast(abc as float) AS CAST(abc AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS double)
+-- !query analysis
+Project [cast(abc as double) AS CAST(abc AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('1234567890123' AS int)
+-- !query analysis
+Project [cast(1234567890123 as int) AS CAST(1234567890123 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('12345678901234567890123' AS long)
+-- !query analysis
+Project [cast(12345678901234567890123 as bigint) AS CAST(12345678901234567890123 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS int)
+-- !query analysis
+Project [cast( as int) AS CAST( AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS long)
+-- !query analysis
+Project [cast( as bigint) AS CAST( AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS float)
+-- !query analysis
+Project [cast( as float) AS CAST( AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS double)
+-- !query analysis
+Project [cast( as double) AS CAST( AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST(NULL AS int)
+-- !query analysis
+Project [cast(null as int) AS CAST(NULL AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST(NULL AS long)
+-- !query analysis
+Project [cast(null as bigint) AS CAST(NULL AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS int)
+-- !query analysis
+Project [cast(123.a as int) AS CAST(123.a AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS long)
+-- !query analysis
+Project [cast(123.a as bigint) AS CAST(123.a AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS float)
+-- !query analysis
+Project [cast(123.a as float) AS CAST(123.a AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS double)
+-- !query analysis
+Project [cast(123.a as double) AS CAST(123.a AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-2147483648' AS int)
+-- !query analysis
+Project [cast(-2147483648 as int) AS CAST(-2147483648 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-2147483649' AS int)
+-- !query analysis
+Project [cast(-2147483649 as int) AS CAST(-2147483649 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('2147483647' AS int)
+-- !query analysis
+Project [cast(2147483647 as int) AS CAST(2147483647 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('2147483648' AS int)
+-- !query analysis
+Project [cast(2147483648 as int) AS CAST(2147483648 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-9223372036854775808' AS long)
+-- !query analysis
+Project [cast(-9223372036854775808 as bigint) AS CAST(-9223372036854775808 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-9223372036854775809' AS long)
+-- !query analysis
+Project [cast(-9223372036854775809 as bigint) AS CAST(-9223372036854775809 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('9223372036854775807' AS long)
+-- !query analysis
+Project [cast(9223372036854775807 as bigint) AS CAST(9223372036854775807 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('9223372036854775808' AS long)
+-- !query analysis
+Project [cast(9223372036854775808 as bigint) AS CAST(9223372036854775808 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT HEX(CAST('abc' AS binary))
+-- !query analysis
+Project [hex(cast(abc as binary)) AS hex(CAST(abc AS BINARY))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT HEX(CAST(CAST(123 AS byte) AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+    "config" : "\"spark.sql.ansi.enabled\"",
+    "configVal" : "'false'",
+    "sqlExpr" : "\"CAST(CAST(123 AS TINYINT) AS BINARY)\"",
+    "srcType" : "\"TINYINT\"",
+    "targetType" : "\"BINARY\""
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 12,
+    "stopIndex" : 44,
+    "fragment" : "CAST(CAST(123 AS byte) AS binary)"
+  } ]
+}
+
+
+-- !query
+SELECT HEX(CAST(CAST(-123 AS byte) AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+    "config" : "\"spark.sql.ansi.enabled\"",
+    "configVal" : "'false'",
+    "sqlExpr" : "\"CAST(CAST(-123 AS TINYINT) AS BINARY)\"",
+    "srcType" : "\"TINYINT\"",
+    "targetType" : "\"BINARY\""
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 12,
+    "stopIndex" : 45,
+    "fragment" : "CAST(CAST(-123 AS byte) AS binary)"
+  } ]
+}
+
+
+-- !query
+SELECT HEX(CAST(123S AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+    "config" : "\"spark.sql.ansi.enabled\"",
+    "configVal" : "'false'",
+    "sqlExpr" : "\"CAST(123 AS BINARY)\"",
+    "srcType" : "\"SMALLINT\"",
+    "targetType" : "\"BINARY\""
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 12,
+    "stopIndex" : 31,
+    "fragment" : "CAST(123S AS binary)"
+  } ]
+}
+
+
+-- !query
+SELECT HEX(CAST(-123S AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+    "config" : "\"spark.sql.ansi.enabled\"",
+    "configVal" : "'false'",
+    "sqlExpr" : "\"CAST(-123 AS BINARY)\"",
+    "srcType" : "\"SMALLINT\"",
+    "targetType" : "\"BINARY\""
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 12,
+    "stopIndex" : 32,
+    "fragment" : "CAST(-123S AS binary)"
+  } ]
+}
+
+
+-- !query
+SELECT HEX(CAST(123 AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+    "config" : "\"spark.sql.ansi.enabled\"",
+    "configVal" : "'false'",
+    "sqlExpr" : "\"CAST(123 AS BINARY)\"",
+    "srcType" : "\"INT\"",
+    "targetType" : "\"BINARY\""
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 12,
+    "stopIndex" : 30,
+    "fragment" : "CAST(123 AS binary)"
+  } ]
+}
+
+
+-- !query
+SELECT HEX(CAST(-123 AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+    "config" : "\"spark.sql.ansi.enabled\"",
+    "configVal" : "'false'",
+    "sqlExpr" : "\"CAST(-123 AS BINARY)\"",
+    "srcType" : "\"INT\"",
+    "targetType" : "\"BINARY\""
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 12,
+    "stopIndex" : 31,
+    "fragment" : "CAST(-123 AS binary)"
+  } ]
+}
+
+
+-- !query
+SELECT HEX(CAST(123L AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+    "config" : "\"spark.sql.ansi.enabled\"",
+    "configVal" : "'false'",
+    "sqlExpr" : "\"CAST(123 AS BINARY)\"",
+    "srcType" : "\"BIGINT\"",
+    "targetType" : "\"BINARY\""
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 12,
+    "stopIndex" : 31,
+    "fragment" : "CAST(123L AS binary)"
+  } ]
+}
+
+
+-- !query
+SELECT HEX(CAST(-123L AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+    "config" : "\"spark.sql.ansi.enabled\"",
+    "configVal" : "'false'",
+    "sqlExpr" : "\"CAST(-123 AS BINARY)\"",
+    "srcType" : "\"BIGINT\"",
+    "targetType" : "\"BINARY\""
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 12,
+    "stopIndex" : 32,
+    "fragment" : "CAST(-123L AS binary)"
+  } ]
+}
+
+
+-- !query
+DESC FUNCTION boolean
+-- !query analysis
+DescribeFunctionCommand org.apache.spark.sql.catalyst.expressions.ExpressionInfo@55f3b40a, false
+
+
+-- !query
+DESC FUNCTION EXTENDED boolean
+-- !query analysis
+DescribeFunctionCommand org.apache.spark.sql.catalyst.expressions.ExpressionInfo@55f3b40a, true

Review Comment:
   Some failed cases are related to `hashCode` and `uuid` values; we should normalize these random identifiers out of the generated output.
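
   A minimal sketch of such normalization (the helper name and regex are illustrative assumptions, not the actual fix):
   
   ```scala
   // Replace JVM object hash suffixes such as "ExpressionInfo@55f3b40a" with a
   // stable token so that regenerated golden files stay deterministic.
   private def normalizeObjectHashes(plan: String): String =
     plan.replaceAll("@[0-9a-f]{6,}", "@<hash>")
   ```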
   
   





[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1483703189

   Looks like @LuciferYang fixed it with https://github.com/apache/spark/pull/40552. Thanks so much for the fix!




[GitHub] [spark] amaliujia commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1478658342

   LGTM




[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144040124


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala:
##########

Review Comment:
   Oh yeah it's only used in the SQL golden test runner. It's in `SQLQueryTestHelper.scala` :)





[GitHub] [spark] LuciferYang commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1478841111

   @dtenedor Wait a few minutes for me to check with Scala 2.13 manually




[GitHub] [spark] LuciferYang commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1478858139

   > @dtenedor Wait a few minutes for me to check with Scala 2.13 manually
   
   done, should be ok ~




[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1483263177

   @HyukjinKwon I ran the test locally and it passes. Maybe it is fixed at head now?




[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1483097427

   Sure, I can take a look.
   




[GitHub] [spark] HyukjinKwon commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1482557118

   I think the ANSI test fails after this PR:
   
   ```
   [info] - timestampNTZ/datetime-special.sql_analyzer_test *** FAILED *** (31 milliseconds)
   [info]   timestampNTZ/datetime-special.sql_analyzer_test
   [info]   Expected "...date(999999, 3, 18, [false) AS make_date(999999, 3, 18)#x, make_date(-1, 1, 28, fals]e) AS make_date(-1, ...", but got "...date(999999, 3, 18, [true) AS make_date(999999, 3, 18)#x, make_date(-1, 1, 28, tru]e) AS make_date(-1, ..." Result did not match for query #1
   [info]   select make_date(999999, 3, 18), make_date(-1, 1, 28) (SQLQueryTestSuite.scala:777)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   [info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
   [info]   at org.scalatest.Assertions.assertResult(Assertions.scala:847)
   [info]   at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
   [info]   at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)
   [info]   at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$readGoldenFileAndCompareResults$3(SQLQueryTestSuite.scala:777)
   [info]   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   ```
   
   https://github.com/apache/spark/actions/runs/4496107425/jobs/7910457741
   
   @dtenedor mind taking a look please? cc @gengliangwang 




[GitHub] [spark] HyukjinKwon commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1476856830

   LGTM if tests pass




[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1142891754


##########
sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out:
##########

Review Comment:
   @LuciferYang you're right. Let me try to normalize out those values in the output, or otherwise just exclude them from the analyzer testing. 





[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1478835534

   @HyukjinKwon the tests are passing now, so this is ready to merge whenever you are ready :)




[GitHub] [spark] amaliujia commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144038811


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala:
##########

Review Comment:
   I was wondering whether this code is used outside of testing; I tried a code search and it seems to be used only by test suites.
   
   Basically, I was not sure whether this part is also used in production, in which case the `if`/`else` would actually change the output.
   
   At least the code search shows this is not a concern.





[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144021973


##########
sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out:
##########

Review Comment:
   This is done.





[GitHub] [spark] amaliujia commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144020709


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala:
##########

Review Comment:
   Should this be configurable? Otherwise, every caller of this function will start to use the nondeterministic code path.


