You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "amaliujia (via GitHub)" <gi...@apache.org> on 2023/03/21 21:45:30 UTC

[GitHub] [spark] amaliujia commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

amaliujia commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144020709


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala:
##########
@@ -59,10 +61,39 @@ trait SQLQueryTestHelper extends Logging {
    */
   protected def getNormalizedQueryAnalysisResult(
       session: SparkSession, sql: String): (String, Seq[String]) = {
+    // Note that creating the following DataFrame includes eager execution for commands that create
+    // objects such as views. Therefore any following queries that reference these objects should
+    // find them in the catalog.
     val df = session.sql(sql)
     val schema = df.schema.catalogString
-    // Get the answer, but also get rid of the #1234 expression IDs that show up in analyzer plans.
-    (schema, Seq(replaceNotIncludedMsg(df.queryExecution.analyzed.toString)))
+    val analyzed = df.queryExecution.analyzed
+    // Determine if the analyzed plan contains any nondeterministic expressions.
+    var deterministic = true
+    analyzed.transformAllExpressionsWithSubqueries {
+      case expr: CurrentDate =>
+        deterministic = false
+        expr
+      case expr: CurrentTimestampLike =>
+        deterministic = false
+        expr
+      case expr: CurrentUser =>
+        deterministic = false
+        expr
+      case expr: Literal if expr.dataType == DateType || expr.dataType == TimestampType =>
+        deterministic = false
+        expr
+      case expr if !expr.deterministic =>
+        deterministic = false
+        expr
+    }
+    if (deterministic) {
+      // Perform query analysis, but also get rid of the #1234 expression IDs that show up in the
+      // resolved plans.
+      (schema, Seq(replaceNotIncludedMsg(analyzed.toString)))
+    } else {
+      // The analyzed plan is nondeterministic so elide it from the result to keep tests reliable.
+      (schema, Seq("[Analyzer test output redacted due to nondeterminism]"))
+    }

Review Comment:
   Should this be configured? Otherwise every caller who calls this function will start to use non-deterministic codepath?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org