Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/28 18:49:26 UTC

[GitHub] [spark] karenfeng commented on a change in pull request #34093: [SPARK-36294][SQL] Refactor fifth set of 20 query execution errors to use error classes

karenfeng commented on a change in pull request #34093:
URL: https://github.com/apache/spark/pull/34093#discussion_r717205325



##########
File path: R/pkg/tests/fulltests/test_sparkSQL.R
##########
@@ -3873,7 +3873,8 @@ test_that("Call DataFrameWriter.save() API in Java without path and check argume
   # It makes sure that we can omit path argument in write.df API and then it calls
   # DataFrameWriter.save() without path.
   expect_error(write.df(df, source = "csv"),
-              "Error in save : illegal argument - Expected exactly one path to be specified")
+              paste("Error in save : org.apache.spark.SparkIllegalArgumentException:",

Review comment:
       Hm... It may be more correct to keep the original behavior. Can you go here https://github.com/apache/spark/blob/e024bdc30620867943b4b926f703f6a5634f9322/R/pkg/R/utils.R#L836 and add another case for `SparkIllegalArgumentException`?

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
     "message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
     "sqlState" : "22005"
   },
+  "CANNOT_CLEAR_SOME_DIRECTORY" : {

Review comment:
       We can probably simplify this: CANNOT_CLEAR_SOME_DIRECTORY -> CANNOT_CLEAR_DIRECTORY

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -153,9 +215,17 @@
     "message" : [ "Unsupported literal type %s %s" ],
     "sqlState" : "0A000"
   },
+  "UNSUPPORTED_SAVE_MODE" : {
+    "message" : [ "unsupported save mode %s" ],
+    "sqlState" : "0A000"
+  },
   "UNSUPPORTED_SIMPLE_STRING_WITH_NODE_ID" : {
     "message" : [ "%s does not implement simpleStringWithNodeId" ]
   },
+  "UNSUPPORTED_STREAMED_OPERATOR_BY_DATASOURCE" : {

Review comment:
       DATASOURCE -> DATA_SOURCE

##########
File path: core/src/main/scala/org/apache/spark/SparkException.scala
##########
@@ -72,9 +72,11 @@ private[spark] case class ExecutorDeadException(message: String)
 /**
  * Exception thrown when Spark returns different result after upgrading to a new version.
  */
-private[spark] class SparkUpgradeException(version: String, message: String, cause: Throwable)
-  extends RuntimeException("You may get a different result due to the upgrading of Spark" +
-    s" $version: $message", cause)
+private[spark] class SparkUpgradeException(

Review comment:
       I'm not sure if it's safe to modify the existing constructor; can we overload it instead?
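    For example, a rough sketch of how an overload could keep the original `(version, message, cause)` signature intact (this assumes the `SparkThrowable`/`SparkThrowableHelper` plumbing used by the other exceptions in this PR; treat the exact signatures as assumptions, not the final shape):
    ```scala
    private[spark] class SparkUpgradeException private (
        message: String,
        cause: Throwable,
        errorClass: Option[String])
      extends RuntimeException(message, cause) with SparkThrowable {

      // Original constructor signature, preserved so existing callers keep compiling.
      def this(version: String, message: String, cause: Throwable) =
        this("You may get a different result due to the upgrading of Spark" +
          s" $version: $message", cause, None)

      // New error-class-based constructor (assumed helper: SparkThrowableHelper.getMessage).
      def this(errorClass: String, messageParameters: Array[String], cause: Throwable) =
        this(SparkThrowableHelper.getMessage(errorClass, messageParameters),
          cause, Some(errorClass))

      override def getErrorClass: String = errorClass.orNull
    }
    ```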
   
   

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
     "message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
     "sqlState" : "22005"
   },
+  "CANNOT_CLEAR_SOME_DIRECTORY" : {
+    "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+  },
+  "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+    "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option to drop a non-empty namespace." ]
+  },
   "CANNOT_EVALUATE_EXPRESSION" : {
     "message" : [ "Cannot evaluate expression: %s" ]
   },
+  "CANNOT_FIND_CLASS_IN_SPARK2" : {
+    "message" : [ "%s was removed in Spark 2.0. Please check if your library is compatible with Spark 2.0" ]
+  },
   "CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
     "message" : [ "Cannot generate code for expression: %s" ]
   },
   "CANNOT_PARSE_DECIMAL" : {
     "message" : [ "Cannot parse decimal" ],
     "sqlState" : "42000"
   },
+  "CANNOT_READ_CURRENT_FILE" : {
+    "message" : [ "%s \n It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." ]
+  },
   "CANNOT_TERMINATE_GENERATOR" : {
     "message" : [ "Cannot terminate expression: %s" ]
   },
+  "CANNOT_UPGRADE_IN_READING_DATES" : {

Review comment:
       This is a little confusing - the issue is not that they cannot upgrade. Maybe `READING_AMBIGUOUS_DATES`?

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
     "message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
     "sqlState" : "22005"
   },
+  "CANNOT_CLEAR_SOME_DIRECTORY" : {
+    "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+  },
+  "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+    "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option to drop a non-empty namespace." ]
+  },
   "CANNOT_EVALUATE_EXPRESSION" : {
     "message" : [ "Cannot evaluate expression: %s" ]
   },
+  "CANNOT_FIND_CLASS_IN_SPARK2" : {
+    "message" : [ "%s was removed in Spark 2.0. Please check if your library is compatible with Spark 2.0" ]
+  },
   "CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
     "message" : [ "Cannot generate code for expression: %s" ]
   },
   "CANNOT_PARSE_DECIMAL" : {
     "message" : [ "Cannot parse decimal" ],
     "sqlState" : "42000"
   },
+  "CANNOT_READ_CURRENT_FILE" : {
+    "message" : [ "%s \n It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." ]
+  },
   "CANNOT_TERMINATE_GENERATOR" : {
     "message" : [ "Cannot terminate expression: %s" ]
   },
+  "CANNOT_UPGRADE_IN_READING_DATES" : {
+    "message" : [ "You may get a different result due to the upgrading of Spark %s reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set the SQL config '%s' or the datasource option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during reading. To read the datetime values as it is, set the SQL config '%s' or the datasource option '%s' to 'CORRECTED'." ]
+  },
+  "CANNOT_UPGRADE_IN_WRITING_DATES" : {
+    "message" : [ "You may get a different result due to the upgrading of Spark %s writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into %s files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set %s to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set %s to 'CORRECTED' to write the datetime values as it is, if you are 100%% sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar." ]

Review comment:
       Missing colon here as well

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
     "message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
     "sqlState" : "22005"
   },
+  "CANNOT_CLEAR_SOME_DIRECTORY" : {
+    "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+  },
+  "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+    "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option to drop a non-empty namespace." ]
+  },
   "CANNOT_EVALUATE_EXPRESSION" : {
     "message" : [ "Cannot evaluate expression: %s" ]
   },
+  "CANNOT_FIND_CLASS_IN_SPARK2" : {
+    "message" : [ "%s was removed in Spark 2.0. Please check if your library is compatible with Spark 2.0" ]
+  },
   "CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
     "message" : [ "Cannot generate code for expression: %s" ]
   },
   "CANNOT_PARSE_DECIMAL" : {
     "message" : [ "Cannot parse decimal" ],
     "sqlState" : "42000"
   },
+  "CANNOT_READ_CURRENT_FILE" : {
+    "message" : [ "%s \n It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." ]
+  },
   "CANNOT_TERMINATE_GENERATOR" : {
     "message" : [ "Cannot terminate expression: %s" ]
   },
+  "CANNOT_UPGRADE_IN_READING_DATES" : {
+    "message" : [ "You may get a different result due to the upgrading of Spark %s reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set the SQL config '%s' or the datasource option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during reading. To read the datetime values as it is, set the SQL config '%s' or the datasource option '%s' to 'CORRECTED'." ]
+  },
+  "CANNOT_UPGRADE_IN_WRITING_DATES" : {

Review comment:
       WRITING_AMBIGUOUS_DATES may be a better descriptor

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
     "message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
     "sqlState" : "22005"
   },
+  "CANNOT_CLEAR_SOME_DIRECTORY" : {
+    "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+  },
+  "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+    "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option to drop a non-empty namespace." ]
+  },
   "CANNOT_EVALUATE_EXPRESSION" : {
     "message" : [ "Cannot evaluate expression: %s" ]
   },
+  "CANNOT_FIND_CLASS_IN_SPARK2" : {
+    "message" : [ "%s was removed in Spark 2.0. Please check if your library is compatible with Spark 2.0" ]
+  },
   "CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
     "message" : [ "Cannot generate code for expression: %s" ]
   },
   "CANNOT_PARSE_DECIMAL" : {
     "message" : [ "Cannot parse decimal" ],
     "sqlState" : "42000"
   },
+  "CANNOT_READ_CURRENT_FILE" : {
+    "message" : [ "%s \n It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." ]

Review comment:
       Rather than use a literal `\n` in the message, you can split up the message into separate elements in the array and they'll be joined with newlines later:
   ```
       "message" : [ "%s", "It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." ]
   ```

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
     "message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
     "sqlState" : "22005"
   },
+  "CANNOT_CLEAR_SOME_DIRECTORY" : {
+    "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+  },
+  "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+    "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option to drop a non-empty namespace." ]
+  },
   "CANNOT_EVALUATE_EXPRESSION" : {
     "message" : [ "Cannot evaluate expression: %s" ]
   },
+  "CANNOT_FIND_CLASS_IN_SPARK2" : {
+    "message" : [ "%s was removed in Spark 2.0. Please check if your library is compatible with Spark 2.0" ]
+  },
   "CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
     "message" : [ "Cannot generate code for expression: %s" ]
   },
   "CANNOT_PARSE_DECIMAL" : {
     "message" : [ "Cannot parse decimal" ],
     "sqlState" : "42000"
   },
+  "CANNOT_READ_CURRENT_FILE" : {
+    "message" : [ "%s \n It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." ]
+  },
   "CANNOT_TERMINATE_GENERATOR" : {
     "message" : [ "Cannot terminate expression: %s" ]
   },
+  "CANNOT_UPGRADE_IN_READING_DATES" : {
+    "message" : [ "You may get a different result due to the upgrading of Spark %s reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set the SQL config '%s' or the datasource option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during reading. To read the datetime values as it is, set the SQL config '%s' or the datasource option '%s' to 'CORRECTED'." ]

Review comment:
       There was a colon here before: `Spark %s: reading dates`

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
     "message" : [ "Found duplicate keys '%s'" ],
     "sqlState" : "23000"
   },
+  "END_OF_STREAM" : {
+    "message" : [ "End of stream" ]
+  },
+  "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+    "message" : [ "Failed to cast value `%s` to `%s` for partition column `%s`" ],
+    "sqlState" : "22023"
+  },
   "FAILED_EXECUTE_UDF" : {
     "message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
   },
+  "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+    "message" : [ "The fallback v1 relation reports inconsistent schema:\n Schema of v2 scan:     %s\nSchema of v1 relation: %s" ]

Review comment:
      Split the strings into separate array elements here as well, rather than embedding `\n` literals; the elements will be joined with newlines when the message is built, as shown below:
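    ```
    "message" : [ "The fallback v1 relation reports inconsistent schema:", "Schema of v2 scan:     %s", "Schema of v1 relation: %s" ]
    ```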

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
     "message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
     "sqlState" : "22005"
   },
+  "CANNOT_CLEAR_SOME_DIRECTORY" : {
+    "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+  },
+  "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+    "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option to drop a non-empty namespace." ]
+  },
   "CANNOT_EVALUATE_EXPRESSION" : {
     "message" : [ "Cannot evaluate expression: %s" ]
   },
+  "CANNOT_FIND_CLASS_IN_SPARK2" : {
+    "message" : [ "%s was removed in Spark 2.0. Please check if your library is compatible with Spark 2.0" ]

Review comment:
       I would consider this to be an unsupported operation with SQLSTATE 0A000

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
     "message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
     "sqlState" : "22005"
   },
+  "CANNOT_CLEAR_SOME_DIRECTORY" : {
+    "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+  },
+  "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+    "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option to drop a non-empty namespace." ]

Review comment:
       I would consider this a syntax error; can you add SQLSTATE 42000?
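    For example:
    ```
    "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
      "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option to drop a non-empty namespace." ],
      "sqlState" : "42000"
    },
    ```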

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
     "message" : [ "Found duplicate keys '%s'" ],
     "sqlState" : "23000"
   },
+  "END_OF_STREAM" : {
+    "message" : [ "End of stream" ]
+  },
+  "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+    "message" : [ "Failed to cast value `%s` to `%s` for partition column `%s`" ],
+    "sqlState" : "22023"
+  },
   "FAILED_EXECUTE_UDF" : {
     "message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
   },
+  "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+    "message" : [ "The fallback v1 relation reports inconsistent schema:\n Schema of v2 scan:     %s\nSchema of v1 relation: %s" ]
+  },
+  "FAILED_FIND_DATASOURCE" : {

Review comment:
       DATASOURCE -> DATA_SOURCE

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -110,6 +153,10 @@
     "message" : [ "Unknown static partition column: %s" ],
     "sqlState" : "42000"
   },
+  "MISSING_STREAMING_SOURCE_SCHEMA" : {
+    "message" : [ "Schema must be specified when creating a streaming source DataFrame. If some files already exist in the directory, then depending on the file format you may be able to create a static DataFrame on that directory with 'spark.read.load(directory)' and infer schema from it." ],
+    "sqlState" : "3F000"

Review comment:
       Is this the same schema as intended in the SQLSTATE? It seems like that "schema" is referring to schema objects. Maybe this should be 22023?

##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
     "message" : [ "Found duplicate keys '%s'" ],
     "sqlState" : "23000"
   },
+  "END_OF_STREAM" : {
+    "message" : [ "End of stream" ]
+  },
+  "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+    "message" : [ "Failed to cast value `%s` to `%s` for partition column `%s`" ],
+    "sqlState" : "22023"
+  },
   "FAILED_EXECUTE_UDF" : {
     "message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
   },
+  "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+    "message" : [ "The fallback v1 relation reports inconsistent schema:\n Schema of v2 scan:     %s\nSchema of v1 relation: %s" ]
+  },
+  "FAILED_FIND_DATASOURCE" : {
+    "message" : [ "Failed to find data source: %s. Please find packages at http://spark.apache.org/third-party-projects.html" ]

Review comment:
       Would this have a SQLSTATE like 22023?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


