You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "nchammas (via GitHub)" <gi...@apache.org> on 2024/01/26 15:44:40 UTC

[PR] [SPARK-46810][DOCS] Explain error class terminology in internal README [spark]

nchammas opened a new pull request, #44902:
URL: https://github.com/apache/spark/pull/44902

   ### What changes were proposed in this pull request?
   
   - Clarify the error class terminology in our internal README per the proposal in SPARK-46810.
   - Improve the formatting of the code snippets in the README.
   
   ### Why are the changes needed?
   
   We should use error class terminology consistently and unambiguously. This change gives developers a quick reference for what's called what.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   No testing needed.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2056955872

   oh, sounds good to do it later. Let's rebase this PR and then merge it ASAP.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "srielau (via GitHub)" <gi...@apache.org>.
srielau commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2029839134

   > @MaxGekk - Is there something I can do to move this along?
   > 
   > Once this is merged in, I am happy to work on [SPARK-47429](https://issues.apache.org/jira/browse/SPARK-47429), as I noted in the comments there. (But it's also fine if you'd prefer to work on it yourself.)
   
   @nchammas Not sure if we can count on a response from @MaxGekk right now. @cloud-fan How can we proceed here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2054360998

   The JSON field still has `"subClass"`, do we need to rename it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2057586475

   @cloud-fan - Ready for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1568713831


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1,77 +1,132 @@
-# Guidelines
+# Guidelines for Throwing User-Facing Errors
 
-To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
-and message parameters rather than an arbitrary error message.
+To throw a user-facing error or exception, developers should specify a standardized SQLSTATE, an error condition, and message parameters rather than an arbitrary error message.
+
+This guide will describe how to do this.
+
+## Error Hierarchy and Terminology
+
+The error hierarchy is as follows:
+1. Error state / SQLSTATE
+2. Error condition
+3. Error sub-condition
+
+The error state / SQLSTATE itself is comprised of two parts:
+1. Error class
+2. Error sub-class
+
+Acceptable values for these various error parts are defined in the following files:
+* `error-classes.json`
+* `error-states.json`
+* `error-conditions.json`
+
+The terms error class, state, and condition come from the SQL standard.
+
+### Illustrative Example
+* Error state / SQLSTATE: `42K01` (Class: `42`; Sub-class: `K01`)
+  * Error condition: `DATATYPE_MISSING_SIZE`
+  * Error condition: `INCOMPLETE_TYPE_DEFINITION`
+    * Error sub-condition: `ARRAY`
+    * Error sub-condition: `MAP`
+    * Error sub-condition: `STRUCT`
+* Error state / SQLSTATE: `42604` (Class: `42`; Sub-class: `604`)
+  * Error condition: `INVALID_ESCAPE_CHAR`
+  * Error condition: `AS_OF_JOIN`
+    * Error sub-condition: `TOLERANCE_IS_NON_NEGATIVE`
+    * Error sub-condition: `TOLERANCE_IS_UNFOLDABLE`
+
+### Inconsistent Use of the Term "Error Class"
+
+Unfortunately, we have historically used the term "error class" inconsistently to refer both to a proper error class like `42` and also to an error condition like `DATATYPE_MISSING_SIZE`.
+
+Fixing this will require renaming `SparkException.errorClass` to `SparkException.errorCondition` and making similar changes to `ErrorClassesJsonReader` and other parts of the codebase. We will address this in [SPARK-47429]. Until that is complete, we will have to live with the fact that a string like `DATATYPE_MISSING_SIZE` is called an "error condition" in our user-facing documentation but an "error class" in the code.
+
+For more details, please see [SPARK-46810][SPARK-46810].

Review Comment:
   Why do we need to write `[SPARK-46810]` twice here? Is it because of the special syntax for `markdown`? Or do we actually want to write: `[SPARK-46810]` `[SPARK-47429]`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1568662387


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1,77 +1,132 @@
-# Guidelines
+# Guidelines for Throwing User-Facing Errors
 
-To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
-and message parameters rather than an arbitrary error message.
+To throw a user-facing error or exception, developers should specify a standardized SQLSTATE, an error condition, and message parameters rather than an arbitrary error message.
+
+This guide will describe how to do this.
+
+## Error Hierarchy and Terminology
+
+The error hierarchy is as follows:
+1. Error state / SQLSTATE
+2. Error condition
+3. Error sub-condition
+
+The error state / SQLSTATE itself is comprised of two parts:
+1. Error class
+2. Error sub-class
+
+Acceptable values for these various error parts are defined in the following files:
+* `error-classes.json`
+* `error-states.json`
+* `error-conditions.json`
+
+The terms error class, state, and condition come from the SQL standard.
+
+### Illustrative Example
+* Error state / SQLSTATE: `42K01` (Class: `42`; Sub-class: `K01`)
+  * Error condition: `DATATYPE_MISSING_SIZE`
+  * Error condition: `INCOMPLETE_TYPE_DEFINITION`
+    * Error sub-condition: `ARRAY`
+    * Error sub-condition: `MAP`
+    * Error sub-condition: `STRUCT`
+* Error state / SQLSTATE: `42604` (Class: `42`; Sub-class: `604`)
+  * Error condition: `INVALID_ESCAPE_CHAR`
+  * Error condition: `AS_OF_JOIN`
+    * Error sub-condition: `TOLERANCE_IS_NON_NEGATIVE`
+    * Error sub-condition: `TOLERANCE_IS_UNFOLDABLE`
+
+### Inconsistent Use of the Term "Error Class"
+
+Unfortunately, we have historically used the term "error class" inconsistently to refer both to a proper error class like `42` and also to an error condition like `DATATYPE_MISSING_SIZE`.
+
+Fixing this will require renaming `SparkException.errorClass` to `SparkException.errorCondition` and making similar changes to `ErrorClassesJsonReader` and other parts of the codebase. We will address this in [SPARK-47429]. Until that is complete, we will have to live with the fact that a string like `DATATYPE_MISSING_SIZE` is called an "error condition" in our user-facing documentation but an "error class" in the code.
+
+For more details, please see [SPARK-46810][SPARK-46810].

Review Comment:
   @nchammas 
   Why do we need to write `[SPARK-46810]` twice? Is this a special syntax for MD? Or should it be `[SPARK-47429]`?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1568665548


##########
core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala:
##########
@@ -125,23 +128,26 @@ class SparkThrowableSuite extends SparkFunSuite {
       s"Error classes without SQLSTATE: ${errorClassesNoSqlState.mkString(", ")}")
   }
 
-  test("Error category and error state / SQLSTATE invariants") {
-    val errorCategoriesJson = Utils.getSparkClassLoader.getResource("error/error-categories.json")
+  test("Error class and error state / SQLSTATE invariants") {
+    // Unlike in the rest of the codebase, the term "error class" is used here as it is in our
+    // documentation as well as in the SQL standard. We can remove this comment as part of this
+    // ticket: https://issues.apache.org/jira/browse/SPARK-47429
+    val errorClassesJson = Utils.getSparkClassLoader.getResource("error/error-classes.json")
     val errorStatesJson = Utils.getSparkClassLoader.getResource("error/error-states.json")
     val mapper = JsonMapper.builder()
       .addModule(DefaultScalaModule)
       .enable(STRICT_DUPLICATE_DETECTION)
       .build()
-    val errorCategories = mapper.readValue(
-      errorCategoriesJson, new TypeReference[Map[String, String]]() {})
+    val errorClasses = mapper.readValue(
+      errorClassesJson, new TypeReference[Map[String, String]]() {})
     val errorStates = mapper.readValue(
       errorStatesJson, new TypeReference[Map[String, ErrorStateInfo]]() {})
-    val errorClassStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState).toSet
+    val errorConditionStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState).toSet

Review Comment:
   Perhaps calling `errorStates` would better align with the the `file` name ?
   `error/error-classes.json` -> `errorClasses`
   `error/error-states.json` -> `errorStates`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-1996583071

   > Or would you like to break that into its own PR?
   
   Let's do that in a separate PR because I expect massive changes. It should be easier to review only one kind of changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1568662387


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1,77 +1,132 @@
-# Guidelines
+# Guidelines for Throwing User-Facing Errors
 
-To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
-and message parameters rather than an arbitrary error message.
+To throw a user-facing error or exception, developers should specify a standardized SQLSTATE, an error condition, and message parameters rather than an arbitrary error message.
+
+This guide will describe how to do this.
+
+## Error Hierarchy and Terminology
+
+The error hierarchy is as follows:
+1. Error state / SQLSTATE
+2. Error condition
+3. Error sub-condition
+
+The error state / SQLSTATE itself is comprised of two parts:
+1. Error class
+2. Error sub-class
+
+Acceptable values for these various error parts are defined in the following files:
+* `error-classes.json`
+* `error-states.json`
+* `error-conditions.json`
+
+The terms error class, state, and condition come from the SQL standard.
+
+### Illustrative Example
+* Error state / SQLSTATE: `42K01` (Class: `42`; Sub-class: `K01`)
+  * Error condition: `DATATYPE_MISSING_SIZE`
+  * Error condition: `INCOMPLETE_TYPE_DEFINITION`
+    * Error sub-condition: `ARRAY`
+    * Error sub-condition: `MAP`
+    * Error sub-condition: `STRUCT`
+* Error state / SQLSTATE: `42604` (Class: `42`; Sub-class: `604`)
+  * Error condition: `INVALID_ESCAPE_CHAR`
+  * Error condition: `AS_OF_JOIN`
+    * Error sub-condition: `TOLERANCE_IS_NON_NEGATIVE`
+    * Error sub-condition: `TOLERANCE_IS_UNFOLDABLE`
+
+### Inconsistent Use of the Term "Error Class"
+
+Unfortunately, we have historically used the term "error class" inconsistently to refer both to a proper error class like `42` and also to an error condition like `DATATYPE_MISSING_SIZE`.
+
+Fixing this will require renaming `SparkException.errorClass` to `SparkException.errorCondition` and making similar changes to `ErrorClassesJsonReader` and other parts of the codebase. We will address this in [SPARK-47429]. Until that is complete, we will have to live with the fact that a string like `DATATYPE_MISSING_SIZE` is called an "error condition" in our user-facing documentation but an "error class" in the code.
+
+For more details, please see [SPARK-46810][SPARK-46810].

Review Comment:
   @nchammas 
   Why do we need to write `[SPARK-46810]` twice? Is this a special syntax for MD? Or should it be `[SPARK-47429]`?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Explain error class terminology in internal README [spark]

Posted by "srielau (via GitHub)" <gi...@apache.org>.
srielau commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1468258626


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -96,21 +132,21 @@ The quality of the error message should match the
 ### SQLSTATE
 
 SQLSTATE is an mandatory portable error identifier across SQL engines.
-SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
+SQLSTATE comprises a 2-character category followed by a 3-character sub-category.

Review Comment:
   class and subclass are the terms used in the standard. But I'm not married to them....



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-1996210393

   @MaxGekk - Given the recent discussion on SPARK-46810, shall I expand this PR to include renaming `errorClass` across the codebase? Or would you like to break that into its own PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2028802766

   @MaxGekk - Is there something I can do to move this along?
   
   Once this is merged in, I am happy to work on SPARK-47429, as I noted in the comments there. (But it's also fine if you'd prefer to work on it yourself.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Explain error class terminology in internal README [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1468273439


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -96,21 +132,21 @@ The quality of the error message should match the
 ### SQLSTATE
 
 SQLSTATE is an mandatory portable error identifier across SQL engines.
-SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
+SQLSTATE comprises a 2-character category followed by a 3-character sub-category.

Review Comment:
   Oh. So the SQL standard calls `42` an "error class", and also calls `INCOMPLETE_TYPE_DEFINITION` an "error class"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Explain error class terminology in internal README [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1467816359


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -96,21 +132,21 @@ The quality of the error message should match the
 ### SQLSTATE
 
 SQLSTATE is an mandatory portable error identifier across SQL engines.
-SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
+SQLSTATE comprises a 2-character category followed by a 3-character sub-category.

Review Comment:
   Note the changes in this section per the proposal in [SPARK-46810](https://issues.apache.org/jira/browse/SPARK-46810).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2002204704

   I've created SPARK-47429 to track renaming `errorClass` to `errorCondition`.
   
   The build is passing, and this is ready for review, @MaxGekk.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Explain error class terminology in internal README [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-1912277740

   cc @MaxGekk and @srielau 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2005111322

   Since `error-classes.json` is frequently updated, it will repeatedly cause merge conflicts here. I will refrain from resolving those merge conflicts until this PR is actively under review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1568665548


##########
core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala:
##########
@@ -125,23 +128,26 @@ class SparkThrowableSuite extends SparkFunSuite {
       s"Error classes without SQLSTATE: ${errorClassesNoSqlState.mkString(", ")}")
   }
 
-  test("Error category and error state / SQLSTATE invariants") {
-    val errorCategoriesJson = Utils.getSparkClassLoader.getResource("error/error-categories.json")
+  test("Error class and error state / SQLSTATE invariants") {
+    // Unlike in the rest of the codebase, the term "error class" is used here as it is in our
+    // documentation as well as in the SQL standard. We can remove this comment as part of this
+    // ticket: https://issues.apache.org/jira/browse/SPARK-47429
+    val errorClassesJson = Utils.getSparkClassLoader.getResource("error/error-classes.json")
     val errorStatesJson = Utils.getSparkClassLoader.getResource("error/error-states.json")
     val mapper = JsonMapper.builder()
       .addModule(DefaultScalaModule)
       .enable(STRICT_DUPLICATE_DETECTION)
       .build()
-    val errorCategories = mapper.readValue(
-      errorCategoriesJson, new TypeReference[Map[String, String]]() {})
+    val errorClasses = mapper.readValue(
+      errorClassesJson, new TypeReference[Map[String, String]]() {})
     val errorStates = mapper.readValue(
       errorStatesJson, new TypeReference[Map[String, ErrorStateInfo]]() {})
-    val errorClassStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState).toSet
+    val errorConditionStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState).toSet

Review Comment:
   Perhaps calling `errorStates` would better align with the the `file` name ?
   `error/error-classes.json` -> `errorClasses`
   `error/error-states.json` -> `errorStates`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #44902: [SPARK-46810][DOCS] Align error class terminology with SQL standard
URL: https://github.com/apache/spark/pull/44902


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Explain error class terminology in internal README [spark]

Posted by "srielau (via GitHub)" <gi...@apache.org>.
srielau commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1468693152


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -96,21 +132,21 @@ The quality of the error message should match the
 ### SQLSTATE
 
 SQLSTATE is an mandatory portable error identifier across SQL engines.
-SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
+SQLSTATE comprises a 2-character category followed by a 3-character sub-category.

Review Comment:
   @nchammas The relevant section is ISO/IEC 9075-2:2016(E) 24.1 SQLSTATE
   The character string value returned in an SQLSTATE parameter comprises a 2-character class code followed by a 3-character subclass code, each with an implementation-defined character set that has a one-octet character encoding form and is restricted to <digit>s and <simple Latin upper case letter>s. Table 38, “SQLSTATE class and subclass codes”, specifies the class code for each condition and the subclass code or codes for each class code.
   Class codes that begin with one of the <digit>s '0', '1', '2', '3', or '4' or one of the <simple Latin upper case letter>s 'A', 'B', 'C', 'D', 'E', 'F', 'G', or 'H' are returned only for conditions defined in ISO/IEC 9075 or in any other International Standard. The range of such class codes is called standard-defined classes. Some such class codes are reserved for use by specific International Standards, as specified elsewhere in this Clause. Subclass codes associated with such classes that also begin with one of those 13 characters are returned only for conditions defined in ISO/IEC 9075 or some other International Standard. The range of such subclass codes is called standard-defined subclasses. Subclass codes associated with such classes that begin with one of the <digit>s '5', '6', '7', '8', or '9' or one of the <simple Latin upper case letter>s 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', or 'Z' are reserved for implementation-defined co
 nditions and are called implementation- defined subclasses.
   Class codes that begin with one of the <digit>s '5', '6', '7', '8', or '9' or one of the <simple Latin upper case letter>s 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', or 'Z' are reserved for implementation-defined exception conditions and are called implementation-defined classes. All subclass codes except '000', which means no subclass, associated with such classes are reserved for implementation-defined conditions and are called implementation-defined subclasses. An implementation-defined completion condition shall be indicated by returning an implementation-defined subclass in conjunction with one of the classes successful completion, warning, or no data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2056915764

   > The JSON field still has `"subClass"`, do we need to rename it?
   
   I was planning to change that as part of SPARK-47429, since that affects the internal API and will likely require a large diff. If you agree with that, I will update the description for SPARK-47429 accordingly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #44902:
URL: https://github.com/apache/spark/pull/44902#issuecomment-2058126794

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44902:
URL: https://github.com/apache/spark/pull/44902#discussion_r1568947251


##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1,77 +1,132 @@
-# Guidelines
+# Guidelines for Throwing User-Facing Errors
 
-To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
-and message parameters rather than an arbitrary error message.
+To throw a user-facing error or exception, developers should specify a standardized SQLSTATE, an error condition, and message parameters rather than an arbitrary error message.
+
+This guide will describe how to do this.
+
+## Error Hierarchy and Terminology
+
+The error hierarchy is as follows:
+1. Error state / SQLSTATE
+2. Error condition
+3. Error sub-condition
+
+The error state / SQLSTATE itself is comprised of two parts:
+1. Error class
+2. Error sub-class
+
+Acceptable values for these various error parts are defined in the following files:
+* `error-classes.json`
+* `error-states.json`
+* `error-conditions.json`
+
+The terms error class, state, and condition come from the SQL standard.
+
+### Illustrative Example
+* Error state / SQLSTATE: `42K01` (Class: `42`; Sub-class: `K01`)
+  * Error condition: `DATATYPE_MISSING_SIZE`
+  * Error condition: `INCOMPLETE_TYPE_DEFINITION`
+    * Error sub-condition: `ARRAY`
+    * Error sub-condition: `MAP`
+    * Error sub-condition: `STRUCT`
+* Error state / SQLSTATE: `42604` (Class: `42`; Sub-class: `604`)
+  * Error condition: `INVALID_ESCAPE_CHAR`
+  * Error condition: `AS_OF_JOIN`
+    * Error sub-condition: `TOLERANCE_IS_NON_NEGATIVE`
+    * Error sub-condition: `TOLERANCE_IS_UNFOLDABLE`
+
+### Inconsistent Use of the Term "Error Class"
+
+Unfortunately, we have historically used the term "error class" inconsistently to refer both to a proper error class like `42` and also to an error condition like `DATATYPE_MISSING_SIZE`.
+
+Fixing this will require renaming `SparkException.errorClass` to `SparkException.errorCondition` and making similar changes to `ErrorClassesJsonReader` and other parts of the codebase. We will address this in [SPARK-47429]. Until that is complete, we will have to live with the fact that a string like `DATATYPE_MISSING_SIZE` is called an "error condition" in our user-facing documentation but an "error class" in the code.
+
+For more details, please see [SPARK-46810][SPARK-46810].

Review Comment:
   Nope, I meant to link to SPARK-46810. The first `[SPARK-46810]` is the text I want to link, and the second `[SPARK-46810]` is the reference to the link URL I have defined just below in the same document:
   
   ```md
   [SPARK-46810]: https://issues.apache.org/jira/browse/SPARK-46810
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org