You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "panbingkun (via GitHub)" <gi...@apache.org> on 2023/12/27 03:05:32 UTC

[PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

panbingkun opened a new pull request, #44503:
URL: https://github.com/apache/spark/pull/44503

   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1436943860


##########
sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala:
##########
@@ -27,7 +27,7 @@ import org.apache.spark.unsafe.types.UTF8String
 /**
  * Object for grouping error messages from (most) exceptions thrown during query execution.
  * This does not include exceptions thrown during the eager execution of commands, which are
- * grouped into [[QueryCompilationErrors]].

Review Comment:
   From the perspective of module dependency, it seems to express the meaning of `CompilationErrors` in the `sql/api` module, instead of `QueryCompilationErrors` in the `sql/catalyst` module.
   
   Or we can write it as: `[[CompilationErrors]] and [[QueryCompilationErrors]]` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1436719503


##########
sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala:
##########
@@ -27,7 +27,7 @@ import org.apache.spark.unsafe.types.UTF8String
 /**
  * Object for grouping error messages from (most) exceptions thrown during query execution.
  * This does not include exceptions thrown during the eager execution of commands, which are
- * grouped into [[QueryCompilationErrors]].

Review Comment:
   In the `sql/api` module, `QueryCompilationErrors` has been renamed to CompilationErrors`, so we will fix it here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1436948523


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala:
##########
@@ -569,8 +570,10 @@ abstract class ExternalCatalogSuite extends SparkFunSuite {
         // then be caught and converted to a RuntimeException with a descriptive message.
         case ex: RuntimeException if ex.getMessage.contains("MetaException") =>
           throw new AnalysisException(
-            errorClass = "_LEGACY_ERROR_TEMP_3066",
-            messageParameters = Map("msg" -> ex.getMessage))
+            errorClass = "_LEGACY_ERROR_TEMP_2193",

Review Comment:
   Refer to this:
   https://github.com/apache/spark/blob/b106f80521d609b280950f63ff670c1f39ec3cee/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L383-L384
   
   https://github.com/apache/spark/blob/b106f80521d609b280950f63ff670c1f39ec3cee/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L1644-L1651



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #44503:
URL: https://github.com/apache/spark/pull/44503#issuecomment-1869996244

   @MaxGekk 
   Additionally, I have found two error classes that only appear in UT. In my opinion, both of these error classes should be deleted. Is this appropriate?
   
   1._LEGACY_ERROR_TEMP_3066
   https://github.com/apache/spark/blob/5a5d3c70401f57d642542909f57ddb123afa56cf/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala#L572
   
   2._LEGACY_ERROR_TEMP_3078
   https://github.com/apache/spark/blob/5a5d3c70401f57d642542909f57ddb123afa56cf/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala#L163
   
   https://github.com/apache/spark/blob/5a5d3c70401f57d642542909f57ddb123afa56cf/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala#L2349


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1436945141


##########
sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala:
##########
@@ -27,7 +27,7 @@ import org.apache.spark.unsafe.types.UTF8String
 /**
  * Object for grouping error messages from (most) exceptions thrown during query execution.
  * This does not include exceptions thrown during the eager execution of commands, which are
- * grouped into [[QueryCompilationErrors]].

Review Comment:
   The most primitive PR is here: https://github.com/apache/spark/pull/41794/files#diff-1b8affb347590e3ccabe359b07215c72aead8108ca3a5a020b3cb71e639c117bR25



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1436946366


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala:
##########
@@ -2344,9 +2344,7 @@ class ParquetV2FilterSuite extends ParquetFilterSuite {
 
           checker(stripSparkFilter(query), expected)
 
-        case _ =>
-          throw new AnalysisException(
-            errorClass = "_LEGACY_ERROR_TEMP_3078", messageParameters = Map.empty)
+        case _ => assert(false, "Can not match ParquetTable in the query.")

Review Comment:
   Refer to this: https://github.com/apache/spark/blob/b106f80521d609b280950f63ff670c1f39ec3cee/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala#L73



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1437008913


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1480,12 +1474,6 @@
     },
     "sqlState" : "42K0B"
   },
-  "INCORRECT_END_OFFSET" : {

Review Comment:
   For some reasons there is still a reference to the error class:
   ```
   $ find . -type f  -print0|xargs -0 grep INCORRECT_END_OFFSET
   ./docs/sql-error-conditions-sqlstates.md:  <td><a href="arithmetic-overflow-error-class.md">ARITHMETIC_OVERFLOW</a>, <a href="sql-error-conditions.html#cast_overflow">CAST_OVERFLOW</a>, <a href="sql-error-conditions.html#cast_overflow_in_table_insert">CAST_OVERFLOW_IN_TABLE_INSERT</a>, <a href="sql-error-conditions.html#decimal_precision_exceeds_max_precision">DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION</a>, <a href="sql-error-conditions.html#invalid_index_of_zero">INVALID_INDEX_OF_ZERO</a>, <a href="sql-error-conditions.html#incorrect_end_offset">INCORRECT_END_OFFSET</a>, <a href="sql-error-conditions.html#incorrect_ramp_up_rate">INCORRECT_RAMP_UP_RATE</a>, <a href="invalid-array-index-error-class.md">INVALID_ARRAY_INDEX</a>, <a href="invalid-array-index-in-element-at-error-class.md">INVALID_ARRAY_INDEX_IN_ELEMENT_AT</a>, <a href="sql-error-conditions.html#numeric_out_of_supported_range">NUMERIC_OUT_OF_SUPPORTED_RANGE</a>, <a href="sql-error-conditions.html#numeric_value_out_of_range
 ">NUMERIC_VALUE_OUT_OF_RANGE</a>
   ```



##########
sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala:
##########
@@ -27,7 +27,7 @@ import org.apache.spark.unsafe.types.UTF8String
 /**
  * Object for grouping error messages from (most) exceptions thrown during query execution.
  * This does not include exceptions thrown during the eager execution of commands, which are
- * grouped into [[QueryCompilationErrors]].

Review Comment:
   ok, let's leave the ref to `CompilationErrors` only.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #44503:
URL: https://github.com/apache/spark/pull/44503#issuecomment-1870925071

   +1, LGTM. Merging to master.
   Thank you, @panbingkun.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1436895851


##########
sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala:
##########
@@ -27,7 +27,7 @@ import org.apache.spark.unsafe.types.UTF8String
 /**
  * Object for grouping error messages from (most) exceptions thrown during query execution.
  * This does not include exceptions thrown during the eager execution of commands, which are
- * grouped into [[QueryCompilationErrors]].

Review Comment:
   We have both, AFAIK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1437160213


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1480,12 +1474,6 @@
     },
     "sqlState" : "42K0B"
   },
-  "INCORRECT_END_OFFSET" : {

Review Comment:
   Please, remove the reference to the deleted error class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #44503:
URL: https://github.com/apache/spark/pull/44503#issuecomment-1869915277

   cc @MaxGekk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #44503:
URL: https://github.com/apache/spark/pull/44503#issuecomment-1870000996

   > both of these error classes should be deleted
   
   If it is possible to delete them while preserving test logic, let's delete them. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1437315924


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1480,12 +1474,6 @@
     },
     "sqlState" : "42K0B"
   },
-  "INCORRECT_END_OFFSET" : {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1436949989


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala:
##########
@@ -160,7 +160,9 @@ class QueryExecutionSuite extends SharedSparkSession {
 
     // Throw an AnalysisException - this should be captured.
     spark.experimental.extraStrategies = Seq[SparkStrategy](
-      (_: LogicalPlan) => throw new AnalysisException("_LEGACY_ERROR_TEMP_3078", Map.empty))
+      (_: LogicalPlan) => throw new AnalysisException(

Review Comment:
   The UT can actually handle any type of `AnalysisException`. I have chosen an error class (`UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY`) that is closer to semantics to replace it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44503:
URL: https://github.com/apache/spark/pull/44503#discussion_r1437314993


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1480,12 +1474,6 @@
     },
     "sqlState" : "42K0B"
   },
-  "INCORRECT_END_OFFSET" : {

Review Comment:
   Okay, let me delete it first, and then I'll try to investigate the root cause again to see if we can automate its discovery.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #44503:
URL: https://github.com/apache/spark/pull/44503#issuecomment-1869914156

   How to find `unused error classes` in `Spark code repo`
   1.Generate the names of `all error classes based` on `error classes. json`.
   2.Use a shell script to locate each of the above `error classes` in the `Spark code repo` and identify any unused error classes.
   3.Manually check again.
   
   During the search process above, some unused error classes were also discovered, as follows:
   - INTERNAL_ERROR_BROADCAST
   - INTERNAL_ERROR_MEMORY
   - INTERNAL_ERROR_STORAGE
   - INTERNAL_ERROR_SHUFFLE
   - INTERNAL_ERROR_EXECUTOR
   
   These error classes did not appear in the `Spark code repo`, but they are temporarily retained as they are internal errors.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #44503: [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file
URL: https://github.com/apache/spark/pull/44503


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org