Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2024/03/11 08:12:24 UTC

[PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

itholic opened a new pull request, #45457:
URL: https://github.com/apache/spark/pull/45457

   
   
   ### What changes were proposed in this pull request?
   
   This PR proposes to introduce `_LEGACY_ERROR_UNKNOWN` as the default error class when an error class is not defined.
   
   ### Why are the changes needed?
   
   In Spark, when an `errorClass` is not explicitly defined for an exception, the method `getErrorClass` currently returns null.
   
   This behavior can lead to ambiguity and make debugging more challenging, since there is no clear indication that the error class was not set.
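   
   As a rough sketch of the idea (illustrative only, not the actual implementation; `ExampleThrowable` and its `errorClass` field are made up for this example), the fallback could look like:
   
       // Hypothetical simplification: return a default error class instead of null
       // when no error class was assigned to the exception.
       class ExampleThrowable(errorClass: Option[String]) extends Exception {
         def getErrorClass: String = errorClass.getOrElse("_LEGACY_ERROR_UNKNOWN")
       }
   
       // new ExampleThrowable(None).getErrorClass  // returns "_LEGACY_ERROR_UNKNOWN"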
   
   ### Does this PR introduce _any_ user-facing change?
   
   No API changes, but the user-facing error message will contain `_LEGACY_ERROR_UNKNOWN` when an error class is not specified.
   
   ### How was this patch tested?
   
   The existing CI should pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No
   




Re: [PR] [SPARK-47338][SQL] Introduce `UNCLASSIFIED` for default error class [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45457:
URL: https://github.com/apache/spark/pull/45457#discussion_r1538627497


##########
sql/core/src/test/resources/sql-tests/results/udtf/udtf.sql.out:
##########
@@ -681,98 +679,120 @@ SELECT * FROM InvalidEvalReturnsNoneToNonNullableColumnScalarType(TABLE(t2))
 -- !query schema
 struct<>
 -- !query output
-org.apache.spark.api.python.PythonException
-pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] User defined table function encountered an error in the 'eval' or 'terminate' method: Column 0 within a returned row had a value of None, either directly or within array/struct/map subfields, but the corresponding column type was declared as non-nullable; please update the UDTF to return a non-None value at this location or otherwise declare the column type as nullable.

Review Comment:
   The deleted error message seems reasonable. Do you know why it was replaced?



##########
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala:
##########
@@ -614,7 +614,7 @@ class PairRDDFunctionsSuite extends SparkFunSuite with SharedSparkContext {
     val e = intercept[SparkException] {
       pairs.saveAsNewAPIHadoopFile[NewFakeFormatWithCallback]("ignored")
     }
-    assert(e.getCause.getMessage contains "failed to write")
+    assert(e.getCause.getMessage contains "Task failed while writing rows")

Review Comment:
   How did it happen that you had to change this?





Re: [PR] [SPARK-47338][SQL] Introduce `UNCLASSIFIED` for default error class [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on PR #45457:
URL: https://github.com/apache/spark/pull/45457#issuecomment-2010187715

   LGTM once CI passed, thank you!




Re: [PR] [SPARK-47338][SQL] Introduce `UNCLASSIFIED` for default error class [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #45457:
URL: https://github.com/apache/spark/pull/45457#discussion_r1538638794


##########
sql/core/src/test/resources/sql-tests/results/udtf/udtf.sql.out:
##########
@@ -681,98 +679,120 @@ SELECT * FROM InvalidEvalReturnsNoneToNonNullableColumnScalarType(TABLE(t2))
 -- !query schema
 struct<>
 -- !query output
-org.apache.spark.api.python.PythonException
-pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] User defined table function encountered an error in the 'eval' or 'terminate' method: Column 0 within a returned row had a value of None, either directly or within array/struct/map subfields, but the corresponding column type was declared as non-nullable; please update the UDTF to return a non-None value at this location or otherwise declare the column type as nullable.

Review Comment:
   Yeah, I agree that this looks a bit weird.
   
   The reason is that `UDTF_EXEC_ERROR` is defined on the PySpark side, so technically it is `UNCLASSIFIED` from the JVM's perspective, since it is not defined in `error-classes.json`.
   
   But the existing error message still shows up in user space, such as:
   
   org.apache.spark.SparkException: [UNCLASSIFIED] pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] User defined table function encountered an error in the 'eval' or 'terminate' method: Column 0 within a returned row had a value of None, either directly or within array/struct/map subfields, but the corresponding column type was declared as non-nullable; please update the UDTF to return a non-None value at this location or otherwise declare the column type as nullable.
   
   I'm not sure if it would be better to keep the existing error message for `PythonException` or mark it as `UNCLASSIFIED`.
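   
   As a minimal sketch of where the doubled prefix comes from (illustrative only, not the actual wrapping code; `wrap` is a made-up helper and `cause` stands for the original `PythonException`):
   
       // The JVM side finds no entry for the Python-defined error class,
       // applies the default, and keeps the original message verbatim,
       // producing "[UNCLASSIFIED] pyspark.errors... [UDTF_EXEC_ERROR] ...".
       def wrap(cause: Throwable, errorClass: Option[String]): String =
         s"[${errorClass.getOrElse("UNCLASSIFIED")}] ${cause.getMessage}"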





Re: [PR] [SPARK-47338][SQL] Introduce `UNCLASSIFIED` for default error class [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #45457:
URL: https://github.com/apache/spark/pull/45457#discussion_r1538642083


##########
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala:
##########
@@ -614,7 +614,7 @@ class PairRDDFunctionsSuite extends SparkFunSuite with SharedSparkContext {
     val e = intercept[SparkException] {
       pairs.saveAsNewAPIHadoopFile[NewFakeFormatWithCallback]("ignored")
     }
-    assert(e.getCause.getMessage contains "failed to write")
+    assert(e.getCause.getMessage contains "Task failed while writing rows")

Review Comment:
   That is also my question. I believe this error message should not be affected by the current change, but CI keeps complaining about it.
   
   So I modified it for testing purposes to see whether this would really change the CI result.





Re: [PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45457:
URL: https://github.com/apache/spark/pull/45457#discussion_r1522555720


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -7902,6 +7902,11 @@
       "Doesn't support month or year interval: <interval>"
     ]
   },
+  "_LEGACY_ERROR_UNKNOWN" : {

Review Comment:
   @itholic Why did you name it with the prefix `_LEGACY_`? Do you plan to eliminate it in the future?





Re: [PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45457:
URL: https://github.com/apache/spark/pull/45457#discussion_r1522991723


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -7902,6 +7902,11 @@
       "Doesn't support month or year interval: <interval>"
     ]
   },
+  "_LEGACY_ERROR_UNKNOWN" : {

Review Comment:
   > Because I thought that not having an error class assigned basically meant it was a LEGACY error
   
   I would say that is true. `SparkException` can still be raised with just a message since it is not fully ported to error classes. For instance:
   https://github.com/apache/spark/blob/e170252714e3662c7354321f78a3250114ea7e9e/common/utils/src/main/scala/org/apache/spark/util/SparkThreadUtils.scala#L51-L53
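   
   For illustration (a hedged sketch: the constructor calls are abbreviated and `SOME_ERROR_CLASS` is a placeholder, not a real entry), both forms are in use today, and only the second one carries an error class:
   
       // Raised with just a message: getErrorClass currently has nothing to return.
       throw new SparkException("The await of the result timed out")
   
       // Raised with an error class that is defined in error-classes.json.
       throw new SparkException(
         errorClass = "SOME_ERROR_CLASS",
         messageParameters = Map("param" -> "value"),
         cause = null)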





Re: [PR] [WIP][SPARK-47338][SQL] Introduce `UNCLASSIFIED` for default error class [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #45457:
URL: https://github.com/apache/spark/pull/45457#issuecomment-2006373965

   Updated the PR title and description. Let me take a look at the CI failure.




Re: [PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #45457:
URL: https://github.com/apache/spark/pull/45457#discussion_r1524079365


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -7902,6 +7902,11 @@
       "Doesn't support month or year interval: <interval>"
     ]
   },
+  "_LEGACY_ERROR_UNKNOWN" : {

Review Comment:
   Sounds reasonable to me. Let me address it.





Re: [PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #45457:
URL: https://github.com/apache/spark/pull/45457#discussion_r1522570109


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -7902,6 +7902,11 @@
       "Doesn't support month or year interval: <interval>"
     ]
   },
+  "_LEGACY_ERROR_UNKNOWN" : {

Review Comment:
   Because I thought that not having an error class assigned basically meant it was a LEGACY error, but I don't have a very strong opinion. Do you have any preference? Also cc @srielau FYI





Re: [PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #45457:
URL: https://github.com/apache/spark/pull/45457#discussion_r1523001153


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -7902,6 +7902,11 @@
       "Doesn't support month or year interval: <interval>"
     ]
   },
+  "_LEGACY_ERROR_UNKNOWN" : {

Review Comment:
   Since we know the cases when the error class is not set, how about just naming the error class something like `UNCLASSIFIED`?





Re: [PR] [WIP][SPARK-47338][SQL] Introduce `UNCLASSIFIED` for default error class [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #45457:
URL: https://github.com/apache/spark/pull/45457#issuecomment-2019480547

   Let me mark it as a draft for now, as I haven't been able to find a clear reason why the CI is complaining.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org