Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/26 11:14:15 UTC

[GitHub] [spark] grundprinzip commented on pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

grundprinzip commented on PR #39137:
URL: https://github.com/apache/spark/pull/39137#issuecomment-1365099279

   > * I worried that it might not be easy to maintain if the rules on one side (PySpark vs JVM) were changed arbitrarily in the future. So I wanted to manage all errors in a single error class file (error-class.json) across the entire Apache Spark project to reduce the maintenance cost.
   
   I think it's fine to reuse the `error-class.json` file. Let's figure out a way to bundle it as part of the PySpark build. If we can't do that, let's add a `python-error-class.json` and verify its integrity in `SparkThrowableSuite` the same way as we do for the `error-class.json`. 
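   For illustration, such an integrity check could be as small as the sketch below; the file paths, the test name, and the assumption that the Python file must be a subset of the JVM one are all mine, not the actual layout:
   
   ```python
   import json
   
   def test_python_error_classes_in_sync():
       # Hypothetical paths; the real locations depend on how we bundle the file.
       with open("core/src/main/resources/error/error-class.json") as f:
           jvm_classes = json.load(f)
       with open("python/pyspark/errors/python-error-class.json") as f:
           python_classes = json.load(f)
       # Every Python-side error class must match its JVM definition exactly,
       # so the two copies cannot silently drift apart.
       for name, definition in python_classes.items():
           assert jvm_classes.get(name) == definition, name
   ```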
   
   > * I thought we might see an advantage in that, when a similar error is already defined on the JVM side in the future, we can simply reuse the existing error class as-is without adding a new one.
   
   I'm not sure how much of a benefit we get here. All server-side exceptions are thrown as `SparkException`s anyway and will be handled properly; this is where the `CapturedException` comes in. For client-side exceptions, we should use the conceptually same approach, but in a language-idiomatic way.
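   As a sketch of what "conceptually the same but idiomatic" could look like on the Python side (all class and method names here are illustrative, not a proposed API):
   
   ```python
   # Hypothetical client-side exception: a native Python error that still
   # carries the error class and parameters, mirroring SparkThrowable's shape.
   class PySparkException(Exception):
       def __init__(self, error_class: str, message_parameters: dict):
           self.error_class = error_class
           self.message_parameters = message_parameters
           params = ", ".join(f"{k}={v}" for k, v in message_parameters.items())
           super().__init__(f"[{error_class}] {params}")
   
       def getErrorClass(self) -> str:
           return self.error_class
   
       def getMessageParameters(self) -> dict:
           return self.message_parameters
   ```
   
   A raise site would then look like `raise PySparkException("CANNOT_PARSE_DATATYPE", {"data_type": s})` (error class name purely illustrative), with no JVM involved.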
   
   > * Like the functions in `functions.py`, most of PySpark's functions leverage the JVM's logic, so it is assumed that the JVM will run at least once. So I thought that calling the error logic implemented in the JVM is acceptable given the expected overhead.
   
   The functions that are handled in this PR are purely client-side and should not require a round trip to the JVM.
   
   
   Since the primary purpose of `SparkThrowableHelper` is to look up an error class in a JSON file and map the parameters correctly, I feel that this behavior should be replicated in each of the clients (PySpark, PySpark with Spark Connect, Scala for Spark Connect, etc.) while following the same message format.
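   That lookup-and-substitute behavior is small enough to replicate per client. A rough Python sketch, assuming the bundled JSON keeps the error-class.json layout (a list of message lines with `<param>` placeholders); the function name and default path are mine:
   
   ```python
   import json
   import re
   
   def get_message(error_class: str, parameters: dict,
                   path: str = "error-class.json") -> str:
       # Look up the error class in the bundled JSON file, analogous to what
       # SparkThrowableHelper does on the JVM side.
       with open(path) as f:
           template = "\n".join(json.load(f)[error_class]["message"])
       # Map each <param> placeholder to the supplied parameter value.
       return "[" + error_class + "] " + re.sub(
           r"<(\w+)>", lambda m: str(parameters[m.group(1)]), template)
   ```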
   
   @HyukjinKwon what is your opinion?
   
   

