Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/26 09:07:07 UTC

[GitHub] [spark] itholic commented on pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

itholic commented on PR #39137:
URL: https://github.com/apache/spark/pull/39137#issuecomment-1365018902

   Thanks @grundprinzip for the review.
   I agree with your comments and find them quite reasonable.
   
   Actually, I previously submitted a PR (https://github.com/apache/spark/pull/39128) that implemented the framework on the PySpark side with no dependency on the JVM.
   
   But I closed that PR and opened this one for the following reasons:
   1. I was worried that maintenance would become difficult if the rules on one side (PySpark or JVM) were changed independently in the future. So I wanted to manage all errors in a single error class file (error-class.json) across the entire Apache Spark project to reduce the management cost.
   2. I saw an advantage in being able to reuse an existing error class as-is, without adding a new one, whenever a similar error is already defined on the JVM side.
   3. Like the functions in `functions.py`, most of PySpark's functions leverage the JVM's logic, so the JVM can be assumed to run at least once. So I thought the expected overhead of calling the error implemented on the JVM side was acceptable.
   
   But regardless of these reasons, I think all of your comments are also quite reasonable.
   
   So, could you take a quick look at the changes in the [previous PR](https://github.com/apache/spark/pull/39128) when you find some time?
   
   If the approach of the previous PR, which implements separate logic on the PySpark side without relying on the JVM, feels more reasonable to you, I will reconsider the overall design.
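
For illustration only, a PySpark-side lookup that does not call into the JVM could look roughly like the sketch below. This is not the code from either PR: the JSON layout, the `COLUMN_NOT_FOUND` entry, `render_message`, and `SketchPySparkError` are all hypothetical names made up for this example.

```python
import json

# Hypothetical registry contents, inlined so the sketch is self-contained.
# The real error class file may use a different schema.
ERROR_CLASSES_JSON = """
{
  "COLUMN_NOT_FOUND": {
    "message": ["Column <name> does not exist."]
  }
}
"""

ERROR_CLASSES = json.loads(ERROR_CLASSES_JSON)


def render_message(error_class, message_parameters):
    """Fill the <placeholder> slots of the template registered for error_class."""
    template = " ".join(ERROR_CLASSES[error_class]["message"])
    for key, value in message_parameters.items():
        template = template.replace(f"<{key}>", str(value))
    return f"[{error_class}] {template}"


class SketchPySparkError(Exception):
    """Hypothetical exception that carries an error class and its parameters."""

    def __init__(self, error_class, message_parameters):
        self.error_class = error_class
        self.message_parameters = message_parameters
        super().__init__(render_message(error_class, message_parameters))
```

Raising `SketchPySparkError("COLUMN_NOT_FOUND", {"name": "age"})` builds the message purely in Python. The trade-off from point 1 still applies: if the Python side reads its own copy of the registry, the two copies have to be kept in sync.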
   
   Also cc @HyukjinKwon FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

