You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "FurcyPin (via GitHub)" <gi...@apache.org> on 2023/03/07 09:08:47 UTC

[GitHub] [spark] FurcyPin commented on a diff in pull request #40271: SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

FurcyPin commented on code in PR #40271:
URL: https://github.com/apache/spark/pull/40271#discussion_r1127557744


##########
python/pyspark/sql/tests/test_functions.py:
##########
@@ -1268,6 +1268,12 @@ def test_bucket(self):
             message_parameters={"arg_name": "numBuckets", "arg_type": "str"},
         )
 
+    def test_no_cast(self):

Review Comment:
   From what I could test, the current behavior when using `typing.cast` is that it has no effect. The SQL cast does not happen, obviously, but no exception is raised either.
   Removing `typing.cast` will break jobs that are currently using it. This can be seen as a regression, but since it breaks a buggy behaviour, it could also be seen as a bugfix ?
   
   I see 4 possible alternatives here:
   1. Remove `typing.cast` (the current commit)
   2. Add a Python version of `functions.cast` with the right behaviour that uses `Column.cast`
   3. Replace the `typing.cast` call with a dummy `functions.cast` that logs a warning
   4. Add `functions.cast` to the Scala/Java API then the Python one (and probably R too ?)
   
   All options have their upsides and downsides:
   
   1. Already implemented, consistent with Scala API, but will crash jobs that mistakenly used `functions.cast`
   2. Simple to implement, inconsistent with Scala API, might silently change the behaviour of jobs that mistakenly used `functions.cast` (because the behaviour will change from "no effect" to "actual cast")
   3. Simple to implement, consistent with Scala API, will not crash jobs, only issue warnings (that users might or might not see)
   4. Longest to implement, consistent with Scala API, might silently change the behaviour of jobs that mistakenly used `functions.cast` (because the behaviour will change from "no effect" to "actual cast")
   
   Personally, I think 1. is okay, but it's your choice.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org