Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/01 21:15:47 UTC

[GitHub] [spark] ueshin commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

ueshin commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700579072



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       Actually the case @itholic raised is a bit tricky.
   
    pandas returns a numeric dtype if every value parses without error:
   
   ```py
   >>> pd.to_numeric(pd.Series(["1", "2", "3"]), errors="ignore")
   0    1
   1    2
   2    3
   dtype: int64
   ```
   
    whereas the current implementation always keeps the `StringType` column, so the result dtype is `object`:
   
   ```py
   >>> ps.to_numeric(ps.Series(["1", "2", "3"]), errors="ignore")
   0    1
   1    2
   2    3
   dtype: object
   ```
   
    Since Spark can't change a column's data type depending on whether an error occurred at runtime, we would have to check for parse failures ourselves beforehand. (Or should we simply not support this case?)
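    
    For illustration, the `errors="ignore"` semantics pandas implements could be sketched in plain Python (a hypothetical helper for discussion, not the actual `pyspark.pandas` implementation):
    
    ```py
    def to_numeric_ignore(values):
        """Mimic pandas' errors='ignore': return numeric values only if
        every element parses; if any element fails, return the input
        unchanged. (Illustrative sketch, not the real implementation.)"""
        converted = []
        for v in values:
            try:
                # parse floats when a decimal point is present, ints otherwise
                converted.append(float(v) if "." in str(v) else int(v))
            except (TypeError, ValueError):
                # any single failure means the whole result keeps its
                # original (string/object) dtype, as pandas does
                return values
        return converted

    print(to_numeric_ignore(["1", "2", "3"]))  # all parse -> numeric values
    print(to_numeric_ignore(["1", "2", "x"]))  # "x" fails -> original strings
    ```
    
    The key point is that the output type depends on inspecting *all* the data first, which is exactly what a single Spark column cast cannot express.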




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


