Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/01 02:45:01 UTC

[GitHub] [spark] itholic opened a new pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

itholic opened a new pull request #33882:
URL: https://github.com/apache/spark/pull/33882


   ### What changes were proposed in this pull request?
   
   This PR proposes to support the `errors` argument for `ps.to_numeric`, as pandas does.
   
   <img width="429" alt="Screen Shot 2021-09-01 at 11 12 44 AM" src="https://user-images.githubusercontent.com/44108233/131600510-d846f8a1-e140-4ec3-a7b3-67ffe11b66f1.png">
   
   
   ### Why are the changes needed?
   
   We should match pandas' behavior as much as possible.
   
   Also in the [recent blog post](https://medium.com/@chuck.connell.3/pandas-on-databricks-via-koalas-a-review-9876b0a92541), the author pointed out we're missing this feature.
   
   It seems to be the kind of feature that is commonly used in data science.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Now the `errors` argument is available for `ps.to_numeric`.
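
As a rough illustration of the three modes, here is a pure-Python sketch over a plain list. It is not the pandas-on-Spark implementation, and the helper name `to_numeric_sketch` is hypothetical; it only mirrors the semantics described in the docstring discussed in this thread.

```python
def to_numeric_sketch(values, errors="raise"):
    """Hypothetical helper mimicking the three `errors` modes on a list."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError("invalid errors value: {!r}".format(errors))
    result = []
    for value in values:
        try:
            result.append(float(value))
        except (TypeError, ValueError):
            if errors == "raise":
                # 'raise': invalid parsing raises an exception.
                raise ValueError("Unable to parse {!r}".format(value))
            if errors == "ignore":
                # 'ignore': invalid parsing returns the input unchanged.
                return values
            # 'coerce': invalid parsing is set to NaN.
            result.append(float("nan"))
    return result


coerced = to_numeric_sketch(["1.0", "apple", "-3"], errors="coerce")
ignored = to_numeric_sketch(["1.0", "apple", "-3"], errors="ignore")
```

With `errors="coerce"` the unparseable entry becomes NaN, while `errors="ignore"` hands the original list back untouched.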
   
   ### How was this patch tested?
   
   Unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #33882:
URL: https://github.com/apache/spark/pull/33882


   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701025572



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,20 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):

Review comment:
       I'm okay with switching to `raise`. Let's update the default value in the docstring then.






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912237263


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47455/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700698586



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       Yeah, we can leave it unimplemented for now.






[GitHub] [spark] SparkQA removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912262481


   **[Test build #142963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142963/testReport)** for PR 33882 at commit [`9387a40`](https://github.com/apache/spark/commit/9387a4010ed7ce4494a33a56372a9c8fb7caa800).




[GitHub] [spark] AmplabJenkins commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912241141








[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700717358



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,23 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="coerce"):
     """
     Convert argument to a numeric type.
 
     Parameters
     ----------
     arg : scalar, list, tuple, 1-d array, or Series
+        Argument to be converted.
+    errors : {'ignore', 'raise', 'coerce'}, default 'coerce'
+        * If 'coerce', then invalid parsing will be set as NaN.
+        * If 'ignore', then invalid parsing will return the input.
+        * If 'raise', then invalid parsing will raise an exception.

Review comment:
       Also update the docs, and move the ones in `Notes` to here.

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,23 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="coerce"):
     """
     Convert argument to a numeric type.
 
     Parameters
     ----------
     arg : scalar, list, tuple, 1-d array, or Series
+        Argument to be converted.
+    errors : {'ignore', 'raise', 'coerce'}, default 'coerce'
+        * If 'coerce', then invalid parsing will be set as NaN.
+        * If 'ignore', then invalid parsing will return the input.
+        * If 'raise', then invalid parsing will raise an exception.

Review comment:
       Also update the docs, and move the ones in `Notes` to here.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701552455



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2797,7 +2803,7 @@ def to_numeric(arg):
     1    1.0
     2    2.0
     3   -3.0
-    dtype: float32
+    dtype: float64

Review comment:
       Do you know why the result type changes in this case? From a cursory look, it should be the same.
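
For reference, the two dtypes in this hunk differ only in precision. The stdlib-only sketch below round-trips a value through a 32-bit float to show the difference; it does not explain why the doctest output changed, and `as_float32` is a hypothetical helper.

```python
import struct


def as_float32(value):
    """Round-trip a Python float (64-bit) through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


exact = as_float32(-3.0)   # small integers are exactly representable
rounded = as_float32(0.1)  # 0.1 is stored with less precision in 32 bits
```

The values in the doctest (`1.0`, `2.0`, `-3.0`) survive the round trip unchanged, so only the reported dtype differs there.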






[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700749810



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,16 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            raise NotImplementedError("'raise' is not implemented yet, when the `arg` is Series.")

Review comment:
       Just implemented `raise`, but I'm not sure whether it's the intended way mentioned in https://github.com/apache/spark/pull/33882#discussion_r699819419






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912289963


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47463/
   




[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700997327



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,20 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):

Review comment:
       Is this okay even though pandas uses `raise` as the default?
   
   I worry that existing pandas users may want `to_numeric` to raise an exception rather than change the value to `NaN` by default, though I don't have much context on real-world practice.






[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700696024



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       I think we shouldn't support this for `Series` for now; otherwise we would need to check all the types in the `Series` as well as nullability.






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912240407


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47455/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700717325



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,16 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            raise NotImplementedError("'raise' is not implemented yet, when the `arg` is Series.")

Review comment:
       Can you implement `raise` though?

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,23 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="coerce"):
     """
     Convert argument to a numeric type.
 
     Parameters
     ----------
     arg : scalar, list, tuple, 1-d array, or Series
+        Argument to be converted.
+    errors : {'ignore', 'raise', 'coerce'}, default 'coerce'
+        * If 'coerce', then invalid parsing will be set as NaN.
+        * If 'ignore', then invalid parsing will return the input.
+        * If 'raise', then invalid parsing will raise an exception.

Review comment:
       Also update the docs.






[GitHub] [spark] AmplabJenkins commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909847899


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142898/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701587349



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,20 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):
     """
     Convert argument to a numeric type.
 
     Parameters
     ----------
     arg : scalar, list, tuple, 1-d array, or Series
+        Argument to be converted.
+    errors : {'raise', 'coerce'}, default 'raise'
+        * If 'coerce', then invalid parsing will be set as NaN.
+        * If 'raise', then invalid parsing will raise an exception.
+        * If 'ignore', then invalid parsing will return the input.
+
+        .. note:: 'ignore' doesn't work for pandas-on-Spark Series.

Review comment:
       ```suggestion
           .. note:: 'ignore' doesn't work for pandas-on-Spark Series yet.
   ```






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701587349



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,20 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):
     """
     Convert argument to a numeric type.
 
     Parameters
     ----------
     arg : scalar, list, tuple, 1-d array, or Series
+        Argument to be converted.
+    errors : {'raise', 'coerce'}, default 'raise'
+        * If 'coerce', then invalid parsing will be set as NaN.
+        * If 'raise', then invalid parsing will raise an exception.
+        * If 'ignore', then invalid parsing will return the input.
+
+        .. note:: 'ignore' doesn't work for pandas-on-Spark Series.

Review comment:
       ```suggestion
           .. note:: 'ignore' doesn't work yet when `arg` is pandas-on-Spark Series.
   ```






[GitHub] [spark] AmplabJenkins commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909847899








[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699850331



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))
+        elif errors == "raise":
+            raise NotImplementedError("'raise' is not implemented yet, when the `arg` is Series.")

Review comment:
       Sounds good. Let me try






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909847681


   **[Test build #142898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142898/testReport)** for PR 33882 at commit [`73a5af7`](https://github.com/apache/spark/commit/73a5af74e13a96eee2eeff5b99c283716463b1c7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701579757



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2820,21 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            scol = arg.spark.column
+            scol_casted = scol.cast("float")
+            cond = F.when(
+                F.assert_true(~(scol.isNotNull() & scol_casted.isNull())).isNull(), scol_casted

Review comment:
       Thanks for the link!
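
The condition passed to `F.assert_true` in the hunk above fails only for rows where the original value is non-null but the cast result is null. Below is a plain-Python sketch of that row-level rule, with `None` standing in for SQL NULL. `cast_float` and `cast_or_raise` are hypothetical helpers, and the cast is assumed to yield null on unparseable input.

```python
def cast_float(value):
    """Hypothetical stand-in for a cast that yields None when parsing fails."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def cast_or_raise(values):
    """Raise only when a non-null input becomes null after casting."""
    result = []
    for value in values:
        casted = cast_float(value)
        # Mirrors ~(scol.isNotNull() & scol_casted.isNull()) from the diff.
        ok = not (value is not None and casted is None)
        if not ok:
            raise ValueError("Unable to cast {!r} to float".format(value))
        result.append(casted)
    return result


kept_nulls = cast_or_raise(["1.5", None, "2"])
```

Nulls that were already present in the input pass through, so only genuinely unparseable values trigger the error.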






[GitHub] [spark] AmplabJenkins commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909861981


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47401/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912285290


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142963/
   




[GitHub] [spark] ueshin commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700579072



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       Actually the case @itholic raised is a bit tricky.
   
   pandas can return a numeric type if there is no error.
   
   ```py
   >>> pd.to_numeric(pd.Series(["1", "2", "3"]), errors="ignore")
   0    1
   1    2
   2    3
   dtype: int64
   ```
   
   whereas the current implementation always returns `StringType`:
   
   ```py
   >>> ps.to_numeric(ps.Series(["1", "2", "3"]), errors="ignore")
   0    1
   1    2
   2    3
   dtype: object
   ```
   
   As Spark can't change the data type depending on whether there is an error, we would have to check it ourselves beforehand (or we just don't support this?).
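
The "check it ourselves beforehand" option could look roughly like the following pure-Python sketch. It is not Spark code: `ignore_with_precheck` is a hypothetical helper, it always produces floats rather than inferring `int64` as pandas does, and in pandas-on-Spark the pre-check would presumably need an extra pass over the data before the result type could be chosen.

```python
def _parses(value):
    """Return True if `value` can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def ignore_with_precheck(values):
    """Hypothetical sketch of errors="ignore" with an up-front validity check."""
    # First pass: decide the result type before converting anything.
    if all(_parses(v) for v in values):
        # No errors: the whole result is numeric.
        return [float(v) for v in values]
    # At least one error: return the input untouched (still strings).
    return list(values)


all_valid = ignore_with_precheck(["1", "2", "3"])
has_error = ignore_with_precheck(["1", "x", "3"])
```

The result type depends on the data itself, which is why a single fixed-type column expression cannot express it.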






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912285672


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47463/
   




[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912240597


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47450/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909835401


   **[Test build #142898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142898/testReport)** for PR 33882 at commit [`73a5af7`](https://github.com/apache/spark/commit/73a5af74e13a96eee2eeff5b99c283716463b1c7).




[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700997327



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,20 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):

Review comment:
       Is that okay even though pandas uses `raise` as the default?
   
   I'm worried that existing pandas users may expect `to_numeric` to raise an exception by default rather than change the value to `NaN`.
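
To make the difference between the two defaults concrete, here is a small pure-Python sketch. It is an illustration only, not the code in this pull request: `to_float` is a hypothetical helper that works on a plain list, and the error messages are made up for the example.

```python
import math


def to_float(values, errors="raise"):
    """Hypothetical helper modelling the 'raise' and 'coerce' modes on a list."""
    if errors not in ("raise", "coerce"):
        raise ValueError("errors must be 'raise' or 'coerce'")
    result = []
    for value in values:
        if value is None:
            # Missing input is not a parsing failure; it simply stays missing.
            result.append(math.nan)
            continue
        try:
            result.append(float(value))
        except (TypeError, ValueError):
            if errors == "raise":
                raise ValueError(f"Unable to parse {value!r}") from None
            # errors == "coerce": replace the invalid value with NaN.
            result.append(math.nan)
    return result


coerced = to_float(["1.0", "2", "x"], errors="coerce")
```

With `errors="coerce"` the invalid entry silently becomes NaN, while the same call with the pandas default `errors="raise"` stops with an exception, which is the behavioral difference the comment above is concerned about.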






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912231488








[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699847004



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))

Review comment:
       I changed it to follow pandas' behavior below:
   
   ```python
   >>> pd.to_numeric(pd.Series(['1', '2', '3']))
   0    1
   1    2
   2    3
   dtype: int64
   
   >>> ps.to_numeric(ps.Series(['1', '2', '3']))
   0    1.0
   1    2.0
   2    3.0
   dtype: float32
   ```
   
   But I'll revert this change since we cannot handle the case below properly with this change.
   
   ```python
   >>> pd.to_numeric(pd.Series(['1.0', '2', '-3']))
   0    1.0
   1    2.0
   2   -3.0
   dtype: float64
   
   >>> ps.to_numeric(ps.Series(['1.0', '2', '-3']))
   0    1
   1    2
   2   -3
   dtype: int32
   ```
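
The two pandas examples above differ only in the data, yet produce different dtypes. A rough pure-Python sketch of that kind of inference follows; `infer_numeric` is a hypothetical helper for illustration and does not reproduce pandas' actual parsing rules.

```python
def infer_numeric(values):
    """Hypothetical helper: use int if every string is an integer literal, else float."""
    try:
        return [int(v) for v in values]
    except ValueError:
        # At least one value (such as "1.0") is not an integer literal,
        # so the whole result falls back to float.
        return [float(v) for v in values]


ints = infer_numeric(["1", "2", "3"])
floats = infer_numeric(["1.0", "2", "-3"])
```

A fixed `cast("int")` or `cast("float")` has to pick one target type before looking at the values, which is why neither choice matches pandas in both cases.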






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699855265



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       can you try this:
   
   ```python
   pd.to_numeric(pd.Series([datetime.datetime(1970, 1, 2)]), errors="ignore").to_list()
   ```






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912200423


   **[Test build #142949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142949/testReport)** for PR 33882 at commit [`0cd7fe1`](https://github.com/apache/spark/commit/0cd7fe1e41d4a7a9e034ae53029ac74debfe13bd).




[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909835401


   **[Test build #142898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142898/testReport)** for PR 33882 at commit [`73a5af7`](https://github.com/apache/spark/commit/73a5af74e13a96eee2eeff5b99c283716463b1c7).




[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701579977



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2797,7 +2803,7 @@ def to_numeric(arg):
     1    1.0
     2    2.0
     3   -3.0
-    dtype: float32
+    dtype: float64

Review comment:
       You're right. It should be `float32`.
   
   I changed it in an earlier fix and mistakenly did not revert it.










[GitHub] [spark] AmplabJenkins commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912221439


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142949/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700777772



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2821,26 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            scol = arg.spark.column
+            scol_casted = scol.cast("float")
+            cond = scol.isNotNull() & scol_casted.isNull()
+            # Filter out if there are data that satisfy the condition.
+            sdf = arg._internal.spark_frame.select(scol).filter(cond)
+            head_sdf = sdf.head(1)

Review comment:
       Can we avoid launching a job here? e.g., `assert_true(col.isNotNull) & assert_true(casted_col.isNotNull)`






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912231287


   **[Test build #142955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142955/testReport)** for PR 33882 at commit [`e3faf4a`](https://github.com/apache/spark/commit/e3faf4a1a57bb13a3bad6f245c2ba95e3f21b28a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912218351


   @itholic don't forget to update the PR title and description.




[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701532586



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2821,26 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            scol = arg.spark.column
+            scol_casted = scol.cast("float")
+            cond = scol.isNotNull() & scol_casted.isNull()
+            # Filter out if there are data that satisfy the condition.
+            sdf = arg._internal.spark_frame.select(scol).filter(cond)
+            head_sdf = sdf.head(1)

Review comment:
       I don't think implementing this with `assert` on non-boolean objects is supported?
   
   Maybe it's because the Spark Column returned by `col.isNotNull()` is itself a non-boolean object, so the `assert` approach is not supported?
   
   ```python
   >>> assert(scol.isNotNull())
   Traceback (most recent call last):
   ...
   ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
   ```
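
The traceback above comes from Python's `assert` statement, which asks for the truth value of its operand. The earlier suggestion appears to refer instead to the Spark SQL `assert_true` function (exposed as `pyspark.sql.functions.assert_true` since Spark 3.1), which builds an expression that is checked when the query runs. The toy model below sketches that distinction in pure Python; `LazyColumn` and this `assert_true` are hypothetical stand-ins, not PySpark classes or functions.

```python
class LazyColumn:
    """Toy stand-in for a lazily evaluated column expression (not a PySpark class)."""

    def __init__(self, description):
        self.description = description

    def __bool__(self):
        # A lazy expression has no truth value until it is evaluated on data.
        raise ValueError("Cannot convert column into bool")


def assert_true(column):
    """Toy expression-level assertion: wraps the expression instead of evaluating it."""
    return LazyColumn(f"assert_true({column.description})")


cond = LazyColumn("(a IS NOT NULL)")
try:
    assert cond  # Python calls bool(cond) here, which raises
    outcome = "no error"
except ValueError as exc:
    outcome = str(exc)

wrapped = assert_true(cond)  # no evaluation happens; only a new expression is built
```

In this model the statement form fails immediately, while the function form just returns another lazy expression that could be attached to a query plan.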






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700776124



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,20 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):

Review comment:
       can we change the default to `coerce`?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909847899


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142898/
   




[GitHub] [spark] ueshin commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700591483



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))
+        elif errors == "raise":
+            raise NotImplementedError("'raise' is not implemented yet, when the `arg` is Series.")

Review comment:
       Btw, we should check the nullability of the original value as well; otherwise it also raises in the case `None` -> numeric.
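
The nullability point above can be sketched in pure Python. This is an illustration under simplifying assumptions, not the pull request's code: `find_invalid` is a hypothetical helper, and its inner `cast` mimics a cast that yields `None` instead of raising on bad input.

```python
def find_invalid(values):
    """Hypothetical helper: return inputs that are present but fail to parse as float."""

    def cast(value):
        # Mimics a cast that yields None (null) instead of raising on bad input.
        try:
            return float(value)
        except (TypeError, ValueError):
            return None

    # A value counts as an error only if it was not null to begin with
    # and became null after the cast.
    return [v for v in values if v is not None and cast(v) is None]


invalid = find_invalid(["1", None, "abc", "2.5"])
```

Without the `v is not None` check, the original `None` would be reported as a parsing failure even though nothing went wrong during conversion.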






[GitHub] [spark] SparkQA removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912222343


   **[Test build #142955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142955/testReport)** for PR 33882 at commit [`e3faf4a`](https://github.com/apache/spark/commit/e3faf4a1a57bb13a3bad6f245c2ba95e3f21b28a).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699817733



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))

Review comment:
       why should we change the type?

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       If `scol` is a string column, the output type can be a string column:
   
   ```python
   >>> from pyspark.sql import functions as F
   >>> scol = F.col("a")
   >>> casted_scol = scol.cast("int")
   >>> df = sql("SELECT 'a' as a")
   >>> df.select(F.when(casted_scol.isNull(), scol).otherwise(casted_scol)).printSchema()
   root
    |-- CASE WHEN (CAST(a AS INT) IS NULL) THEN a ELSE CAST(a AS INT) END: string (nullable = true)
   ```
   
   Is this the correct type?

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))
+        elif errors == "raise":
+            raise NotImplementedError("'raise' is not implemented yet, when the `arg` is Series.")

Review comment:
       Let's implement this case using the `assert_true` expression, e.g., `assert_true(casted_col.isNotNull())`.

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       When `errors` is `ignore`, it should return the input as is, per the documentation. The problem is that type coercion will happen via Spark. For example, it throws an exception if you call `ps.to_numeric` with `datetime.datetime` values, whereas pandas returns the input as is.

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       can you try this:
   
   ```python
   pd.to_numeric(pd.Series([datetime.datetime(1970, 1, 2)]), errors="ignore").to_list()
   ```






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701552078



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,19 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):
     """
     Convert argument to a numeric type.
 
     Parameters
     ----------
     arg : scalar, list, tuple, 1-d array, or Series
+        Argument to be converted.
+    errors : {'raise', 'coerce'}, default 'raise'
+        * If 'coerce', then invalid parsing will be set as NaN.
+        * If 'raise', then invalid parsing will raise an exception.

Review comment:
       Sorry, one last comment. It seems this is going to work for other types but not for pandas-on-Spark Series. Can we add `ignore` here with a note that it doesn't work for pandas-on-Spark Series?






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909835401








[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701532586



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2821,26 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            scol = arg.spark.column
+            scol_casted = scol.cast("float")
+            cond = scol.isNotNull() & scol_casted.isNull()
+            # Filter out if there are data that satisfy the condition.
+            sdf = arg._internal.spark_frame.select(scol).filter(cond)
+            head_sdf = sdf.head(1)

Review comment:
       I think using `assert` this way is not supported for non-boolean objects?
   
   I think that's because the Spark Column returned by `col.isNotNull()` is itself a non-boolean object, so the `assert` approach is not supported?
   
   ```python
   >>> assert(scol.isNotNull())
   Traceback (most recent call last):
   ...
   ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
   ```
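
To illustrate the point outside Spark: Python's `assert` statement calls `bool()` on its operand, so any object whose `__bool__` raises will fail in this way. The sketch below uses a hypothetical stand-in class, `LazyColumn`, which is an assumption made for illustration and not the real `pyspark.sql.Column`; it only mimics the behaviour shown in the traceback.

```python
class LazyColumn:
    """Hypothetical stand-in for a lazily evaluated column expression."""

    def __init__(self, expr):
        self.expr = expr

    def is_not_null(self):
        # Builds a new expression; nothing is evaluated against data here.
        return LazyColumn(f"({self.expr} IS NOT NULL)")

    def __bool__(self):
        # There is no single truth value until the expression runs on rows.
        raise ValueError("Cannot convert column into bool")


def try_assert(obj):
    """Return the error message raised by `assert obj`, or None if it passes."""
    try:
        assert obj
    except ValueError as exc:
        return str(exc)
    return None


message = try_assert(LazyColumn("a").is_not_null())
print(message)
```

This is presumably why the check has to be expressed as a column expression that Spark evaluates per row, rather than as a Python-level `assert`.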







[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909854297


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47401/
   




[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912222343


   **[Test build #142955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142955/testReport)** for PR 33882 at commit [`e3faf4a`](https://github.com/apache/spark/commit/e3faf4a1a57bb13a3bad6f245c2ba95e3f21b28a).




[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699849824



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       Yeah I think so.
   
   If `errors` is "ignore", the original data is returned as is:
   
   ```python
   >>> pd.to_numeric(pd.Series(["1", "2", "hello"]), errors="ignore")
   0        1
   1        2
   2    hello
   dtype: object
   ```






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700776767



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,20 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):
     """
     Convert argument to a numeric type.
 
     Parameters
     ----------
     arg : scalar, list, tuple, 1-d array, or Series
+        Argument to be converted.
+    errors : {'ignore', 'raise', 'coerce'}, default 'coerce'
+        Note that 'ignore' are not supported yet when the `arg` is Series.
+
+        * If 'coerce', then invalid parsing will be set as NaN.
+        * If 'ignore', then invalid parsing will return the input.
+        * If 'raise', then invalid parsing will raise an exception.

Review comment:
       ```suggestion
       errors : {'raise', 'coerce'}, default 'coerce'
           * If 'coerce', then invalid parsing will be set as NaN.
           * If 'raise', then invalid parsing will raise an exception.
           
           .. note:: pandas support 'ignore' but this is not implemented yet.
   ```






[GitHub] [spark] SparkQA removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912200423


   **[Test build #142949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142949/testReport)** for PR 33882 at commit [`0cd7fe1`](https://github.com/apache/spark/commit/0cd7fe1e41d4a7a9e034ae53029ac74debfe13bd).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699854668



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       When `errors` is `ignore`, it should return the input as is, per the documentation. The problem is that type coercion will happen via Spark. For example, it throws an exception if you call `ps.to_numeric` with `datetime.datetime`s, whereas pandas returns the input as is.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701550997



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2820,21 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            scol = arg.spark.column
+            scol_casted = scol.cast("float")
+            cond = F.when(
+                F.assert_true(~(scol.isNotNull() & scol_casted.isNull())).isNull(), scol_casted

Review comment:
       Can we simplify 
   
   ```
   ~(scol.isNotNull() & scol_casted.isNull())
   ```
   
   to
   
   ```
   scol.isNull() | scol_casted.isNotNull()
   ```
   
   ? See also https://www.dcode.fr/boolean-expressions-calculator
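
The proposed simplification is an application of De Morgan's law, `~(A & B) == ~A | ~B`. Since `isNull()` and `isNotNull()` always yield true or false (never null), plain Python booleans are enough to check the equivalence. A minimal sketch; the function names `original` and `simplified` are made up for illustration:

```python
from itertools import product


def original(src_is_null, cast_is_null):
    # Mirrors ~(scol.isNotNull() & scol_casted.isNull())
    return not ((not src_is_null) and cast_is_null)


def simplified(src_is_null, cast_is_null):
    # Mirrors scol.isNull() | scol_casted.isNotNull()
    return src_is_null or (not cast_is_null)


# Compare both forms over every combination of the two null flags.
equivalent = all(
    original(s, c) == simplified(s, c)
    for s, c in product([True, False], repeat=2)
)
print(equivalent)
```

Both forms are false only for a row whose source value is present but whose cast result is null, which is the row an `errors="raise"` check needs to reject.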






[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700749810



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,16 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            raise NotImplementedError("'raise' is not implemented yet, when the `arg` is Series.")

Review comment:
       I've just implemented `raise`, but I'm not sure whether it's the solution intended in https://github.com/apache/spark/pull/33882#discussion_r699819419






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909861981


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47401/
   




[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912276878


   **[Test build #142963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142963/testReport)** for PR 33882 at commit [`9387a40`](https://github.com/apache/spark/commit/9387a4010ed7ce4494a33a56372a9c8fb7caa800).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701550997



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2820,21 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("float"))
+        elif errors == "raise":
+            scol = arg.spark.column
+            scol_casted = scol.cast("float")
+            cond = F.when(
+                F.assert_true(~(scol.isNotNull() & scol_casted.isNull())).isNull(), scol_casted

Review comment:
       Can we simplify 
   
   ```
   ~(scol.isNotNull() & scol_casted.isNull())
   ```
   
   to
   
   ```
   scol.isNull() | scol_casted.isNotNull()
   ```
   
   ? See also https://www.dcode.fr/boolean-expressions-calculator






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699819031



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       If `scol` is a string column, the output type can be a string column:
   
   ```python
   >>> from pyspark.sql import functions as F
   >>> scol = F.col("a")
   >>> casted_scol = scol.cast("int")
   >>> df = sql("SELECT 'a' as a")
   >>> df.select(F.when(casted_scol.isNull(), scol).otherwise(casted_scol)).printSchema()
   root
    |-- CASE WHEN (CAST(a AS INT) IS NULL) THEN a ELSE CAST(a AS INT) END: string (nullable = true)
   ```
   
   Is this the correct type?






[GitHub] [spark] AmplabJenkins commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912285290


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142963/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699819419



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))
+        elif errors == "raise":
+            raise NotImplementedError("'raise' is not implemented yet, when the `arg` is Series.")

Review comment:
       Let's implement this case using the `assert_true` expression, e.g., `assert_true(casted_col.isNotNull())`.






[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912211312


   **[Test build #142949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142949/testReport)** for PR 33882 at commit [`0cd7fe1`](https://github.com/apache/spark/commit/0cd7fe1e41d4a7a9e034ae53029ac74debfe13bd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912220959


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47450/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699817733



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))

Review comment:
       Why should we change the type?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912221439


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142949/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912289994


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47463/
   




[GitHub] [spark] SparkQA commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912262481


   **[Test build #142963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142963/testReport)** for PR 33882 at commit [`9387a40`](https://github.com/apache/spark/commit/9387a4010ed7ce4494a33a56372a9c8fb7caa800).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r701024810



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2747,13 +2747,20 @@ def merge(
 
 
 @no_type_check
-def to_numeric(arg):
+def to_numeric(arg, errors="raise"):

Review comment:
       Let's make it `coerce` by default and add a note.






[GitHub] [spark] HyukjinKwon commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912505477


   Merged to master.




[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699849824



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       Yeah I think so.
   
   If `errors` is "ignore", the original data is returned as is (if it cannot be cast to numeric):
   
   ```python
   >>> pd.to_numeric(pd.Series(["1", "2", "hello"]), errors="ignore")
   0        1
   1        2
   2    hello
   dtype: object
   ```
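
For reference, the three `errors` modes discussed in this thread can be sketched on a plain Python list. This is a simplified illustration under stated assumptions, not the pandas or pandas-on-Spark implementation: `to_numeric_sketch` is a hypothetical helper, and it always converts to `float`, whereas pandas picks an integer dtype when it can.

```python
import math


def to_numeric_sketch(values, errors="raise"):
    """Illustrative only: mimic the three `errors` modes on a plain list."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError("invalid error value specified")

    converted = []
    for value in values:
        try:
            converted.append(float(value))
        except (TypeError, ValueError):
            if errors == "raise":
                raise ValueError(f"Unable to parse {value!r}")
            if errors == "ignore":
                # Any failure means the whole input is handed back untouched.
                return values
            # errors == "coerce": the unparsable value becomes NaN.
            converted.append(math.nan)
    return converted


print(to_numeric_sketch(["1", "2", "hello"], errors="ignore"))
print(to_numeric_sketch(["1", "2", "hello"], errors="coerce"))
```

The "ignore" branch shows why the result can keep a non-numeric type: a single unparsable value means the input comes back unchanged.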






[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r700696024



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       I think it's better not to support this for `Series` for now; otherwise we would need to check every type in the `Series` as well as its nullability.
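
The nullability concern can be sketched in plain Python. Assuming a cast that yields null on failure, as the `isNull()` check in the diff implies, a null result alone cannot tell an originally missing value from a failed conversion; the input has to be inspected too. `cast_to_float` and `failed_casts` are hypothetical helpers, not part of PySpark.

```python
from typing import List, Optional


def cast_to_float(value: Optional[str]) -> Optional[float]:
    """Model of a null-on-failure cast: None for null input or unparsable text."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def failed_casts(values: List[Optional[str]]) -> List[bool]:
    """True only where the input was non-null but the cast produced null."""
    return [v is not None and cast_to_float(v) is None for v in values]
```

For `["1", None, "hello"]`, the cast gives a null for both the second and third elements, but only the third is a genuine conversion failure.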






[GitHub] [spark] itholic commented on a change in pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #33882:
URL: https://github.com/apache/spark/pull/33882#discussion_r699847004



##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))

Review comment:
       I changed it to follow pandas' behavior below:
   
   ```python
   >>> pd.to_numeric(pd.Series(['1', '2', '3']))
   0    1
   1    2
   2    3
   dtype: int64
   
   >>> ps.to_numeric(ps.Series(['1', '2', '3']))
   0    1.0
   1    2.0
   2    3.0
   dtype: float32
   ```
   
   But I'll revert this change, since with it we cannot properly handle the case below:
   
   ```python
   >>> pd.to_numeric(pd.Series(['1.0', '2', '-3']))
   0    1.0
   1    2.0
   2   -3.0
   dtype: float64
   
   >>> ps.to_numeric(ps.Series(['1.0', '2', '-3']))
   0    1
   1    2
   2   -3
   dtype: int32
   ```
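
The difference between the two cases above comes down to choosing the result type from the data rather than fixing it up front. A minimal pure-Python sketch of that idea, using a hypothetical helper `infer_numeric` (this is not how pandas or Spark implements it):

```python
from typing import List, Union


def infer_numeric(values: List[str]) -> List[Union[int, float]]:
    """Return ints if every string is an integer literal, else floats."""
    try:
        return [int(v) for v in values]
    except ValueError:
        # At least one value such as '1.0' is not an integer literal,
        # so the whole result is promoted to float.
        return [float(v) for v in values]
```

A fixed cast to "int" skips this promotion step, which is why '1.0' ends up as 1 in the second `ps.to_numeric` output.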

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       Yeah I think so.
   
   If `errors` is "ignore", the original data is returned as is:
   
   ```python
   >>> pd.to_numeric(pd.Series(["1", "2", "hello"]), errors="ignore")
   0        1
   1        2
   2    hello
   dtype: object
   ```

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))

Review comment:
       Yeah I think so.
   
   If `errors` is "ignore", the original data is returned as is when it cannot be cast to numeric:
   
   ```python
   >>> pd.to_numeric(pd.Series(["1", "2", "hello"]), errors="ignore")
   0        1
   1        2
   2    hello
   dtype: object
   ```

##########
File path: python/pyspark/pandas/namespace.py
##########
@@ -2814,9 +2824,18 @@ def to_numeric(arg):
     1.0
     """
     if isinstance(arg, Series):
-        return arg._with_new_scol(arg.spark.column.cast("float"))
+        if errors == "coerce":
+            return arg._with_new_scol(arg.spark.column.cast("int"))
+        elif errors == "ignore":
+            scol = arg.spark.column
+            casted_scol = scol.cast("int")
+            return arg._with_new_scol(F.when(casted_scol.isNull(), scol).otherwise(casted_scol))
+        elif errors == "raise":
+            raise NotImplementedError("'raise' is not implemented yet, when the `arg` is Series.")

Review comment:
       Sounds good. Let me try.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-909847899








[GitHub] [spark] AmplabJenkins commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912231488


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142955/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33882: [SPARK-36609][PYTHON] Add `errors` argument for `ps.to_numeric`.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33882:
URL: https://github.com/apache/spark/pull/33882#issuecomment-912289994


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47463/
   

