Posted to commits@spark.apache.org by gu...@apache.org on 2021/07/27 03:08:15 UTC
[spark] branch branch-3.2 updated: [SPARK-36142][PYTHON] Follow Pandas when pow between fractional series with Na and bool literal

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 139536c  [SPARK-36142][PYTHON] Follow Pandas when pow between fractional series with Na and bool literal
139536c is described below

commit 139536c3ed91c9bfd93e169feeec1fe9dd220aca
Author: Yikun Jiang <yi...@gmail.com>
AuthorDate: Tue Jul 27 12:06:05 2021 +0900

    [SPARK-36142][PYTHON] Follow Pandas when pow between fractional series with Na and bool literal
    
    ### What changes were proposed in this pull request?
    
    Set the result to 1 when the exponent is 0 (or False).
    
    ### Why are the changes needed?
    Currently, exponentiation between fractional series and bools is not consistent with pandas' behavior.
    ```
     >>> pser = pd.Series([1, 2, np.nan], dtype=float)
     >>> psser = ps.from_pandas(pser)
     >>> pser ** False
     0    1.0
     1    1.0
     2    1.0
     dtype: float64
     >>> psser ** False
     0    1.0
     1    1.0
     2    NaN
     dtype: float64
    ```
    pandas-on-Spark should return 1.0 for the missing value as well, as pandas does.
    
    See more in [SPARK-36142](https://issues.apache.org/jira/browse/SPARK-36142)
    
    ### Does this PR introduce _any_ user-facing change?
    Yes. The result of pow between a fractional Series with missing values and a bool literal changes so that it follows pandas behavior.
    
    ### How was this patch tested?
    - Added the test_pow_with_float_nan unit test
    - Existing tests in test_pow
    
    Closes #33521 from Yikun/SPARK-36142.
    
    Authored-by: Yikun Jiang <yi...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
    (cherry picked from commit d52c2de08b60930a129825d15e8f822c07e8bd31)
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/pyspark/pandas/data_type_ops/num_ops.py            |  6 +++++-
 python/pyspark/pandas/tests/data_type_ops/test_num_ops.py | 12 ++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/data_type_ops/num_ops.py b/python/pyspark/pandas/data_type_ops/num_ops.py
index 8de46cd..43c3b01 100644
--- a/python/pyspark/pandas/data_type_ops/num_ops.py
+++ b/python/pyspark/pandas/data_type_ops/num_ops.py
@@ -79,7 +79,11 @@ class NumericOps(DataTypeOps):
             raise TypeError("Exponentiation can not be applied to given types.")
 
         def pow_func(left: Column, right: Any) -> Column:
-            return F.when(left == 1, left).otherwise(Column.__pow__(left, right))
+            return (
+                F.when(left == 1, left)
+                .when(SF.lit(right) == 0, 1)
+                .otherwise(Column.__pow__(left, right))
+            )
 
         right = transform_boolean_operand_to_numeric(right, spark_type=left.spark.data_type)
         return column_op(pow_func)(left, right)
diff --git a/python/pyspark/pandas/tests/data_type_ops/test_num_ops.py b/python/pyspark/pandas/tests/data_type_ops/test_num_ops.py
index fb2d2be..0dd3501 100644
--- a/python/pyspark/pandas/tests/data_type_ops/test_num_ops.py
+++ b/python/pyspark/pandas/tests/data_type_ops/test_num_ops.py
@@ -183,6 +183,18 @@ class NumOpsTest(PandasOnSparkTestCase, TestCasesUtils):
                 else:
                     self.assertRaises(TypeError, lambda: psser ** psdf[n_col])
 
+    # TODO(SPARK-36031): Merge test_pow_with_nan into test_pow
+    def test_pow_with_float_nan(self):
+        for col in self.numeric_w_nan_df_cols:
+            if col == "float_w_nan":
+                pser, psser = self.numeric_w_nan_pdf[col], self.numeric_w_nan_psdf[col]
+                self.assert_eq(pser ** pser, psser ** psser)
+                self.assert_eq(pser ** pser.astype(bool), psser ** psser.astype(bool))
+                self.assert_eq(pser ** True, psser ** True)
+                self.assert_eq(pser ** False, psser ** False)
+                self.assert_eq(pser ** 1, psser ** 1)
+                self.assert_eq(pser ** 0, psser ** 0)
+
     def test_radd(self):
         pdf, psdf = self.pdf, self.psdf
         for col in self.numeric_df_cols:
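
The branch order that the num_ops.py hunk introduces can be modeled in plain Python, without Spark. This is only a sketch under stated assumptions: None stands in for a Spark null (which is how the missing value is assumed to be stored here), the bool operand is assumed to have already been converted to a number, and null_pow, old_pow_func, and new_pow_func are hypothetical names that do not exist in PySpark.

```python
from typing import Optional

# Assumption: None models a Spark null; the real code works on Column objects.
Num = Optional[float]


def null_pow(left: Num, right: Num) -> Num:
    """Null-propagating power: any null operand yields null."""
    if left is None or right is None:
        return None
    return float(left) ** float(right)


def old_pow_func(left: Num, right: Num) -> Num:
    """Model of the branch order before the patch: only left == 1 is special-cased."""
    if left is not None and left == 1:
        return left
    return null_pow(left, right)


def new_pow_func(left: Num, right: Num) -> Num:
    """Model of the branch order after the patch: a zero exponent also yields 1."""
    if left is not None and left == 1:
        return left
    if right is not None and right == 0:
        return 1.0
    return null_pow(left, right)


values = [1.0, 2.0, None]   # mirrors pd.Series([1, 2, np.nan], dtype=float)
exponent = float(False)     # stand-in for the bool-to-numeric conversion
old_result = [old_pow_func(v, exponent) for v in values]
new_result = [new_pow_func(v, exponent) for v in values]
print(old_result)  # [1.0, 1.0, None]
print(new_result)  # [1.0, 1.0, 1.0]
```

In this model the missing value only stops propagating when the exponent is zero; for any other exponent new_pow_func still returns None, which corresponds to the NaN that pandas also produces.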
