Posted to commits@spark.apache.org by gu...@apache.org on 2021/10/27 05:54:39 UTC

[spark] branch master updated: [SPARK-36348][PYTHON][FOLLOWUP] Complete test_astype for index

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new c7d9bd2  [SPARK-36348][PYTHON][FOLLOWUP] Complete test_astype for index
c7d9bd2 is described below

commit c7d9bd2e70c29781678f7809b6848ba3c5bba4ea
Author: itholic <ha...@databricks.com>
AuthorDate: Wed Oct 27 14:53:51 2021 +0900

    [SPARK-36348][PYTHON][FOLLOWUP] Complete test_astype for index
    
    ### What changes were proposed in this pull request?
    
    This is a follow-up to https://github.com/apache/spark/pull/34335.
    
    ### Why are the changes needed?
    
    The bug addressed in that PR depends on the pandas version, not the Spark version.
    
    So the difference in behavior still exists with pandas < 1.3.
    
    For example,
    
    ```python
    # Spark 3.2 with pandas 1.2.
    >>> pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
    >>> psidx = ps.Index(pidx)
    
    >>> pidx
    Index([10, 20, 15, 30, 45, None], dtype='object', name='x')
    >>> psidx
    Float64Index([10.0, 20.0, 15.0, 30.0, 45.0, nan], dtype='float64', name='x')
    
    >>> pidx.astype(str)
    Index(['10', '20', '15', '30', '45', 'None'], dtype='object', name='x')
    >>> psidx.astype(str)
    Index(['10.0', '20.0', '15.0', '30.0', '45.0', 'nan'], dtype='object', name='x')
    ```
    
    I think many people are still using pandas < 1.3, so for now it seems better to keep a separate test path for older versions of pandas.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, it's test-only.
    
    ### How was this patch tested?
    
    Unit tests.
    
    Closes #34397 from itholic/SPARK-36348-followup.
    
    Authored-by: itholic <ha...@databricks.com>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/pyspark/pandas/tests/indexes/test_base.py | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/pandas/tests/indexes/test_base.py b/python/pyspark/pandas/tests/indexes/test_base.py
index a7f19a7..e7e5216 100644
--- a/python/pyspark/pandas/tests/indexes/test_base.py
+++ b/python/pyspark/pandas/tests/indexes/test_base.py
@@ -2243,8 +2243,17 @@ class IndexesTest(PandasOnSparkTestCase, TestUtils):
 
         pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
         psidx = ps.Index(pidx)
-        self.assert_eq(psidx.astype(bool), pidx.astype(bool))
-        self.assert_eq(psidx.astype(str), pidx.astype(str))
+        if LooseVersion(pd.__version__) >= LooseVersion("1.3"):
+            self.assert_eq(psidx.astype(bool), pidx.astype(bool))
+            self.assert_eq(psidx.astype(str), pidx.astype(str))
+        else:
+            self.assert_eq(
+                psidx.astype(bool), ps.Index([True, True, True, True, True, True], name="x")
+            )
+            self.assert_eq(
+                psidx.astype(str),
+                ps.Index(["10.0", "20.0", "15.0", "30.0", "45.0", "nan"], name="x"),
+            )
 
         pidx = pd.Index(["hi", "hi ", " ", " \t", "", None], name="x")
         psidx = ps.Index(pidx)
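
The two ideas behind the diff above can be illustrated in plain Python, without pandas or Spark. This is only a rough sketch: it mimics the outputs quoted in the commit message rather than modeling pandas internals, and every name in it (`as_object_strings`, `as_float_strings`, `version_tuple`, `can_compare_with_pandas`) is hypothetical. The commit itself uses `LooseVersion`; the tuple comparison here is a stand-in for it.

```python
import math
import re

VALUES = [10, 20, 15, 30, 45, None]


def as_object_strings(values):
    """str() on the raw Python objects, analogous to the object-dtype pidx."""
    return [str(v) for v in values]


def as_float_strings(values):
    """str() after coercing to float (None becomes NaN), analogous to psidx."""
    return [str(math.nan if v is None else float(v)) for v in values]


def version_tuple(version):
    """Parse leading numeric components, e.g. '1.3.0rc1' -> (1, 3, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def can_compare_with_pandas(pandas_version):
    """Hypothetical stand-in for the LooseVersion check in the test."""
    return version_tuple(pandas_version) >= (1, 3)


print(as_object_strings(VALUES))   # integers and None keep their own str()
print(as_float_strings(VALUES))    # floats render with '.0', NaN as 'nan'
print(bool(None), bool(math.nan))  # None is falsy, NaN is truthy
print(can_compare_with_pandas("1.2.5"), can_compare_with_pandas("1.3.0"))
```

Comparing parsed tuples rather than raw strings matters for versions such as `1.10.0`, which would sort before `1.3` in a plain string comparison.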
