Posted to commits@spark.apache.org by gu...@apache.org on 2021/10/27 05:54:39 UTC
[spark] branch master updated: [SPARK-36348][PYTHON][FOLLOWUP] Complete test_astype for index
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new c7d9bd2 [SPARK-36348][PYTHON][FOLLOWUP] Complete test_astype for index
c7d9bd2 is described below
commit c7d9bd2e70c29781678f7809b6848ba3c5bba4ea
Author: itholic <ha...@databricks.com>
AuthorDate: Wed Oct 27 14:53:51 2021 +0900
[SPARK-36348][PYTHON][FOLLOWUP] Complete test_astype for index
### What changes were proposed in this pull request?
This is a follow-up to https://github.com/apache/spark/pull/34335.
### Why are the changes needed?
The previous bug depends on the pandas version, not the Spark version.
So the behavior difference between pandas and pandas-on-Spark still exists with pandas < 1.3.
For example,
```python
# Spark 3.2 with pandas 1.2.
>>> pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
>>> psidx = ps.Index(pidx)
>>> pidx
Index([10, 20, 15, 30, 45, None], dtype='object', name='x')
>>> psidx
Float64Index([10.0, 20.0, 15.0, 30.0, 45.0, nan], dtype='float64', name='x')
>>> pidx.astype(str)
Index(['10', '20', '15', '30', '45', 'None'], dtype='object', name='x')
>>> psidx.astype(str)
Index(['10.0', '20.0', '15.0', '30.0', '45.0', 'nan'], dtype='object', name='x')
```
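The difference in the strings above follows from the dtype each index ends up with. The snippet below is only a plain-Python sketch of that reasoning: it does not import or run pandas or Spark, and the variable names are made up for illustration. It mimics an `object` index (the original Python objects are kept) and a `float64` index (integers become floats, `None` becomes NaN).

```python
# Plain-Python illustration only; no pandas or Spark code is executed here.
values = [10, 20, 15, 30, 45, None]

# object dtype: the original Python objects are kept, so str() sees ints and None.
as_object_str = [str(v) for v in values]

# float64 dtype: ints are cast to floats and None becomes NaN before any conversion.
as_float = [float("nan") if v is None else float(v) for v in values]
as_float_str = [str(v) for v in as_float]

# NaN is truthy in Python, which is why a float64 index casts to all True.
as_float_bool = [bool(v) for v in as_float]

print(as_object_str)
print(as_float_str)
print(as_float_bool)
```

The float-based strings match the `psidx.astype(str)` output shown above, and the object-based strings match the `pidx.astype(str)` output.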
I think many people are still using pandas < 1.3, so for now it is probably better to keep a separate test path for older versions of pandas.
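As a rough sketch of that version gate: the patch itself uses `LooseVersion(pd.__version__)`, while the snippet below swaps in a small hand-rolled parser so that it runs without pandas or `distutils` (which was removed in Python 3.12). The names `version_tuple` and `compare_against_pandas` are hypothetical and are not part of Spark or pandas.

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def compare_against_pandas(pandas_version):
    """Hypothetical helper: True if the test should compare with pandas directly.

    Mirrors the `>= 1.3` check in the patch; for older versions the test
    would fall back to hard-coded expected values instead.
    """
    return version_tuple(pandas_version) >= (1, 3)


for candidate in ["1.2.5", "1.3", "1.3.0rc1"]:
    branch = "compare with pandas" if compare_against_pandas(candidate) else "hard-coded expectation"
    print(candidate, "->", branch)
```

In a real test, the string passed in would be `pd.__version__`.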
### Does this PR introduce _any_ user-facing change?
No, it's test-only.
### How was this patch tested?
Unit tests.
Closes #34397 from itholic/SPARK-36348-followup.
Authored-by: itholic <ha...@databricks.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
python/pyspark/pandas/tests/indexes/test_base.py | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/python/pyspark/pandas/tests/indexes/test_base.py b/python/pyspark/pandas/tests/indexes/test_base.py
index a7f19a7..e7e5216 100644
--- a/python/pyspark/pandas/tests/indexes/test_base.py
+++ b/python/pyspark/pandas/tests/indexes/test_base.py
@@ -2243,8 +2243,17 @@ class IndexesTest(PandasOnSparkTestCase, TestUtils):
         pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
         psidx = ps.Index(pidx)
-        self.assert_eq(psidx.astype(bool), pidx.astype(bool))
-        self.assert_eq(psidx.astype(str), pidx.astype(str))
+        if LooseVersion(pd.__version__) >= LooseVersion("1.3"):
+            self.assert_eq(psidx.astype(bool), pidx.astype(bool))
+            self.assert_eq(psidx.astype(str), pidx.astype(str))
+        else:
+            self.assert_eq(
+                psidx.astype(bool), ps.Index([True, True, True, True, True, True], name="x")
+            )
+            self.assert_eq(
+                psidx.astype(str),
+                ps.Index(["10.0", "20.0", "15.0", "30.0", "45.0", "nan"], name="x"),
+            )
         pidx = pd.Index(["hi", "hi ", " ", " \t", "", None], name="x")
         psidx = ps.Index(pidx)