Posted to commits@spark.apache.org by gu...@apache.org on 2022/11/01 06:23:42 UTC
[spark] branch master updated: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e24d22e1c9a [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas
e24d22e1c9a is described below
commit e24d22e1c9afa8d2190d2ca44a16deae58e0fee8
Author: itholic <ha...@databricks.com>
AuthorDate: Tue Nov 1 15:23:27 2022 +0900
[SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas
### What changes were proposed in this pull request?
This PR switches the `DataFrame.corrwith` tests back from the manual workaround to the regular test path for every pandas version except 1.5.0.
### Why are the changes needed?
pandas 1.5.0 introduced a regression (https://github.com/pandas-dev/pandas/issues/48826), which appears to be fixed as of pandas 1.5.1.
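The version gate this change relies on can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the actual tests compare `LooseVersion(pd.__version__)` against `LooseVersion("1.5.0")`, whereas the helpers `parse_version` and `corrwith_methods` below are hypothetical names that use a plain tuple parse so the snippet stays self-contained.

```python
def parse_version(version: str) -> tuple:
    """Parse a plain dotted release string such as "1.5.0" into a tuple of ints.

    Hypothetical helper: it only handles purely numeric versions, unlike
    LooseVersion, which the actual tests use.
    """
    return tuple(int(part) for part in version.split("."))


def corrwith_methods(pandas_version: str, other_is_series: bool) -> list:
    """Return the correlation methods to compare against pandas.

    Hypothetical helper mirroring the branch in `_test_corrwith`.
    """
    # Only pandas 1.5.0 is affected, so the check is an equality test
    # rather than ">= 1.5.0"; later releases run the full set again.
    if parse_version(pandas_version) == (1, 5, 0) and other_is_series:
        return ["kendall"]
    return ["pearson", "spearman", "kendall"]


print(corrwith_methods("1.5.0", True))
print(corrwith_methods("1.5.1", True))
print(corrwith_methods("1.5.0", False))
```

With an equality check, only the first call is restricted to `kendall`; the other two return all three methods.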
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The updated tests should pass CI.
Closes #38455 from itholic/SPARK-40827.
Authored-by: itholic <ha...@databricks.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
python/pyspark/pandas/tests/test_dataframe.py | 6 ++++--
python/pyspark/pandas/tests/test_ops_on_diff_frames.py | 10 ++++++----
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py
index b5466b467d8..4e80c680b6e 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -6091,10 +6091,12 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils):
def _test_corrwith(self, psdf, psobj):
pdf = psdf._to_pandas()
pobj = psobj._to_pandas()
- # Regression in pandas 1.5.0 when other is Series and method is "pearson" or "spearman"
+ # There was a regression in pandas 1.5.0
+ # when other is Series and method is "pearson" or "spearman", and fixed in pandas 1.5.1
+ # Therefore, we only test the pandas 1.5.0 in different way.
# See https://github.com/pandas-dev/pandas/issues/48826 for the reported issue,
# and https://github.com/pandas-dev/pandas/pull/46174 for the initial PR that causes.
- if LooseVersion(pd.__version__) >= LooseVersion("1.5.0") and isinstance(pobj, pd.Series):
+ if LooseVersion(pd.__version__) == LooseVersion("1.5.0") and isinstance(pobj, pd.Series):
methods = ["kendall"]
else:
methods = ["pearson", "spearman", "kendall"]
diff --git a/python/pyspark/pandas/tests/test_ops_on_diff_frames.py b/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
index ce1ffb34765..71c393dcf34 100644
--- a/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
+++ b/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
@@ -1866,12 +1866,13 @@ class OpsOnDiffFramesEnabledTest(PandasOnSparkTestCase, SQLTestUtils):
self._test_corrwith((df1 + 1), df2.B)
self._test_corrwith((df1 + 1), (df2.B + 2))
- # Regression in pandas 1.5.0
+ # There was a regression in pandas 1.5.0, and fixed in pandas 1.5.1.
+ # Therefore, we only test the pandas 1.5.0 in different way.
# See https://github.com/pandas-dev/pandas/issues/49141 for the reported issue,
# and https://github.com/pandas-dev/pandas/pull/46174 for the initial PR that causes.
df_bool = ps.DataFrame({"A": [True, True, False, False], "B": [True, False, False, True]})
ser_bool = ps.Series([True, True, False, True])
- if LooseVersion(pd.__version__) >= LooseVersion("1.5.0"):
+ if LooseVersion(pd.__version__) == LooseVersion("1.5.0"):
expected = ps.Series([0.5773502691896257, 0.5773502691896257], index=["B", "A"])
self.assert_eq(df_bool.corrwith(ser_bool), expected, almost=True)
else:
@@ -1883,10 +1884,11 @@ class OpsOnDiffFramesEnabledTest(PandasOnSparkTestCase, SQLTestUtils):
self._test_corrwith(self.psdf3, self.psdf4)
self._test_corrwith(self.psdf1, self.psdf1.a)
- # Regression in pandas 1.5.0
+ # There was a regression in pandas 1.5.0, and fixed in pandas 1.5.1.
+ # Therefore, we only test the pandas 1.5.0 in different way.
# See https://github.com/pandas-dev/pandas/issues/49141 for the reported issue,
# and https://github.com/pandas-dev/pandas/pull/46174 for the initial PR that causes.
- if LooseVersion(pd.__version__) >= LooseVersion("1.5.0"):
+ if LooseVersion(pd.__version__) == LooseVersion("1.5.0"):
expected = ps.Series([-0.08827348295047496, 0.4413674147523748], index=["b", "a"])
self.assert_eq(self.psdf1.corrwith(self.psdf2.b), expected, almost=True)
else:
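As a sanity check on the hard-coded expectation for the boolean case in the diff above, the value 0.5773502691896257 can be reproduced by hand. The sketch below assumes the default `corrwith` method is Pearson and treats the booleans as 0/1 integers; `pearson` is a hypothetical pure-Python helper, not part of pandas or pandas-on-Spark.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Boolean data from the test, written as 0/1 integers.
col_a = [1, 1, 0, 0]  # df_bool["A"]
col_b = [1, 0, 0, 1]  # df_bool["B"]
ser = [1, 1, 0, 1]    # ser_bool

corr_a = pearson(col_a, ser)
corr_b = pearson(col_b, ser)
print(corr_a, corr_b)
```

Both columns come out at roughly 0.577, that is 1/sqrt(3), which agrees with the expected series in the pandas 1.5.0 branch up to floating-point rounding.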