Posted to commits@spark.apache.org by ru...@apache.org on 2022/09/30 01:46:19 UTC
[spark] branch master updated: [SPARK-40589][PS][TEST] Fix test for `DataFrame.corr_with` skip the pandas regression
This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e617503c3f0 [SPARK-40589][PS][TEST] Fix test for `DataFrame.corr_with` skip the pandas regression
e617503c3f0 is described below
commit e617503c3f06be9eea0af529bab7984fc07e87a2
Author: itholic <ha...@databricks.com>
AuthorDate: Fri Sep 30 09:45:57 2022 +0800
[SPARK-40589][PS][TEST] Fix test for `DataFrame.corr_with` skip the pandas regression
### What changes were proposed in this pull request?
This PR proposes to skip the `DataFrame.corrwith` test when `other` is a `pyspark.pandas.Series` and `method` is "spearman" or "pearson", since there is a regression in pandas 1.5.0 for those cases.
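
The version gate described above can be sketched on its own, outside the test class. This is only an illustration under stated assumptions: `parse_version` and `corrwith_methods_to_test` are hypothetical helper names that do not exist in Spark or pandas, and the version string is passed in explicitly instead of being read from `pd.__version__`. The actual test compares `LooseVersion` objects.

```python
def parse_version(version):
    """Turn a release string such as "1.5.0" into a tuple of ints, (1, 5, 0).

    Hypothetical helper for this sketch; only the leading digits of the
    first three dot-separated components are considered.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as "1.5" so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def corrwith_methods_to_test(pandas_version, other_is_series):
    """Pick the correlation methods whose results are compared against pandas.

    Hypothetical helper: with pandas >= 1.5.0 and a Series as `other`,
    only "kendall" is compared, because "pearson" and "spearman" are
    affected by the pandas regression.
    """
    if parse_version(pandas_version) >= (1, 5, 0) and other_is_series:
        return ["kendall"]
    return ["pearson", "spearman", "kendall"]
```

With this sketch, `corrwith_methods_to_test("1.5.0", True)` keeps only "kendall", while an older pandas version or a DataFrame as `other` keeps all three methods.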
### Why are the changes needed?
pandas 1.5.0 introduced regressions in these cases, so we are not going to match its behavior for them.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually tested with pandas 1.5.0 and confirmed that the test passes.
Closes #38031 from itholic/SPARK-40589.
Authored-by: itholic <ha...@databricks.com>
Signed-off-by: Ruifeng Zheng <ru...@apache.org>
---
python/pyspark/pandas/tests/test_dataframe.py | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py
index 5da0974c906..dfac3c6d1b8 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -6076,7 +6076,14 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils):
     def _test_corrwith(self, psdf, psobj):
         pdf = psdf.to_pandas()
         pobj = psobj.to_pandas()
-        for method in ["pearson", "spearman", "kendall"]:
+        # Regression in pandas 1.5.0 when other is Series and method is "pearson" or "spearman"
+        # See https://github.com/pandas-dev/pandas/issues/48826 for the reported issue,
+        # and https://github.com/pandas-dev/pandas/pull/46174 for the initial PR that causes.
+        if LooseVersion(pd.__version__) >= LooseVersion("1.5.0") and isinstance(pobj, pd.Series):
+            methods = ["kendall"]
+        else:
+            methods = ["pearson", "spearman", "kendall"]
+        for method in methods:
             for drop in [True, False]:
                 p_corr = pdf.corrwith(pobj, drop=drop, method=method)
                 ps_corr = psdf.corrwith(psobj, drop=drop, method=method)