Posted to commits@spark.apache.org by ru...@apache.org on 2022/09/30 01:46:19 UTC
[spark] branch master updated: [SPARK-40589][PS][TEST] Fix test for `DataFrame.corr_with` skip the pandas regression

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e617503c3f0 [SPARK-40589][PS][TEST] Fix test for `DataFrame.corr_with` skip the pandas regression
e617503c3f0 is described below

commit e617503c3f06be9eea0af529bab7984fc07e87a2
Author: itholic <ha...@databricks.com>
AuthorDate: Fri Sep 30 09:45:57 2022 +0800

    [SPARK-40589][PS][TEST] Fix test for `DataFrame.corr_with` skip the pandas regression
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to skip the `DataFrame.corr_with` test when `other` is a `pyspark.pandas.Series` and `method` is "spearman" or "pearson", since there is a regression in pandas 1.5.0 for those cases.
    
    ### Why are the changes needed?
    
    There are some regressions in pandas 1.5.0, so we're not going to match the behavior for those cases.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Manually tested with pandas 1.5.0 and confirmed that the test passes.
    
    Closes #38031 from itholic/SPARK-40589.
    
    Authored-by: itholic <ha...@databricks.com>
    Signed-off-by: Ruifeng Zheng <ru...@apache.org>
---
 python/pyspark/pandas/tests/test_dataframe.py | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py
index 5da0974c906..dfac3c6d1b8 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -6076,7 +6076,14 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils):
     def _test_corrwith(self, psdf, psobj):
         pdf = psdf.to_pandas()
         pobj = psobj.to_pandas()
-        for method in ["pearson", "spearman", "kendall"]:
+        # Regression in pandas 1.5.0 when other is Series and method is "pearson" or "spearman"
+        # See https://github.com/pandas-dev/pandas/issues/48826 for the reported issue,
+        # and https://github.com/pandas-dev/pandas/pull/46174 for the initial PR that causes.
+        if LooseVersion(pd.__version__) >= LooseVersion("1.5.0") and isinstance(pobj, pd.Series):
+            methods = ["kendall"]
+        else:
+            methods = ["pearson", "spearman", "kendall"]
+        for method in methods:
             for drop in [True, False]:
                 p_corr = pdf.corrwith(pobj, drop=drop, method=method)
                 ps_corr = psdf.corrwith(psobj, drop=drop, method=method)
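
The version gate in the hunk above can be sketched in isolation. This is only an illustration under stated assumptions, not the code in Spark: `parse_release` and `corrwith_methods` are hypothetical helper names, and the hand-rolled version parsing stands in for the `LooseVersion` comparison that the test itself uses.

```python
import re


def parse_release(version: str) -> tuple:
    """Hypothetical helper: leading numeric release parts, padded to three.

    For example, "1.5.0rc0" becomes (1, 5, 0) and "1.5" becomes (1, 5, 0).
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that tuple comparison treats "1.5" the same as "1.5.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def corrwith_methods(pandas_version: str, other_is_series: bool) -> list:
    """Hypothetical helper: correlation methods to compare against pandas.

    Mirrors the condition in the patch: from pandas 1.5.0 on, when `other`
    is a Series, only "kendall" is compared.
    """
    if other_is_series and parse_release(pandas_version) >= (1, 5, 0):
        return ["kendall"]
    return ["pearson", "spearman", "kendall"]
```

Keeping the decision in a pure function of the version string and the type of `other` makes the skip condition easy to check without installing a specific pandas release.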

