Posted to commits@spark.apache.org by ru...@apache.org on 2023/09/27 07:06:57 UTC
[spark] branch master updated: [SPARK-45308][PS][TESTS] Enable `GroupbySplitApplyTests.test_split_apply_combine_on_series` for pandas 2.0.0

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 0b057c807f1 [SPARK-45308][PS][TESTS] Enable `GroupbySplitApplyTests.test_split_apply_combine_on_series` for pandas 2.0.0
0b057c807f1 is described below

commit 0b057c807f193ef9d09bb973411b06bf33438987
Author: Haejoon Lee <ha...@databricks.com>
AuthorDate: Wed Sep 27 15:06:41 2023 +0800

    [SPARK-45308][PS][TESTS] Enable `GroupbySplitApplyTests.test_split_apply_combine_on_series` for pandas 2.0.0
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to enable `GroupbySplitApplyTests.test_split_apply_combine_on_series`.
    
    ### Why are the changes needed?
    
    Similar to https://github.com/apache/spark/pull/43002, this test has been skipped since the upgrade to pandas 2.0.0, but the root cause of the failure has been classified as a regression in pandas. So we can manually make the test pass for now and update it once the pandas regression is resolved.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, it's test-only.
    
    ### How was this patch tested?
    
    By updating the existing test.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #43096 from itholic/SPARK-45308.
    
    Authored-by: Haejoon Lee <ha...@databricks.com>
    Signed-off-by: Ruifeng Zheng <ru...@apache.org>
---
 python/pyspark/pandas/tests/groupby/test_split_apply.py | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/pandas/tests/groupby/test_split_apply.py b/python/pyspark/pandas/tests/groupby/test_split_apply.py
index 070fa01a868..a3ef8c73de4 100644
--- a/python/pyspark/pandas/tests/groupby/test_split_apply.py
+++ b/python/pyspark/pandas/tests/groupby/test_split_apply.py
@@ -40,18 +40,18 @@ class GroupbySplitApplyMixin:
     def psdf(self):
         return ps.from_pandas(self.pdf)
 
-    @unittest.skipIf(
-        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
-        "TODO(SPARK-43445): Enable GroupBySlowTests.test_split_apply_combine_on_series "
-        "for pandas 2.0.0.",
-    )
     def test_split_apply_combine_on_series(self):
+        # TODO(SPARK-45228): Enabling string type columns for `test_split_apply_combine_on_series`
+        #  when Pandas regression is fixed
+        # There is a regression in Pandas 2.1.0,
+        # so we should manually cast to float until the regression is fixed.
+        # See https://github.com/pandas-dev/pandas/issues/55194.
         pdf = pd.DataFrame(
             {
                 "a": [1, 2, 6, 4, 4, 6, 4, 3, 7],
                 "b": [4, 2, 7, 3, 3, 1, 1, 1, 2],
                 "c": [4, 2, 7, 3, None, 1, 1, 1, 2],
-                "d": list("abcdefght"),
+                # "d": list("abcdefght"),
             },
             index=[0, 1, 3, 5, 6, 8, 9, 9, 9],
         )
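The hunk above deletes a version-gated `unittest.skipIf` decorator, so the test now runs on every pandas version. For readers unfamiliar with that gating pattern, here is a minimal stdlib-only sketch of it. Assumptions: `version_tuple` is a hypothetical helper standing in for `LooseVersion` (its original home, `distutils`, was removed from the standard library in Python 3.12), and `INSTALLED_VERSION` is a hypothetical stand-in for `pd.__version__` so the sketch runs without pandas installed.

```python
import io
import re
import unittest


def version_tuple(version: str) -> tuple:
    """Hypothetical helper: parse the leading numeric release, e.g. '2.1.0rc0' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    # Pad to three components so "2" compares equal to (2, 0, 0), not less than it.
    return tuple(parts + [0] * (3 - len(parts)))


# Hypothetical stand-in for pd.__version__.
INSTALLED_VERSION = "2.1.0"


class VersionGateExample(unittest.TestCase):
    # Same shape as the removed decorator: skip when the library is >= 2.0.0.
    @unittest.skipIf(
        version_tuple(INSTALLED_VERSION) >= (2, 0, 0),
        "TODO: re-enable for pandas 2.0.0.",
    )
    def test_gated(self):
        self.assertEqual(1 + 1, 2)

    # After the commit, the real test carries no decorator and always runs.
    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(VersionGateExample)
# Send the runner's report to a buffer; only the result object is inspected.
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(len(result.skipped))  # 1: only test_gated is skipped for "2.1.0"
```

Comparing integer tuples avoids the pitfalls of plain string comparison (for example, "10.0" sorts before "2.0" as a string); a real project would more likely reach for a dedicated version-parsing library.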

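The updated test builds its DataFrame without the string column `d`. As a rough, stdlib-only illustration of that workaround and of the split-apply-combine pattern the test is named after, the sketch below keeps only the numeric columns of the test data and then groups `c` by `a`, summing each group. It is not how pandas or pandas-on-Spark implement groupby: `numeric_columns` and `split_apply_combine` are hypothetical helpers, and only the column values are copied from the diff.

```python
from collections import defaultdict
from numbers import Number

# Column values copied from the test's DataFrame, including the string
# column "d" that the commit comments out.
data = {
    "a": [1, 2, 6, 4, 4, 6, 4, 3, 7],
    "b": [4, 2, 7, 3, 3, 1, 1, 1, 2],
    "c": [4, 2, 7, 3, None, 1, 1, 1, 2],
    "d": list("abcdefght"),
}


def numeric_columns(columns):
    """Hypothetical helper: keep columns whose non-missing values are all numbers."""
    return {
        name: values
        for name, values in columns.items()
        if all(isinstance(v, Number) for v in values if v is not None)
    }


def split_apply_combine(columns, key, value):
    """Hypothetical helper: group `value` by `key`, sum each group, order by key."""
    groups = defaultdict(list)
    for k, v in zip(columns[key], columns[value]):  # split rows by key
        groups[k].append(v)
    return {
        k: sum(v for v in vs if v is not None)  # apply (sum, skipping None), then combine
        for k, vs in sorted(groups.items())
    }


numeric = numeric_columns(data)
print(sorted(numeric))  # ['a', 'b', 'c']
print(split_apply_combine(numeric, "a", "c"))  # {1: 4, 2: 2, 3: 1, 4: 4, 6: 8, 7: 2}
```

Skipping `None` is a design choice for this sketch; it roughly matches how pandas excludes missing values from groupby sums by default, which is why the missing value in group `4` does not turn that group's sum into a missing value.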
