You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by gu...@apache.org on 2022/04/14 01:27:39 UTC

[spark] branch master updated: [SPARK-38857][PYTHON] series name should be preserved in series.mode()

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 5d763eb63b6 [SPARK-38857][PYTHON] series name should be preserved in series.mode()
5d763eb63b6 is described below

commit 5d763eb63b67d4fee5972559ddfe0ff3e0e8e210
Author: Yikun Jiang <yi...@gmail.com>
AuthorDate: Thu Apr 14 10:27:19 2022 +0900

    [SPARK-38857][PYTHON] series name should be preserved in series.mode()
    
    ### What changes were proposed in this pull request?
    
    series name is preserved in `series.mode`.
    
    ### Why are the changes needed?
    
    series name should be preserved in series.mode() to follow pandas 1.4.x behavior.
    
    ### Does this PR introduce _any_ user-facing change?
    Yes, if series set name, it will be preserved in series.mode()
    
    ### How was this patch tested?
    
    UT test both in before and after 1.4.x
    
    Closes #36159 from Yikun/SPARK-38857.
    
    Authored-by: Yikun Jiang <yi...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/pyspark/pandas/series.py            | 8 ++++++--
 python/pyspark/pandas/tests/test_series.py | 7 ++++++-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index da1d41c2abe..f4638fe22de 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -4523,6 +4523,9 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
 
         Always returns Series even if only one value is returned.
 
+        .. versionchanged:: 3.4.0
+           Series name is preserved to follow pandas 1.4+ behavior.
+
         Parameters
         ----------
         dropna : bool, default True
@@ -4597,8 +4600,9 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
             F.col(SPARK_DEFAULT_INDEX_NAME).alias(SPARK_DEFAULT_SERIES_NAME)
         )
         internal = InternalFrame(spark_frame=sdf, index_spark_columns=None, column_labels=[None])
-
-        return first_series(DataFrame(internal))
+        ser_mode = first_series(DataFrame(internal))
+        ser_mode.name = self.name
+        return ser_mode
 
     def keys(self) -> "ps.Index":
         """
diff --git a/python/pyspark/pandas/tests/test_series.py b/python/pyspark/pandas/tests/test_series.py
index 76d35c51196..68fed26324d 100644
--- a/python/pyspark/pandas/tests/test_series.py
+++ b/python/pyspark/pandas/tests/test_series.py
@@ -2121,7 +2121,12 @@ class SeriesTest(PandasOnSparkTestCase, SQLTestUtils):
 
         pser.name = "x"
         psser = ps.from_pandas(pser)
-        self.assert_eq(psser.mode(), pser.mode())
+        if LooseVersion(pd.__version__) < LooseVersion("1.4"):
+            # Due to pandas bug: https://github.com/pandas-dev/pandas/issues/46737
+            psser.name = None
+            self.assert_eq(psser.mode(), pser.mode())
+        else:
+            self.assert_eq(psser.mode(), pser.mode())
         self.assert_eq(
             psser.mode(dropna=False).sort_values().reset_index(drop=True),
             pser.mode(dropna=False).sort_values().reset_index(drop=True),


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org