Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/03 18:40:50 UTC

[GitHub] [spark] ueshin commented on a change in pull request #35706: [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`

ueshin commented on a change in pull request #35706:
URL: https://github.com/apache/spark/pull/35706#discussion_r818958752



##########
File path: python/pyspark/pandas/series.py
##########
@@ -1045,8 +1048,18 @@ def map(self, arg: Union[Dict, Callable]) -> "Series":
         2      I am a None
         3    I am a rabbit
         dtype: object
+
+        To avoid applying the function to missing values (and keep them as NaN)
+        na_action='ignore' can be used:
+
+        >>> s.map('I am a {}'.format, na_action='ignore')
+        0       I am a cat
+        1       I am a dog
+        2             None
+        3    I am a rabbit
+        dtype: object

Review comment:
       We might also want to have an example taking `pd.Series` as `arg`.
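
       A possible shape for such an example, sketched here with plain pandas
       rather than pandas API on Spark so that it runs without a Spark session.
       The `lookup` name is hypothetical; its values are borrowed from the test
       in this PR. The index of the Series supplies the keys and its values
       supply the results, and inputs with no matching key come back as missing:

```python
import pandas as pd

pser = pd.Series(["cat", "dog", None, "rabbit"])

# Hypothetical lookup Series: index = input values, values = mapped outputs.
lookup = pd.Series(["one", "two", "four"], index=["cat", "dog", "rabbit"])

# Each element of `pser` is looked up in `lookup.index`;
# the missing entry (None) has no key, so it maps to a missing value.
mapped = pser.map(lookup)
print(mapped)
```

       The docstring version would call `s.map(...)` on a pandas-on-Spark Series
       instead; the exact output formatting there is not shown here.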

##########
File path: python/pyspark/pandas/tests/test_series.py
##########
@@ -1161,13 +1161,29 @@ def test_append(self):
     def test_map(self):
         pser = pd.Series(["cat", "dog", None, "rabbit"])
         psser = ps.from_pandas(pser)
-        # Currently Koalas doesn't return NaN as pandas does.
+
+        # dict correspondence
+        # Currently pandas API on Spark doesn't return NaN as pandas does.
         self.assert_eq(psser.map({}), pser.map({}).replace({pd.np.nan: None}))

Review comment:
       nit: shall we use `np.nan` instead of `pd.np.nan` while we are here?
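
       For context, `pd.np` was only an alias for the `numpy` module; it was
       deprecated in pandas 1.0 and is no longer available in pandas 2.x, so
       importing NumPy directly is the safer spelling. A minimal sketch of the
       pandas side of the assertion, again with plain pandas and with a
       hypothetical variable name `expected`:

```python
import numpy as np
import pandas as pd

pser = pd.Series(["cat", "dog", None, "rabbit"])

# Mapping through an empty dict finds no key for any element,
# so every result is missing.
all_missing = pser.map({})

# Same replacement as in the test, spelled with np.nan instead of pd.np.nan.
expected = all_missing.replace({np.nan: None})
print(expected)
```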

##########
File path: python/pyspark/pandas/series.py
##########
@@ -992,8 +993,10 @@ def map(self, arg: Union[Dict, Callable]) -> "Series":
 
         Parameters
         ----------
-        arg : function or dict
+        arg : function, dict or pd.Series

Review comment:
       If we accept `pd.Series`, we might also want to accept `ps.Series`, which could be done as future work.

##########
File path: python/pyspark/pandas/tests/test_series.py
##########
@@ -1161,13 +1161,29 @@ def test_append(self):
     def test_map(self):
         pser = pd.Series(["cat", "dog", None, "rabbit"])
         psser = ps.from_pandas(pser)
-        # Currently Koalas doesn't return NaN as pandas does.
+
+        # dict correspondence
+        # Currently pandas API on Spark doesn't return NaN as pandas does.
         self.assert_eq(psser.map({}), pser.map({}).replace({pd.np.nan: None}))
 
         d = defaultdict(lambda: "abc")
         self.assertTrue("abc" in repr(psser.map(d)))
         self.assert_eq(psser.map(d), pser.map(d))
 
+        # series correspondence
+        pser_to_apply = pd.Series(["one", "two", "four"], index=["cat", "dog", "rabbit"])
+        self.assert_eq(psser.map(pser_to_apply), pser.map(pser_to_apply))
+        self.assert_eq(
+            psser.map(pser_to_apply, na_action="ignore"),
+            pser.map(pser_to_apply, na_action="ignore"),

Review comment:
       When the correspondence is a `pd.Series`, it seems that `na_action` doesn't have any effect?
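
       A small check of this observation against plain pandas (not pandas API
       on Spark), reusing the data from the test. The variable names are
       hypothetical. With a Series correspondence, a missing input has no
       matching key either way, so both calls should agree; with a callable,
       `na_action="ignore"` does change the result:

```python
import pandas as pd

pser = pd.Series(["cat", "dog", None, "rabbit"])
pser_to_apply = pd.Series(["one", "two", "four"], index=["cat", "dog", "rabbit"])

# Series correspondence: the missing input is not a key in the lookup,
# so it maps to a missing value with or without na_action.
plain = pser.map(pser_to_apply)
ignoring = pser.map(pser_to_apply, na_action="ignore")
print(plain.equals(ignoring))

# Callable: na_action="ignore" skips the missing input instead of formatting it.
formatted = pser.map("I am a {}".format)
formatted_ignoring = pser.map("I am a {}".format, na_action="ignore")
print(formatted_ignoring)
```

       If pandas behaves the same way for both calls in the Series case, the
       second assertion in the test mostly guards against regressions rather
       than exercising a distinct code path.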




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org