Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/28 23:53:02 UTC

[GitHub] [spark] itholic commented on a change in pull request #35868: [SPARK-38576][PYTHON] Implement `numeric_only` parameter for `DataFrame/Series.rank` to rank numeric columns only

itholic commented on a change in pull request #35868:
URL: https://github.com/apache/spark/pull/35868#discussion_r836940823



##########
File path: python/pyspark/pandas/series.py
##########
@@ -3561,6 +3563,8 @@ def rank(self, method: str = "average", ascending: bool = True) -> "Series":
             * dense: like 'min', but rank always increases by 1 between groups
         ascending : boolean, default True
             False for ranks by high (1) to low (N)
+        numeric_only : bool, optional
+            Rank only numeric columns if set to True.

Review comment:
       Also, how about refining this description to something like: "For Series objects, returns an empty Series if set to True and the Series is not of a numeric type."?
   
   I don't think the pandas documentation for Series is accurate enough here.

##########
File path: python/pyspark/pandas/frame.py
##########
@@ -10260,14 +10262,16 @@ def rank(self, method: str = "average", ascending: bool = True) -> "DataFrame":
             * dense: like 'min', but rank always increases by 1 between groups
         ascending : boolean, default True
             False for ranks by high (1) to low (N)
+        numeric_only : bool, optional
+            Rank only numeric columns if set to True.

Review comment:
       nit: the pandas documentation describes this as "For DataFrame objects, rank only numeric columns if set to True.".
   
   Can we match the pandas description?
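
   For reference, a minimal sketch of the pandas behavior that this wording describes. It assumes plain pandas is installed; the column names and values are made up for illustration and are not taken from the PR:

```python
import pandas as pd

# Illustrative data (an assumption, not from the PR):
# one numeric column "A" and one string column "B".
df = pd.DataFrame({"A": [1, 2, 2, 3], "B": ["a", "b", "c", "d"]})

# With numeric_only=True, pandas ranks only the numeric columns,
# so the non-numeric column "B" is left out of the result.
ranked = df.rank(numeric_only=True)
print(ranked)
```

   With the default method="average", the result should contain only column A, with the tied values sharing the rank 2.5.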

##########
File path: python/pyspark/pandas/frame.py
##########
@@ -10260,14 +10262,16 @@ def rank(self, method: str = "average", ascending: bool = True) -> "DataFrame":
             * dense: like 'min', but rank always increases by 1 between groups
         ascending : boolean, default True
             False for ranks by high (1) to low (N)
+        numeric_only : bool, optional
+            Rank only numeric columns if set to True.
 
         Returns
         -------
         ranks : same type as caller
 
         Examples
         --------
-        >>> df = ps.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 3, 2, 1]}, columns= ['A', 'B'])
+        >>> df = ps.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 3, 2, 1]}, columns=['A', 'B'])

Review comment:
       +1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


