
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/03 16:25:03 UTC

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36414: [SPARK-39077][PYTHON] Implement `skipna` of common statistical functions of DataFrame and Series

xinrong-databricks commented on code in PR #36414:
URL: https://github.com/apache/spark/pull/36414#discussion_r863955870


##########
python/pyspark/pandas/series.py:
##########
@@ -6859,13 +6860,16 @@ def _reduce_for_stat_function(
         sfun : the stats function to be used for aggregation
         name : original pandas API name.
         axis : used only for sanity check because series only support index axis.
-        numeric_only : not used by this implementation, but passed down by stats functions
+        numeric_only : not used by this implementation, but passed down by stats functions.
         """
         axis = validate_axis(axis)
         if axis == 1:
             raise NotImplementedError("Series does not support columns axis.")
 
-        scol = sfun(self)
+        if skipna:
+            scol = sfun(self)
+        else:
+            scol = F.first(F.lit(np.nan))

Review Comment:
   Good catch! Fixed. I also added test cases for a Series that has no NAs when `skipna` is False.
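
For readers of the archive, the case mentioned above can be illustrated with a minimal plain-Python sketch. It is not the code from this PR and does not use Spark; `is_na` and `reduce_sum` are hypothetical names introduced only for illustration. It assumes the pandas convention that, with `skipna=False`, a reduction returns NaN only when the data actually contains an NA, whereas the `else` branch in the hunk above yields NaN unconditionally.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def reduce_sum(values, skipna=True):
    """Sum `values`, mimicking the pandas `skipna` convention.

    Hypothetical stand-in for a stat function; not the pyspark.pandas code.
    """
    has_na = any(is_na(v) for v in values)
    if has_na and not skipna:
        # An NA is present and must not be skipped, so the result is NaN.
        return math.nan
    # Either NAs are skipped, or there are none to skip.
    return sum(v for v in values if not is_na(v))


no_na = [1, 2, 3]
with_na = [1, None, 3]

print(reduce_sum(no_na, skipna=False))    # no NAs: ordinary sum
print(reduce_sum(with_na, skipna=False))  # NA present: NaN
print(reduce_sum(with_na, skipna=True))   # NA skipped
```

The first call is the case the new tests cover: a Series without NAs should give the same result whether or not `skipna` is set.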



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

