Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/23 00:16:26 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #37974: [SPARK-40542][PS] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitrary integers

zhengruifeng opened a new pull request, #37974:
URL: https://github.com/apache/spark/pull/37974

   ### What changes were proposed in this pull request?
   Add a new `std` expression that supports an arbitrary integer `ddof`.
   
   
   ### Why are the changes needed?
   For API coverage: pandas already accepts `ddof` values other than 0 and 1.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, `std` now accepts `ddof` values other than 0 and 1.
   
   before:
   ```
   In [4]: df = ps.DataFrame({'a': [1, 2, 3, np.nan], 'b': [0.1, 0.2, 0.3, np.nan]}, columns=['a', 'b'])
   
   In [5]: df.std(ddof=2)
   ---------------------------------------------------------------------------
   AssertionError                            Traceback (most recent call last)
   Cell In [5], line 1
   ----> 1 df.std(ddof=2)
   
   File ~/Dev/spark/python/pyspark/pandas/generic.py:1866, in Frame.std(self, axis, skipna, ddof, numeric_only)
      1803 def std(
      1804     self,
      1805     axis: Optional[Axis] = None,
      (...)
      1808     numeric_only: bool = None,
      1809 ) -> Union[Scalar, "Series"]:
      1810     """
      1811     Return sample standard deviation.
      1812 
      (...)
      1864     0.816496580927726
      1865     """
   -> 1866     assert ddof in (0, 1)
      1868     axis = validate_axis(axis)
      1870     if numeric_only is None and axis == 0:
   
   AssertionError: 
   ```
   
   after:
   ```
   In [3]: df = ps.DataFrame({'a': [1, 2, 3, np.nan], 'b': [0.1, 0.2, 0.3, np.nan]}, columns=['a', 'b'])
   
   In [4]: df.std(ddof=2)
   Out[4]:                                                                         
   a    1.414214
   b    0.141421
   dtype: float64
   
   In [5]: df.to_pandas().std(ddof=2)
   /Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   Out[5]: 
   a    1.414214
   b    0.141421
   dtype: float64
   
   
   ```
   
   ### How was this patch tested?
   Added test suites.
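
For reference, the numbers in the "after" output agree with the usual definition of a standard deviation with delta degrees of freedom, sqrt(sum((x - mean)^2) / (n - ddof)), computed over the non-NaN values. The snippet below is only a minimal pure-Python sketch of that formula, not the expression added in this PR; `std_with_ddof` is a hypothetical helper name, and returning NaN when `n - ddof <= 0` is an assumption of the sketch.

```python
import math


def std_with_ddof(values, ddof=1):
    """Standard deviation with an arbitrary integer ddof, skipping NaN values.

    Hypothetical helper for illustration only; it is not part of pyspark.
    """
    xs = [float(v) for v in values if not math.isnan(v)]
    n = len(xs)
    # Assumption of this sketch: the result is undefined (NaN) when the
    # divisor n - ddof is not positive.
    if n - ddof <= 0:
        return float("nan")
    mean = sum(xs) / n
    squared_dev = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(squared_dev / (n - ddof))


# Same columns as the example DataFrame in the description.
a = [1, 2, 3, float("nan")]
b = [0.1, 0.2, 0.3, float("nan")]

print(round(std_with_ddof(a, ddof=2), 6))  # 1.414214
print(round(std_with_ddof(b, ddof=2), 6))  # 0.141421
```

With three non-NaN values and `ddof=2` the divisor is 1, so the result is simply the square root of the summed squared deviations, which is why `a` comes out as sqrt(2).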


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitrary integers

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on PR #37974:
URL: https://github.com/apache/spark/pull/37974#issuecomment-1255731663

   Merged into master, thanks @HyukjinKwon for the review.



[GitHub] [spark] zhengruifeng closed pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitrary integers

Posted by GitBox <gi...@apache.org>.

zhengruifeng closed pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitrary integers
URL: https://github.com/apache/spark/pull/37974

