Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/23 00:16:26 UTC
[GitHub] [spark] zhengruifeng opened a new pull request, #37974: [SPARK-40542][PS] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitrary integers
zhengruifeng opened a new pull request, #37974:
URL: https://github.com/apache/spark/pull/37974
### What changes were proposed in this pull request?
Add a new `std` expression to support arbitrary integral `ddof`.
### Why are the changes needed?
For pandas API coverage.
### Does this PR introduce _any_ user-facing change?
Yes, it accepts `ddof` values other than {0, 1}.
before:
```
In [4]: df = ps.DataFrame({'a': [1, 2, 3, np.nan], 'b': [0.1, 0.2, 0.3, np.nan]}, columns=['a', 'b'])
In [5]: df.std(ddof=2)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In [5], line 1
----> 1 df.std(ddof=2)
File ~/Dev/spark/python/pyspark/pandas/generic.py:1866, in Frame.std(self, axis, skipna, ddof, numeric_only)
1803 def std(
1804 self,
1805 axis: Optional[Axis] = None,
(...)
1808 numeric_only: bool = None,
1809 ) -> Union[Scalar, "Series"]:
1810 """
1811 Return sample standard deviation.
1812
(...)
1864 0.816496580927726
1865 """
-> 1866 assert ddof in (0, 1)
1868 axis = validate_axis(axis)
1870 if numeric_only is None and axis == 0:
AssertionError:
```
after:
```
In [3]: df = ps.DataFrame({'a': [1, 2, 3, np.nan], 'b': [0.1, 0.2, 0.3, np.nan]}, columns=['a', 'b'])
In [4]: df.std(ddof=2)
Out[4]:
a 1.414214
b 0.141421
dtype: float64
In [5]: df.to_pandas().std(ddof=2)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
warnings.warn(message, PandasAPIOnSparkAdviceWarning)
Out[5]:
a 1.414214
b 0.141421
dtype: float64
```
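To illustrate the semantics being added, here is a minimal, self-contained sketch of sample standard deviation with an arbitrary `ddof` (delta degrees of freedom), i.e. `sqrt(sum((x - mean)^2) / (N - ddof))` with NaNs dropped, matching pandas' `skipna=True` default. The helper name `std_with_ddof` is hypothetical and is not part of the Spark or pandas API; it only reproduces the numbers in the example above.

```python
import math

def std_with_ddof(values, ddof=1):
    # Hypothetical illustration of std with arbitrary ddof; not Spark's
    # implementation. Drop NaNs first (NaN != NaN), as pandas does by default.
    xs = [v for v in values if v == v]
    n = len(xs)
    if n - ddof <= 0:
        return float("nan")
    mean = sum(xs) / n
    return math.sqrt(sum((v - mean) ** 2 for v in xs) / (n - ddof))

# Column 'a' from the example: [1, 2, 3, NaN] -> 3 non-null values.
print(std_with_ddof([1, 2, 3, float("nan")], ddof=2))  # ≈ 1.414214
```

With `ddof=2` the divisor is `3 - 2 = 1`, giving `sqrt(2) ≈ 1.414214`, which matches both the pandas-on-Spark and the `to_pandas()` outputs shown above.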
### How was this patch tested?
Added test suites.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitrary integers
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #37974:
URL: https://github.com/apache/spark/pull/37974#issuecomment-1255731663
Merged into master, thanks @HyukjinKwon for the review.
[GitHub] [spark] zhengruifeng closed pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitrary integers
Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitrary integers
URL: https://github.com/apache/spark/pull/37974