Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/20 21:21:34 UTC
[GitHub] [spark] khalidmammadov opened a new pull request, #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
khalidmammadov opened a new pull request, #37592:
URL: https://github.com/apache/spark/pull/37592
### What changes were proposed in this pull request?
Docstring improvements
### Why are the changes needed?
To help users understand the PySpark API
### Does this PR introduce _any_ user-facing change?
Yes, documentation
### How was this patch tested?
`bundle exec jekyll serve --host 0.0.0.0`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950953306
##########
python/pyspark/sql/functions.py:
##########
@@ -1315,6 +1710,26 @@ def stddev_pop(col: "ColumnOrName") -> Column:
the expression in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev_pop(df.id)).show()
+ +-----------------+
+ | stddev_pop(id)|
+ +-----------------+
+ |1.707825127659933|
Review Comment:
ditto
##########
python/pyspark/sql/functions.py:
##########
@@ -1305,6 +1680,26 @@ def stddev_samp(col: "ColumnOrName") -> Column:
the expression in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev_samp(df.id)).show()
+ +------------------+
+ | stddev_samp(id)|
+ +------------------+
+ |1.8708286933869707|
Review Comment:
ditto
##########
python/pyspark/sql/functions.py:
##########
@@ -1295,6 +1650,26 @@ def stddev(col: "ColumnOrName") -> Column:
Aggregate function: alias for stddev_samp.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev(df.id)).show()
+ +------------------+
+ | stddev_samp(id)|
+ +------------------+
+ |1.8708286933869707|
Review Comment:
ditto
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950953141
##########
python/pyspark/sql/functions.py:
##########
@@ -1343,6 +1798,26 @@ def var_pop(col: "ColumnOrName") -> Column:
Aggregate function: returns the population variance of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ variance of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(var_pop(df.id)).show()
+ +------------------+
+ | var_pop(id)|
+ +------------------+
+ |2.9166666666666665|
Review Comment:
I think we should probably do ELLIPSIS since float representation is flaky in Python (and JDK 11/17 IIRC)
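The flakiness the reviewer mentions can be illustrated with the stdlib `doctest` module alone. The sketch below is hypothetical (not Spark code; the function name and values are for illustration): the expected output is truncated and the inline `ELLIPSIS` directive lets any trailing digits match, so the example survives float-repr changes.

```python
import doctest

def pop_variance(xs):
    """Population variance of a sequence.

    The expected value is truncated and matched with ELLIPSIS, so the
    test keeps passing even if the float's repr changes across
    Python or JDK versions.

    >>> pop_variance(list(range(6)))  # doctest: +ELLIPSIS
    2.9166...
    """
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

results = doctest.testmod()
print(results.failed)  # 0: the ellipsis pattern matched
```

The same idea is what the review asks for in the `var_pop`/`skewness` examples: truncate the printed float and let the doctest runner match the rest.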
##########
python/pyspark/sql/functions.py:
##########
@@ -1352,6 +1827,26 @@ def skewness(col: "ColumnOrName") -> Column:
Aggregate function: returns the skewness of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ skewness of given column.
+
+ Examples
+ --------
+ >>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
+ >>> df.select(skewness(df.c)).show()
+ +------------------+
+ | skewness(c)|
+ +------------------+
+ |0.7071067811865475|
Review Comment:
ditto
[GitHub] [spark] khalidmammadov commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950903436
##########
python/pyspark/sql/functions.py:
##########
@@ -6158,12 +6764,14 @@ def _test() -> None:
import doctest
from pyspark.sql import Row, SparkSession
import pyspark.sql.functions
+ import math
globs = pyspark.sql.functions.__dict__.copy()
spark = SparkSession.builder.master("local[4]").appName("sql.functions tests").getOrCreate()
sc = spark.sparkContext
globs["sc"] = sc
globs["spark"] = spark
+ globs["math"] = math
Review Comment:
Removed. I can look into removing "sc" in the next PRs.
[GitHub] [spark] HyukjinKwon commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1221438918
Thanks for doing this, @khalidmammadov
[GitHub] [spark] srowen commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950745297
##########
python/pyspark/sql/functions.py:
##########
@@ -994,12 +1014,22 @@ def cot(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- Angle in radians
+ angle in radians.
Review Comment:
Why these changes? seemed better before
##########
python/pyspark/sql/functions.py:
##########
@@ -1142,13 +1384,23 @@ def sinh(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- hyperbolic angle
+ hyperbolic angle in radians
Review Comment:
I don't believe hyperbolic angles are measured in radians
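For background on this point: the argument of the hyperbolic functions is a dimensionless real number (geometrically, twice the area of a hyperbolic sector), not an angle in radians, since they are defined directly from exponentials:

```latex
\sinh x = \frac{e^{x} - e^{-x}}{2}, \qquad \cosh x = \frac{e^{x} + e^{-x}}{2}
```

This is why the original "hyperbolic angle" wording (without "in radians") was the accurate one.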
[GitHub] [spark] HyukjinKwon closed pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
URL: https://github.com/apache/spark/pull/37592
[GitHub] [spark] khalidmammadov commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950903322
##########
python/pyspark/sql/functions.py:
##########
@@ -994,12 +1014,22 @@ def cot(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- Angle in radians
+ angle in radians.
Review Comment:
There are different conventions in this file; the most prominent one is lower case with a dot at the end. I tried to be consistent. Can revert if needed.
[GitHub] [spark] khalidmammadov commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1221625649
> Thanks for doing this, @khalidmammadov
my pleasure
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950952825
##########
python/pyspark/sql/functions.py:
##########
@@ -1037,6 +1084,22 @@ def expm1(col: "ColumnOrName") -> Column:
Computes the exponential of the given value minus one.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate exponential for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ exponential less one.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(expm1(lit(1))).first() # doctest: +ELLIPSIS
+ Row(EXPM1(1)=1.71828...)
Review Comment:
This is a nice trick. In this way, the test won't be flaky even when the precision changes. If you use `show()`, it might break because of the table's dashes (e.g., if 1.321 becomes 1.32). BTW you can remove `# doctest: +ELLIPSIS` since that's defined at the bottom of this file.
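The module-wide default mentioned here can be sketched with the stdlib `doctest` module (a hypothetical standalone module, not the actual Spark `_test()` helper): pass `ELLIPSIS` as a default option flag when running the module's doctests, so individual examples don't need the inline directive.

```python
import doctest
import math

def expm1_example(x):
    """Exponential of x minus one.

    No inline "# doctest: +ELLIPSIS" needed here: the flag is supplied
    once, for every docstring, when the module's doctests are run.

    >>> expm1_example(1)
    1.71828...
    """
    return math.expm1(x)

# Roughly what a module-level test helper can do for all docstrings:
results = doctest.testmod(optionflags=doctest.ELLIPSIS)
print(results.failed)  # 0
```

With the flag set globally, the suggested edits below (dropping each inline `# doctest: +ELLIPSIS`) are safe.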
[GitHub] [spark] khalidmammadov commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950903329
##########
python/pyspark/sql/functions.py:
##########
@@ -1142,13 +1384,23 @@ def sinh(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- hyperbolic angle
+ hyperbolic angle in radians
Review Comment:
my bad, reverted
[GitHub] [spark] AmplabJenkins commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1221462892
Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950766376
##########
python/pyspark/sql/functions.py:
##########
@@ -6158,12 +6764,14 @@ def _test() -> None:
import doctest
from pyspark.sql import Row, SparkSession
import pyspark.sql.functions
+ import math
globs = pyspark.sql.functions.__dict__.copy()
spark = SparkSession.builder.master("local[4]").appName("sql.functions tests").getOrCreate()
sc = spark.sparkContext
globs["sc"] = sc
globs["spark"] = spark
+ globs["math"] = math
Review Comment:
Let's import this within the docstring. I think we should ideally remove all these globals (except `spark`).
[GitHub] [spark] HyukjinKwon commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1223372735
Merged to master.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r952037018
##########
python/pyspark/sql/functions.py:
##########
@@ -994,12 +1007,19 @@ def cot(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- Angle in radians
+ angle in radians.
Returns
-------
:class:`~pyspark.sql.Column`
- Cotangent of the angle.
+ cotangent of the angle.
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(cot(lit(math.radians(45)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(cot(lit(math.radians(45)))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -962,6 +962,13 @@ def cos(col: "ColumnOrName") -> Column:
-------
:class:`~pyspark.sql.Column`
cosine of the angle, as if computed by `java.lang.Math.cos()`.
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(cos(lit(math.pi))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(cos(lit(math.pi))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1055,6 +1138,23 @@ def log(col: "ColumnOrName") -> Column:
Computes the natural logarithm of the given value.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate natural logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ natural logarithm of the given value.
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(log(lit(math.e))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(log(lit(math.e))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1142,13 +1352,19 @@ def sinh(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- hyperbolic angle
+ hyperbolic angle.
Returns
-------
:class:`~pyspark.sql.Column`
hyperbolic sine of the given value,
as if computed by `java.lang.Math.sinh()`
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(sinh(lit(1.1))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(sinh(lit(1.1))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1064,15 +1164,57 @@ def log10(col: "ColumnOrName") -> Column:
Computes the logarithm of the given value in Base 10.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ logarithm of the given value in Base 10.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(log10(lit(100))).show()
+ +----------+
+ |LOG10(100)|
+ +----------+
+ | 2.0|
+ +----------+
"""
return _invoke_function_over_columns("log10", col)
def log1p(col: "ColumnOrName") -> Column:
"""
- Computes the natural logarithm of the given value plus one.
+ Computes the natural logarithm of the "given value plus one".
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate natural logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ natural logarithm of the "given value plus one".
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(log1p(lit(math.e))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(log1p(lit(math.e))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1305,6 +1677,22 @@ def stddev_samp(col: "ColumnOrName") -> Column:
the expression in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev_samp(df.id)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(stddev_samp(df.id)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1315,6 +1703,22 @@ def stddev_pop(col: "ColumnOrName") -> Column:
the expression in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev_pop(df.id)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(stddev_pop(df.id)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1343,6 +1787,22 @@ def var_pop(col: "ColumnOrName") -> Column:
Aggregate function: returns the population variance of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ variance of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(var_pop(df.id)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(var_pop(df.id)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1352,6 +1812,22 @@ def skewness(col: "ColumnOrName") -> Column:
Aggregate function: returns the skewness of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ skewness of given column.
+
+ Examples
+ --------
+ >>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
+ >>> df.select(skewness(df.c)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(skewness(df.c)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1064,15 +1164,57 @@ def log10(col: "ColumnOrName") -> Column:
Computes the logarithm of the given value in Base 10.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ logarithm of the given value in Base 10.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(log10(lit(100))).show()
+ +----------+
+ |LOG10(100)|
+ +----------+
+ | 2.0|
+ +----------+
"""
return _invoke_function_over_columns("log10", col)
def log1p(col: "ColumnOrName") -> Column:
"""
- Computes the natural logarithm of the given value plus one.
+ Computes the natural logarithm of the "given value plus one".
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate natural logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ natural logarithm of the "given value plus one".
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(log1p(lit(math.e))).first() # doctest: +ELLIPSIS
+ Row(LOG1P(2.71828...)=1.31326...)
+
+ Same as:
+
+ >>> df.select(log(lit(math.e+1))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(log(lit(math.e+1))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -981,6 +988,12 @@ def cosh(col: "ColumnOrName") -> Column:
-------
:class:`~pyspark.sql.Column`
hyperbolic cosine of the angle, as if computed by `java.lang.Math.cosh()`
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(cosh(lit(1))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(cosh(lit(1))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1013,12 +1033,19 @@ def csc(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- Angle in radians
+ angle in radians.
Returns
-------
:class:`~pyspark.sql.Column`
- Cosecant of the angle.
+ cosecant of the angle.
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(csc(lit(math.radians(90)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(csc(lit(math.radians(90)))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1124,11 +1326,19 @@ def sin(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
Returns
-------
:class:`~pyspark.sql.Column`
sine of the angle, as if computed by `java.lang.Math.sin()`
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(sin(lit(math.radians(90)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(sin(lit(math.radians(90)))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1102,6 +1271,12 @@ def sec(col: "ColumnOrName") -> Column:
-------
:class:`~pyspark.sql.Column`
Secant of the angle.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(sec(lit(1.5))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(sec(lit(1.5))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1037,6 +1084,22 @@ def expm1(col: "ColumnOrName") -> Column:
Computes the exponential of the given value minus one.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate exponential for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ exponential less one.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(expm1(lit(1))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(expm1(lit(1))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1188,6 +1411,13 @@ def tanh(col: "ColumnOrName") -> Column:
:class:`~pyspark.sql.Column`
hyperbolic tangent of the given value
as if computed by `java.lang.Math.tanh()`
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(tanh(lit(math.radians(90)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(tanh(lit(math.radians(90)))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1295,6 +1651,22 @@ def stddev(col: "ColumnOrName") -> Column:
Aggregate function: alias for stddev_samp.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev(df.id)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(stddev(df.id)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1168,6 +1384,13 @@ def tan(col: "ColumnOrName") -> Column:
-------
:class:`~pyspark.sql.Column`
tangent of the given value, as if computed by `java.lang.Math.tan()`
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(tan(lit(math.radians(45)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(tan(lit(math.radians(45)))).first()
```
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950952825
##########
python/pyspark/sql/functions.py:
##########
@@ -1037,6 +1084,22 @@ def expm1(col: "ColumnOrName") -> Column:
Computes the exponential of the given value minus one.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate exponential for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ exponential less one.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(expm1(lit(1))).first() # doctest: +ELLIPSIS
+ Row(EXPM1(1)=1.71828...)
Review Comment:
This is a nice trick.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950952825
##########
python/pyspark/sql/functions.py:
##########
@@ -1037,6 +1084,22 @@ def expm1(col: "ColumnOrName") -> Column:
Computes the exponential of the given value minus one.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate exponential for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+        exponential of the given value, minus one.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(expm1(lit(1))).first() # doctest: +ELLIPSIS
+ Row(EXPM1(1)=1.71828...)
Review Comment:
This is a nice trick. In this way, the test won't be flaky even when the precision changes. If you use `show()`, it might break. BTW you can remove `# doctest: +ELLIPSIS` since that's defined at the bottom of this file.
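The directive being discussed is standard `doctest` behavior, so it can be illustrated without Spark. This is a minimal, self-contained sketch (it uses `math.expm1` as a stand-in for the PySpark column function) showing how `# doctest: +ELLIPSIS` lets the expected output pin only a stable prefix of a float:

```python
import doctest
import math


def expm1_demo() -> None:
    """Compute e**x - 1 for x = 1, matching only a stable prefix of the float.

    >>> math.expm1(1)  # doctest: +ELLIPSIS
    1.71828...
    """


# The ELLIPSIS directive makes "1.71828..." match the full,
# platform-dependent repr of the float, so the doctest stays stable
# even if later digits differ.
results = doctest.testmod()
print(results.failed)  # 0 when the ellipsis-matched example passes
```

When the directive is declared globally (as the comment notes it is at the bottom of `functions.py`), the per-example `# doctest: +ELLIPSIS` marker becomes redundant.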
[GitHub] [spark] HyukjinKwon commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1223372856
Thanks for working on this @khalidmammadov
[GitHub] [spark] khalidmammadov commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1223593052
> Thanks for working on this @khalidmammadov
No worries, I will pick up next batch then from functions
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950953141
##########
python/pyspark/sql/functions.py:
##########
@@ -1343,6 +1798,26 @@ def var_pop(col: "ColumnOrName") -> Column:
Aggregate function: returns the population variance of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ variance of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(var_pop(df.id)).show()
+ +------------------+
+ | var_pop(id)|
+ +------------------+
+ |2.9166666666666665|
Review Comment:
I think we should probably do ELLIPSIS (with the trick you did w/ `Row(...)`) since float representation is flaky in Python (and JDK 11/17 IIRC)
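To make the concern concrete: the `var_pop(id)` example over `spark.range(6)` computes the population variance of 0..5, which is 35/12. A rough sketch with the standard library (no Spark required) shows why pinning the full repr is fragile and a prefix match is safer:

```python
from statistics import pvariance

# Population variance of 0..5 -- the same quantity the var_pop(id)
# doctest computes over spark.range(6).
result = pvariance(range(6))
print(result)

# The full repr (2.9166666666666665 in the show() output above) depends
# on float formatting, which can differ across platforms/JDKs; a doctest
# that matches only a prefix like 2.91666... with ELLIPSIS avoids that.
assert abs(result - 35 / 12) < 1e-12
```

Following the earlier suggestion, the docstring example would then use `first()` with an ellipsis-truncated `Row(...)` expectation rather than a full `show()` table.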