Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/20 21:21:34 UTC
[GitHub] [spark] khalidmammadov opened a new pull request, #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
khalidmammadov opened a new pull request, #37592:
URL: https://github.com/apache/spark/pull/37592
### What changes were proposed in this pull request?
Docstring improvements
### Why are the changes needed?
To help users understand the PySpark API
### Does this PR introduce _any_ user-facing change?
Yes, documentation
### How was this patch tested?
`bundle exec jekyll serve --host 0.0.0.0`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950953306
##########
python/pyspark/sql/functions.py:
##########
@@ -1315,6 +1710,26 @@ def stddev_pop(col: "ColumnOrName") -> Column:
the expression in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev_pop(df.id)).show()
+ +-----------------+
+ | stddev_pop(id)|
+ +-----------------+
+ |1.707825127659933|
Review Comment:
ditto
##########
python/pyspark/sql/functions.py:
##########
@@ -1305,6 +1680,26 @@ def stddev_samp(col: "ColumnOrName") -> Column:
the expression in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev_samp(df.id)).show()
+ +------------------+
+ | stddev_samp(id)|
+ +------------------+
+ |1.8708286933869707|
Review Comment:
ditto
##########
python/pyspark/sql/functions.py:
##########
@@ -1295,6 +1650,26 @@ def stddev(col: "ColumnOrName") -> Column:
Aggregate function: alias for stddev_samp.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev(df.id)).show()
+ +------------------+
+ | stddev_samp(id)|
+ +------------------+
+ |1.8708286933869707|
Review Comment:
ditto
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950953141
##########
python/pyspark/sql/functions.py:
##########
@@ -1343,6 +1798,26 @@ def var_pop(col: "ColumnOrName") -> Column:
Aggregate function: returns the population variance of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ variance of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(var_pop(df.id)).show()
+ +------------------+
+ | var_pop(id)|
+ +------------------+
+ |2.9166666666666665|
Review Comment:
I think we should probably do ELLIPSIS since float representation is flaky in Python (and JDK 11/17 IIRC)
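The flakiness the reviewer mentions can be illustrated with the stdlib `doctest` module alone. The sketch below is hypothetical (not Spark code; the function name and values are for illustration): the expected output is truncated and the inline `ELLIPSIS` directive lets any trailing digits match, so the example survives float-repr changes.

```python
import doctest

def pop_variance(xs):
    """Population variance of a sequence.

    The expected value is truncated and matched with ELLIPSIS, so the
    test keeps passing even if the float's repr changes across
    Python or JDK versions.

    >>> pop_variance(list(range(6)))  # doctest: +ELLIPSIS
    2.9166...
    """
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

results = doctest.testmod()
print(results.failed)  # 0: the ellipsis pattern matched
```

The same idea is what the review asks for in the `var_pop`/`skewness` examples: truncate the printed float and let the doctest runner match the rest.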
##########
python/pyspark/sql/functions.py:
##########
@@ -1352,6 +1827,26 @@ def skewness(col: "ColumnOrName") -> Column:
Aggregate function: returns the skewness of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ skewness of given column.
+
+ Examples
+ --------
+ >>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
+ >>> df.select(skewness(df.c)).show()
+ +------------------+
+ | skewness(c)|
+ +------------------+
+ |0.7071067811865475|
Review Comment:
ditto
[GitHub] [spark] khalidmammadov commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950903436
##########
python/pyspark/sql/functions.py:
##########
@@ -6158,12 +6764,14 @@ def _test() -> None:
import doctest
from pyspark.sql import Row, SparkSession
import pyspark.sql.functions
+ import math
globs = pyspark.sql.functions.__dict__.copy()
spark = SparkSession.builder.master("local[4]").appName("sql.functions tests").getOrCreate()
sc = spark.sparkContext
globs["sc"] = sc
globs["spark"] = spark
+ globs["math"] = math
Review Comment:
Removed. I can look into removing "sc" in the next PRs.
[GitHub] [spark] HyukjinKwon commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1221438918
Thanks for doing this, @khalidmammadov
[GitHub] [spark] srowen commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950745297
##########
python/pyspark/sql/functions.py:
##########
@@ -994,12 +1014,22 @@ def cot(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- Angle in radians
+ angle in radians.
Review Comment:
Why these changes? seemed better before
##########
python/pyspark/sql/functions.py:
##########
@@ -1142,13 +1384,23 @@ def sinh(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- hyperbolic angle
+ hyperbolic angle in radians
Review Comment:
I don't believe hyperbolic angles are measured in radians
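For background on this point: the argument of the hyperbolic functions is a dimensionless real number (geometrically, twice the area of a hyperbolic sector), not an angle in radians, since they are defined directly from exponentials:

```latex
\sinh x = \frac{e^{x} - e^{-x}}{2}, \qquad \cosh x = \frac{e^{x} + e^{-x}}{2}
```

This is why the original "hyperbolic angle" wording (without "in radians") was the accurate one.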
[GitHub] [spark] HyukjinKwon closed pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
URL: https://github.com/apache/spark/pull/37592
[GitHub] [spark] khalidmammadov commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950903322
##########
python/pyspark/sql/functions.py:
##########
@@ -994,12 +1014,22 @@ def cot(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- Angle in radians
+ angle in radians.
Review Comment:
There are different conventions in this file; the most prominent one is lower case with a dot at the end. I tried to be consistent. Can revert if needed.
[GitHub] [spark] khalidmammadov commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1221625649
> Thanks for doing this, @khalidmammadov
my pleasure
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950952825
##########
python/pyspark/sql/functions.py:
##########
@@ -1037,6 +1084,22 @@ def expm1(col: "ColumnOrName") -> Column:
Computes the exponential of the given value minus one.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate exponential for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ exponential less one.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(expm1(lit(1))).first() # doctest: +ELLIPSIS
+ Row(EXPM1(1)=1.71828...)
Review Comment:
This is a nice trick. In this way, the test won't be flaky even when the precision changes. If you use `show()`, it might break because of the table's dashes (e.g., if 1.321 becomes 1.32). BTW you can remove `# doctest: +ELLIPSIS` since that's defined at the bottom of this file.
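The module-wide default mentioned here can be sketched with the stdlib `doctest` module (a hypothetical standalone module, not the actual Spark `_test()` helper): pass `ELLIPSIS` as a default option flag when running the module's doctests, so individual examples don't need the inline directive.

```python
import doctest
import math

def expm1_example(x):
    """Exponential of x minus one.

    No inline "# doctest: +ELLIPSIS" needed here: the flag is supplied
    once, for every docstring, when the module's doctests are run.

    >>> expm1_example(1)
    1.71828...
    """
    return math.expm1(x)

# Roughly what a module-level test helper can do for all docstrings:
results = doctest.testmod(optionflags=doctest.ELLIPSIS)
print(results.failed)  # 0
```

With the flag set globally, the suggested edits below (dropping each inline `# doctest: +ELLIPSIS`) are safe.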
[GitHub] [spark] khalidmammadov commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950903329
##########
python/pyspark/sql/functions.py:
##########
@@ -1142,13 +1384,23 @@ def sinh(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- hyperbolic angle
+ hyperbolic angle in radians
Review Comment:
my bad, reverted
[GitHub] [spark] AmplabJenkins commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1221462892
Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950766376
##########
python/pyspark/sql/functions.py:
##########
@@ -6158,12 +6764,14 @@ def _test() -> None:
import doctest
from pyspark.sql import Row, SparkSession
import pyspark.sql.functions
+ import math
globs = pyspark.sql.functions.__dict__.copy()
spark = SparkSession.builder.master("local[4]").appName("sql.functions tests").getOrCreate()
sc = spark.sparkContext
globs["sc"] = sc
globs["spark"] = spark
+ globs["math"] = math
Review Comment:
Let's import this within the docstring. I think we should ideally remove all these globals (except `spark`).
[GitHub] [spark] HyukjinKwon commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1223372735
Merged to master.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r952037018
##########
python/pyspark/sql/functions.py:
##########
@@ -994,12 +1007,19 @@ def cot(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- Angle in radians
+ angle in radians.
Returns
-------
:class:`~pyspark.sql.Column`
- Cotangent of the angle.
+ cotangent of the angle.
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(cot(lit(math.radians(45)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(cot(lit(math.radians(45)))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -962,6 +962,13 @@ def cos(col: "ColumnOrName") -> Column:
-------
:class:`~pyspark.sql.Column`
cosine of the angle, as if computed by `java.lang.Math.cos()`.
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(cos(lit(math.pi))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(cos(lit(math.pi))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1055,6 +1138,23 @@ def log(col: "ColumnOrName") -> Column:
Computes the natural logarithm of the given value.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate natural logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ natural logarithm of the given value.
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(log(lit(math.e))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(log(lit(math.e))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1142,13 +1352,19 @@ def sinh(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- hyperbolic angle
+ hyperbolic angle.
Returns
-------
:class:`~pyspark.sql.Column`
hyperbolic sine of the given value,
as if computed by `java.lang.Math.sinh()`
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(sinh(lit(1.1))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(sinh(lit(1.1))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1064,15 +1164,57 @@ def log10(col: "ColumnOrName") -> Column:
Computes the logarithm of the given value in Base 10.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ logarithm of the given value in Base 10.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(log10(lit(100))).show()
+ +----------+
+ |LOG10(100)|
+ +----------+
+ | 2.0|
+ +----------+
"""
return _invoke_function_over_columns("log10", col)
def log1p(col: "ColumnOrName") -> Column:
"""
- Computes the natural logarithm of the given value plus one.
+ Computes the natural logarithm of the "given value plus one".
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate natural logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ natural logarithm of the "given value plus one".
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(log1p(lit(math.e))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(log1p(lit(math.e))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1305,6 +1677,22 @@ def stddev_samp(col: "ColumnOrName") -> Column:
the expression in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev_samp(df.id)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(stddev_samp(df.id)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1315,6 +1703,22 @@ def stddev_pop(col: "ColumnOrName") -> Column:
the expression in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev_pop(df.id)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(stddev_pop(df.id)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1343,6 +1787,22 @@ def var_pop(col: "ColumnOrName") -> Column:
Aggregate function: returns the population variance of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ variance of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(var_pop(df.id)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(var_pop(df.id)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1352,6 +1812,22 @@ def skewness(col: "ColumnOrName") -> Column:
Aggregate function: returns the skewness of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ skewness of given column.
+
+ Examples
+ --------
+ >>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
+ >>> df.select(skewness(df.c)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(skewness(df.c)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1064,15 +1164,57 @@ def log10(col: "ColumnOrName") -> Column:
Computes the logarithm of the given value in Base 10.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ logarithm of the given value in Base 10.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(log10(lit(100))).show()
+ +----------+
+ |LOG10(100)|
+ +----------+
+ | 2.0|
+ +----------+
"""
return _invoke_function_over_columns("log10", col)
def log1p(col: "ColumnOrName") -> Column:
"""
- Computes the natural logarithm of the given value plus one.
+ Computes the natural logarithm of the "given value plus one".
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate natural logarithm for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ natural logarithm of the "given value plus one".
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(log1p(lit(math.e))).first() # doctest: +ELLIPSIS
+ Row(LOG1P(2.71828...)=1.31326...)
+
+ Same as:
+
+ >>> df.select(log(lit(math.e+1))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(log(lit(math.e+1))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -981,6 +988,12 @@ def cosh(col: "ColumnOrName") -> Column:
-------
:class:`~pyspark.sql.Column`
hyperbolic cosine of the angle, as if computed by `java.lang.Math.cosh()`
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(cosh(lit(1))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(cosh(lit(1))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1013,12 +1033,19 @@ def csc(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- Angle in radians
+ angle in radians.
Returns
-------
:class:`~pyspark.sql.Column`
- Cosecant of the angle.
+ cosecant of the angle.
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(csc(lit(math.radians(90)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(csc(lit(math.radians(90)))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1124,11 +1326,19 @@ def sin(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
Returns
-------
:class:`~pyspark.sql.Column`
sine of the angle, as if computed by `java.lang.Math.sin()`
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(sin(lit(math.radians(90)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(sin(lit(math.radians(90)))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1102,6 +1271,12 @@ def sec(col: "ColumnOrName") -> Column:
-------
:class:`~pyspark.sql.Column`
Secant of the angle.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(sec(lit(1.5))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(sec(lit(1.5))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1037,6 +1084,22 @@ def expm1(col: "ColumnOrName") -> Column:
Computes the exponential of the given value minus one.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate exponential for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ exponential less one.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(expm1(lit(1))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(expm1(lit(1))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1188,6 +1411,13 @@ def tanh(col: "ColumnOrName") -> Column:
:class:`~pyspark.sql.Column`
hyperbolic tangent of the given value
as if computed by `java.lang.Math.tanh()`
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(tanh(lit(math.radians(90)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(tanh(lit(math.radians(90)))).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1295,6 +1651,22 @@ def stddev(col: "ColumnOrName") -> Column:
Aggregate function: alias for stddev_samp.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ standard deviation of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(stddev(df.id)).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(stddev(df.id)).first()
```
##########
python/pyspark/sql/functions.py:
##########
@@ -1168,6 +1384,13 @@ def tan(col: "ColumnOrName") -> Column:
-------
:class:`~pyspark.sql.Column`
tangent of the given value, as if computed by `java.lang.Math.tan()`
+
+ Examples
+ --------
+ >>> import math
+ >>> df = spark.range(1)
+ >>> df.select(tan(lit(math.radians(45)))).first() # doctest: +ELLIPSIS
Review Comment:
```suggestion
>>> df.select(tan(lit(math.radians(45)))).first()
```
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950952825
##########
python/pyspark/sql/functions.py:
##########
@@ -1037,6 +1084,22 @@ def expm1(col: "ColumnOrName") -> Column:
Computes the exponential of the given value minus one.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate exponential for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ exponential less one.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(expm1(lit(1))).first() # doctest: +ELLIPSIS
+ Row(EXPM1(1)=1.71828...)
Review Comment:
This is a nice trick.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950952825
##########
python/pyspark/sql/functions.py:
##########
@@ -1037,6 +1084,22 @@ def expm1(col: "ColumnOrName") -> Column:
Computes the exponential of the given value minus one.
.. versionadded:: 1.4.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to calculate exponential for.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+        exponential of the given value, minus one.
+
+ Examples
+ --------
+ >>> df = spark.range(1)
+ >>> df.select(expm1(lit(1))).first() # doctest: +ELLIPSIS
+ Row(EXPM1(1)=1.71828...)
Review Comment:
This is a nice trick. In this way, the test won't be flaky even when the precision changes. If you use `show()`, it might break. BTW you can remove `# doctest: +ELLIPSIS` since that's defined at the bottom of this file.
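The directive being discussed is standard `doctest` behavior, so it can be illustrated without Spark. This is a minimal, self-contained sketch (it uses `math.expm1` as a stand-in for the PySpark column function) showing how `# doctest: +ELLIPSIS` lets the expected output pin only a stable prefix of a float:

```python
import doctest
import math


def expm1_demo() -> None:
    """Compute e**x - 1 for x = 1, matching only a stable prefix of the float.

    >>> math.expm1(1)  # doctest: +ELLIPSIS
    1.71828...
    """


# The ELLIPSIS directive makes "1.71828..." match the full,
# platform-dependent repr of the float, so the doctest stays stable
# even if later digits differ.
results = doctest.testmod()
print(results.failed)  # 0 when the ellipsis-matched example passes
```

When the directive is declared globally (as the comment notes it is at the bottom of `functions.py`), the per-example `# doctest: +ELLIPSIS` marker becomes redundant.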
[GitHub] [spark] HyukjinKwon commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1223372856
Thanks for working on this @khalidmammadov
[GitHub] [spark] khalidmammadov commented on pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on PR #37592:
URL: https://github.com/apache/spark/pull/37592#issuecomment-1223593052
> Thanks for working on this @khalidmammadov
No worries, I will pick up next batch then from functions
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37592: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 2, 32 functions)
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37592:
URL: https://github.com/apache/spark/pull/37592#discussion_r950953141
##########
python/pyspark/sql/functions.py:
##########
@@ -1343,6 +1798,26 @@ def var_pop(col: "ColumnOrName") -> Column:
Aggregate function: returns the population variance of the values in a group.
.. versionadded:: 1.6.0
+
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ target column to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ variance of given column.
+
+ Examples
+ --------
+ >>> df = spark.range(6)
+ >>> df.select(var_pop(df.id)).show()
+ +------------------+
+ | var_pop(id)|
+ +------------------+
+ |2.9166666666666665|
Review Comment:
I think we should probably do ELLIPSIS (with the trick you did w/ `Row(...)`) since float representation is flaky in Python (and JDK 11/17 IIRC)
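To make the concern concrete: the `var_pop(id)` example over `spark.range(6)` computes the population variance of 0..5, which is 35/12. A rough sketch with the standard library (no Spark required) shows why pinning the full repr is fragile and a prefix match is safer:

```python
from statistics import pvariance

# Population variance of 0..5 -- the same quantity the var_pop(id)
# doctest computes over spark.range(6).
result = pvariance(range(6))
print(result)

# The full repr (2.9166666666666665 in the show() output above) depends
# on float formatting, which can differ across platforms/JDKs; a doctest
# that matches only a prefix like 2.91666... with ELLIPSIS avoids that.
assert abs(result - 35 / 12) < 1e-12
```

Following the earlier suggestion, the docstring example would then use `first()` with an ellipsis-truncated `Row(...)` expectation rather than a full `show()` table.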