Posted to commits@spark.apache.org by gu...@apache.org on 2023/09/04 00:43:29 UTC
[spark] branch master updated: [SPARK-45038][PYTHON][DOCS] Refine docstring of `max`
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 3ca57ae7a9b [SPARK-45038][PYTHON][DOCS] Refine docstring of `max`
3ca57ae7a9b is described below
commit 3ca57ae7a9bc2053807e0d0f04c59104037137e4
Author: allisonwang-db <al...@databricks.com>
AuthorDate: Mon Sep 4 09:43:17 2023 +0900
[SPARK-45038][PYTHON][DOCS] Refine docstring of `max`
### What changes were proposed in this pull request?
This PR refines the docstring for function `max` by adding more examples.
### Why are the changes needed?
To improve the PySpark documentation.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
doctest
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #42758 from allisonwang-db/spark-45038-refine-max.
Authored-by: allisonwang-db <al...@databricks.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
python/pyspark/sql/functions.py | 78 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 75 insertions(+), 3 deletions(-)
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index fb02cb0cc98..47d928fe59a 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1217,22 +1217,94 @@ def max(col: "ColumnOrName") -> Column:
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
- target column to compute on.
+ The target column on which the maximum value is computed.
Returns
-------
:class:`~pyspark.sql.Column`
- column for computed results.
+ A column that contains the maximum value computed.
+
+ See Also
+ --------
+ :meth:`pyspark.sql.functions.min`
+ :meth:`pyspark.sql.functions.avg`
+ :meth:`pyspark.sql.functions.sum`
+
+ Notes
+ -----
+ - Null values are ignored during the computation.
+ - NaN values are larger than any other numeric value.
Examples
--------
+ Example 1: Compute the maximum value of a numeric column
+
+ >>> import pyspark.sql.functions as sf
>>> df = spark.range(10)
- >>> df.select(max(col("id"))).show()
+ >>> df.select(sf.max(df.id)).show()
+-------+
|max(id)|
+-------+
| 9|
+-------+
+
+ Example 2: Compute the maximum value of a string column
+
+ >>> import pyspark.sql.functions as sf
+ >>> df = spark.createDataFrame([("A",), ("B",), ("C",)], ["value"])
+ >>> df.select(sf.max(df.value)).show()
+ +----------+
+ |max(value)|
+ +----------+
+ | C|
+ +----------+
+
+ Example 3: Compute the maximum value of a column in a grouped DataFrame
+
+ >>> import pyspark.sql.functions as sf
+ >>> df = spark.createDataFrame([("A", 1), ("A", 2), ("B", 3), ("B", 4)], ["key", "value"])
+ >>> df.groupBy("key").agg(sf.max(df.value)).show()
+ +---+----------+
+ |key|max(value)|
+ +---+----------+
+ | A| 2|
+ | B| 4|
+ +---+----------+
+
+ Example 4: Compute the maximum value of multiple columns in a grouped DataFrame
+
+ >>> import pyspark.sql.functions as sf
+ >>> df = spark.createDataFrame(
+ ... [("A", 1, 2), ("A", 2, 3), ("B", 3, 4), ("B", 4, 5)], ["key", "value1", "value2"])
+ >>> df.groupBy("key").agg(sf.max("value1"), sf.max("value2")).show()
+ +---+-----------+-----------+
+ |key|max(value1)|max(value2)|
+ +---+-----------+-----------+
+ | A| 2| 3|
+ | B| 4| 5|
+ +---+-----------+-----------+
+
+ Example 5: Compute the maximum value of a column with null values
+
+ >>> import pyspark.sql.functions as sf
+ >>> df = spark.createDataFrame([(1,), (2,), (None,)], ["value"])
+ >>> df.select(sf.max(df.value)).show()
+ +----------+
+ |max(value)|
+ +----------+
+ | 2|
+ +----------+
+
+ Example 6: Compute the maximum value of a column with "NaN" values
+
+ >>> import pyspark.sql.functions as sf
+ >>> df = spark.createDataFrame([(1.1,), (float("nan"),), (3.3,)], ["value"])
+ >>> df.select(sf.max(df.value)).show()
+ +----------+
+ |max(value)|
+ +----------+
+ | NaN|
+ +----------+
"""
return _invoke_function_over_columns("max", col)
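The Notes added in this docstring state two semantics of Spark's `max`: null values are ignored, and NaN sorts above every other numeric value (as Examples 5 and 6 demonstrate). As an aside, here is a minimal plain-Python sketch of those same two rules, runnable without a Spark session; the helper name `spark_like_max` is hypothetical and not part of PySpark, and the key-function trick merely emulates Spark's NaN-greatest ordering:

```python
import math

def spark_like_max(values):
    # Emulate Spark's max() semantics over a list of floats/None:
    # - None (null) entries are skipped, as in Example 5;
    # - NaN compares greater than any other number, as in Example 6.
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # all-null input yields null, matching Spark
    # Sort key: NaN gets (True, ...) which outranks every (False, x).
    return max(non_null, key=lambda v: (math.isnan(v), v))

print(spark_like_max([1.0, 2.0, None]))          # nulls ignored -> 2.0
print(spark_like_max([1.1, float("nan"), 3.3]))  # NaN wins -> nan
```

Note that plain Python's `max` alone would not reproduce this: `max([1.1, float("nan"), 3.3])` gives order-dependent results because NaN comparisons are always false, which is why the key function is needed.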