Posted to commits@spark.apache.org by gu...@apache.org on 2022/08/26 05:17:34 UTC

[spark] branch master updated: [SPARK-40010][PYTHON][DOCS][FOLLOWUP] Make pyspark.sql.window examples self-contained (part 2)

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 68c47d5e74b [SPARK-40010][PYTHON][DOCS][FOLLOWUP] Make pyspark.sql.window examples self-contained (part 2)
68c47d5e74b is described below

commit 68c47d5e74b8be481318388b3cc8b40ead35beea
Author: Qian.Sun <qi...@gmail.com>
AuthorDate: Fri Aug 26 14:17:07 2022 +0900

    [SPARK-40010][PYTHON][DOCS][FOLLOWUP] Make pyspark.sql.window examples self-contained (part 2)
    
    ### What changes were proposed in this pull request?
    
    As mentioned [here](https://issues.apache.org/jira/browse/SPARK-40148), we need to have examples for several APIs, such as `orderBy`, in `pyspark.sql.window`.
    
    ### Why are the changes needed?
    
    To make the documentation more readable and easy to copy and paste directly into the PySpark shell.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, documentation changes only.
    
    ### How was this patch tested?
    
    GitHub Actions runs the Python doctests.
    
    Closes #37657 from dcoliversun/SPARK-40010.
    
    Authored-by: Qian.Sun <qi...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/pyspark/sql/window.py | 88 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/python/pyspark/sql/window.py b/python/pyspark/sql/window.py
index 7bb59f36289..898cdfec14a 100644
--- a/python/pyspark/sql/window.py
+++ b/python/pyspark/sql/window.py
@@ -79,6 +79,44 @@ class Window:
         ----------
         cols : str, :class:`Column` or list
             names of columns or expressions
+
+        Returns
+        -------
+        :class:`WindowSpec`
+            A :class:`WindowSpec` with the partitioning defined.
+
+        Examples
+        --------
+        >>> from pyspark.sql import Window
+        >>> from pyspark.sql.functions import row_number
+        >>> df = spark.createDataFrame(
+        ...      [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
+        >>> df.show()
+        +---+--------+
+        | id|category|
+        +---+--------+
+        |  1|       a|
+        |  1|       a|
+        |  2|       a|
+        |  1|       b|
+        |  2|       b|
+        |  3|       b|
+        +---+--------+
+
+        Show row numbers ordered by ``id`` within each ``category`` partition.
+
+        >>> window = Window.partitionBy("category").orderBy("id")
+        >>> df.withColumn("row_number", row_number().over(window)).show()
+        +---+--------+----------+
+        | id|category|row_number|
+        +---+--------+----------+
+        |  1|       a|         1|
+        |  1|       a|         2|
+        |  2|       a|         3|
+        |  1|       b|         1|
+        |  2|       b|         2|
+        |  3|       b|         3|
+        +---+--------+----------+
         """
         sc = SparkContext._active_spark_context
         assert sc is not None and sc._jvm is not None
@@ -95,6 +133,44 @@ class Window:
         ----------
         cols : str, :class:`Column` or list
             names of columns or expressions
+
+        Returns
+        -------
+        :class:`WindowSpec`
+            A :class:`WindowSpec` with the ordering defined.
+
+        Examples
+        --------
+        >>> from pyspark.sql import Window
+        >>> from pyspark.sql.functions import row_number
+        >>> df = spark.createDataFrame(
+        ...      [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
+        >>> df.show()
+        +---+--------+
+        | id|category|
+        +---+--------+
+        |  1|       a|
+        |  1|       a|
+        |  2|       a|
+        |  1|       b|
+        |  2|       b|
+        |  3|       b|
+        +---+--------+
+
+        Show row numbers ordered by ``category`` within each ``id`` partition.
+
+        >>> window = Window.partitionBy("id").orderBy("category")
+        >>> df.withColumn("row_number", row_number().over(window)).show()
+        +---+--------+----------+
+        | id|category|row_number|
+        +---+--------+----------+
+        |  1|       a|         1|
+        |  1|       a|         2|
+        |  1|       b|         3|
+        |  2|       a|         1|
+        |  2|       b|         2|
+        |  3|       b|         1|
+        +---+--------+----------+
         """
         sc = SparkContext._active_spark_context
         assert sc is not None and sc._jvm is not None
@@ -134,6 +210,12 @@ class Window:
             The frame is unbounded if this is ``Window.unboundedFollowing``, or
             any value greater than or equal to 9223372036854775807.
 
+        Returns
+        -------
+        :class:`WindowSpec`
+            A :class:`WindowSpec` with the frame boundaries defined,
+            from `start` (inclusive) to `end` (inclusive).
+
         Examples
         --------
         >>> from pyspark.sql import Window
@@ -214,6 +296,12 @@ class Window:
             The frame is unbounded if this is ``Window.unboundedFollowing``, or
             any value greater than or equal to min(sys.maxsize, 9223372036854775807).
 
+        Returns
+        -------
+        :class:`WindowSpec`
+            A :class:`WindowSpec` with the frame boundaries defined,
+            from `start` (inclusive) to `end` (inclusive).
+
         Examples
         --------
         >>> from pyspark.sql import Window

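For reference, the ``rowsBetween`` hunk above only adds a ``Returns`` section; the frame boundaries it documents (``start`` and ``end``, both inclusive, counted in rows relative to the current row) can be illustrated with a minimal doctest sketch in the same style, assuming the same ``spark`` session and sample data as the new examples (the ``sort`` only makes the output deterministic):

>>> from pyspark.sql import Window
>>> from pyspark.sql import functions as func
>>> df = spark.createDataFrame(
...      [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
>>> window = Window.partitionBy("category").orderBy("id").rowsBetween(Window.currentRow, 1)
>>> df.withColumn("sum", func.sum("id").over(window)).sort("id", "category", "sum").show()
+---+--------+---+
| id|category|sum|
+---+--------+---+
|  1|       a|  2|
|  1|       a|  3|
|  1|       b|  3|
|  2|       a|  2|
|  2|       b|  5|
|  3|       b|  3|
+---+--------+---+

Each ``sum`` covers the current row plus the one row after it within the partition, matching the inclusive ``(Window.currentRow, 1)`` bounds.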

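Similarly for ``rangeBetween``, where the inclusive boundaries are offsets on the value of the ``orderBy`` expression rather than row positions. Continuing the same session (a sketch under the same assumptions):

>>> window = Window.partitionBy("category").orderBy("id").rangeBetween(Window.currentRow, 1)
>>> df.withColumn("sum", func.sum("id").over(window)).sort("id", "category").show()
+---+--------+---+
| id|category|sum|
+---+--------+---+
|  1|       a|  4|
|  1|       a|  4|
|  1|       b|  3|
|  2|       a|  2|
|  2|       b|  5|
|  3|       b|  3|
+---+--------+---+

Here both ``id = 1`` rows in partition ``a`` aggregate over every row whose ``id`` falls in ``[1, 2]``, so they share the same sum.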
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org