Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/01/25 22:02:53 UTC
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
GitHub user huaxingao opened a pull request:
https://github.com/apache/spark/pull/20400
[SPARK-23084][PYTHON]Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark
## What changes were proposed in this pull request?
Added unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark, and updated the rangeBetween API.
## How was this patch tested?
Ran unit tests locally. Please let me know if I need to add a unit test in tests.py.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/huaxingao/spark spark_23084
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20400.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20400
----
commit 0690596029d1c061c16564f12416311e7624e5b2
Author: Huaxin Gao <hu...@...>
Date: 2018-01-25T21:50:21Z
[SPARK-23084][PYTHON]Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86676/
Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86919/testReport)** for PR 20400 at commit [`25fee39`](https://github.com/apache/spark/commit/25fee3901cfba3599330da394e437c91a9783368).
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86897/testReport)** for PR 20400 at commit [`45545d6`](https://github.com/apache/spark/commit/45545d65ce9ecc077bf842602ecc465ceeeda061).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #87061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87061/testReport)** for PR 20400 at commit [`f82c7d1`](https://github.com/apache/spark/commit/f82c7d11d12611eed5ac5bd9f4b8a4e6516fdf84).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165284238
--- Diff: python/pyspark/sql/window.py ---
@@ -120,20 +122,46 @@ def rangeBetween(start, end):
and "5" means the five off after the current row.
We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
- and ``Window.currentRow`` to specify special boundary values, rather than using integral
- values directly.
+ ``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
+ ``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
+ to specify special boundary values, rather than using integral values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
+ >>> from pyspark.sql import functions as F, SparkSession, Window
+ >>> spark = SparkSession.builder.getOrCreate()
+ >>> df = spark.createDataFrame(
+ ... [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
+ >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(
+ ... F.currentRow(), F.lit(1))
+ >>> df.withColumn("sum", F.sum("id").over(window)).show()
+ +---+--------+---+
+ | id|category|sum|
+ +---+--------+---+
+ | 1| b| 3|
+ | 2| b| 5|
+ | 3| b| 3|
+ | 1| a| 4|
+ | 1| a| 4|
+ | 2| a| 2|
+ +---+--------+---+
+ <BLANKLINE>
"""
- if start <= Window._PRECEDING_THRESHOLD:
- start = Window.unboundedPreceding
- if end >= Window._FOLLOWING_THRESHOLD:
- end = Window.unboundedFollowing
+ if isinstance(start, (int, long)) and isinstance(end, (int, long)):
--- End diff --
Is it possible that we mix int and Column in the parameters?
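To make the question concrete: the guard in the diff only normalizes the thresholds when both bounds are integers, so a call with one integer and one Column would skip that branch. One possible answer is to reject mixed inputs outright. The sketch below is plain Python under stated assumptions: `FakeColumn` is a hypothetical stand-in (not `pyspark.sql.Column`), `normalize_bounds` is an illustrative name, and the constants follow the values quoted in the docstring.

```python
import sys

# Assumed values, mirroring the docstring: Java's Long.MinValue / Long.MaxValue.
JAVA_MIN_LONG = -(1 << 63)
JAVA_MAX_LONG = (1 << 63) - 1
PRECEDING_THRESHOLD = max(-sys.maxsize, JAVA_MIN_LONG)
FOLLOWING_THRESHOLD = min(sys.maxsize, JAVA_MAX_LONG)


class FakeColumn(object):
    """Hypothetical stand-in for a Column-typed boundary; not a PySpark class."""

    def __init__(self, name):
        self.name = name


def normalize_bounds(start, end):
    """Normalize integer bounds, pass columns through, and reject a mix of both."""
    if isinstance(start, int) and isinstance(end, int):
        if start <= PRECEDING_THRESHOLD:
            start = JAVA_MIN_LONG
        if end >= FOLLOWING_THRESHOLD:
            end = JAVA_MAX_LONG
        return "ints", start, end
    if isinstance(start, FakeColumn) and isinstance(end, FakeColumn):
        return "columns", start, end
    raise TypeError("start and end must both be integers or both be columns")
```

Whether mixed inputs should raise or be coerced is a design decision for the PR; the sketch only shows that the case needs an explicit branch.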
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164950347
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +126,20 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
--- End diff --
remove this line?
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87061/
Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164966938
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +126,20 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
--- End diff --
@jiangxb1987 Sorry for the extra line. Will remove.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86662/testReport)** for PR 20400 at commit [`0690596`](https://github.com/apache/spark/commit/0690596029d1c061c16564f12416311e7624e5b2).
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/260/
Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/450/
Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164965129
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,45 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+ """
+ Window function: returns the special frame boundary that represents the first row
+ in the window partition.
+ >>> df = spark.createDataFrame([(5,)])
+ >>> df.select(unboundedPreceding()).show
+ <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
+ """
+ sc = SparkContext._active_spark_context
+ return Column(sc._jvm.functions.unboundedPreceding())
+
+
+@since(2.3)
+def unboundedFollowing():
+ """
+ Window function: returns the special frame boundary that represents the last row
+ in the window partition.
+ >>> df = spark.createDataFrame([(5,)])
--- End diff --
Will add a newline. Thanks.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86897/
Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164972094
--- Diff: python/pyspark/sql/window.py ---
@@ -212,16 +218,20 @@ def rangeBetween(self, start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
--- End diff --
Can we have a doctest resembling this -
https://github.com/jiangxb1987/spark/blob/cec519b8cfbf1ed2a3107056ef5281a5be75ec54/sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala#L214-L240
?
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165258763
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,48 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+ """
+ Window function: returns the special frame boundary that represents the first row
+ in the window partition.
+
+ >>> df = spark.createDataFrame([(5,)])
+ >>> df.select(unboundedPreceding()).columns[0]
+ 'UNBOUNDED PRECEDING'
--- End diff --
I guess I will just remove these tests since we already have a doc test in window.py.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164078885
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,36 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
--- End diff --
Wait .. why do we use this naming convention here? I thought we decided to stick to this_naming_convention. It seems the Scala side has this one, which seems to have been added in 2.3 -
https://github.com/apache/spark/blob/d5861aba9d80ca15ad3f22793b79822e470d6913/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L786-L788
@cloud-fan and @rxin, was this just a mistake or did I miss something?
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/791/
Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:
https://github.com/apache/spark/pull/20400
@HyukjinKwon Thanks a lot for your help!
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165254925
--- Diff: python/pyspark/sql/window.py ---
@@ -217,11 +242,16 @@ def rangeBetween(self, start, end):
:param end: boundary end, inclusive.
The frame is unbounded if this is ``Window.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
--- End diff --
I think the extra line was requested to be removed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20400
LGTM
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165839717
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,36 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+ """
+ Window function: returns the special frame boundary that represents the first row
+ in the window partition.
+ """
+ sc = SparkContext._active_spark_context
+ return Column(sc._jvm.functions.unboundedPreceding())
+
+
+@since(2.3)
+def unboundedFollowing():
+ """
+ Window function: returns the special frame boundary that represents the last row
+ in the window partition.
+ """
+ sc = SparkContext._active_spark_context
+ return Column(sc._jvm.functions.unboundedFollowing())
+
+
+@since(2.3)
--- End diff --
Shall we just go with 2.4?
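For context, `@since` records the release in which an API first appeared, so the value ends up in the generated documentation. The following is a simplified, hypothetical stand-in for such a decorator (not PySpark's actual implementation) that appends a Sphinx `versionadded` note to the docstring; `current_row_marker` is an illustrative function, not part of the PR.

```python
def since(version):
    """Hypothetical, simplified decorator: tag a docstring with a version note."""
    def decorator(func):
        doc = (func.__doc__ or "").rstrip()
        func.__doc__ = "%s\n\n.. versionadded:: %s" % (doc, version)
        return func
    return decorator


@since(2.4)
def current_row_marker():
    """Illustrative function only; stands in for an API added in a given release."""
    return "CURRENT ROW"
```

With this shape, changing the argument from 2.3 to 2.4 changes only the recorded version, not the function's behavior.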
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164261678
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
- if start <= Window._PRECEDING_THRESHOLD:
- start = Window.unboundedPreceding
- if end >= Window._FOLLOWING_THRESHOLD:
- end = Window.unboundedFollowing
+ if isinstance(start, int) and isinstance(end, int):
--- End diff --
Yup, but I think we can still have a `long` value here:
```
>>> long(1)
1L
>>> isinstance(long(1), int)
False
```
You can simply do something like `isinstance(long(1), (int, long))` with
```
if sys.version >= '3':
    long = int
```
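Put together, the suggestion amounts to a small compatibility shim. A self-contained sketch follows; the helper name `is_integral` is illustrative, and `sys.version_info` is used here in place of the string comparison above.

```python
import sys

# On Python 3 there is no separate `long` type, so alias it to `int`.
if sys.version_info[0] >= 3:
    long = int


def is_integral(value):
    """Return True for int values and, on Python 2, long values as well."""
    return isinstance(value, (int, long))
```

On Python 2 the built-in `long` is left untouched, so `is_integral(long(1))` holds on both major versions.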
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20400
cc @jiangxb1987
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164950417
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,45 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+ """
+ Window function: returns the special frame boundary that represents the first row
+ in the window partition.
+ >>> df = spark.createDataFrame([(5,)])
+ >>> df.select(unboundedPreceding()).show
+ <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
--- End diff --
It seems this prints out the function `show`. Is this intentional?
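The distinction is the usual one between referencing a method and calling it. A plain-Python illustration, using a hypothetical `Frame` class that stands in for a DataFrame (no Spark required):

```python
class Frame(object):
    """Hypothetical stand-in for a DataFrame, used only to illustrate the point."""

    def show(self):
        return "rendered table"


frame = Frame()

# Without parentheses the expression evaluates to the bound method itself,
# so a doctest would compare against its repr instead of any real output.
method_ref = frame.show

# With parentheses the method actually runs.
result = frame.show()
```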
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165254437
--- Diff: python/pyspark/sql/window.py ---
@@ -129,11 +131,34 @@ def rangeBetween(start, end):
:param end: boundary end, inclusive.
The frame is unbounded if this is ``Window.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
+ >>> from pyspark.sql import functions as F, SparkSession, Window
+ >>> spark = SparkSession.builder.getOrCreate()
+ >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
+ ... (3, "b")], ["id", "category"])
+ >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(F.currentRow(),
+ ... F.lit(1))
--- End diff --
ditto:
```python
>>> window = Window.orderBy("id").partitionBy("category").rangeBetween(
... F.currentRow(), F.lit(1))
```
or a line break, or anything complying with PEP 8.
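For readers checking what this window computes: with `rangeBetween(currentRow, 1)`, each row's frame covers the rows in the same partition whose ordering value lies between the current row's value and that value plus one. A plain-Python sketch of that rule, ignoring Spark entirely (the function name `range_sum` is illustrative):

```python
def range_sum(rows, lower, upper):
    """Sum `id` over a RANGE frame [id + lower, id + upper] within each category."""
    out = []
    for row_id, category in rows:
        total = sum(
            other_id
            for other_id, other_category in rows
            if other_category == category
            and row_id + lower <= other_id <= row_id + upper
        )
        out.append((row_id, category, total))
    return out


rows = [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")]
sums = range_sum(rows, 0, 1)  # current row's value .. value + 1
```

Because the frame is defined by value rather than by row position, the two `(1, "a")` rows share the same frame and therefore the same sum.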
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20400
LGTM, only one nit
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164261850
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
- if start <= Window._PRECEDING_THRESHOLD:
- start = Window.unboundedPreceding
- if end >= Window._FOLLOWING_THRESHOLD:
- end = Window.unboundedFollowing
+ if isinstance(start, int) and isinstance(end, int):
+ if start <= Window._PRECEDING_THRESHOLD:
+ start = Window.unboundedPreceding
--- End diff --
@jiangxb1987
Do you mean to change to
```
if isinstance(start, int) and isinstance(end, int):
    if start == Window._PRECEDING_THRESHOLD:
        # Window._PRECEDING_THRESHOLD == Long.MinValue
        start = Window.unboundedPreceding
    if end == Window._FOLLOWING_THRESHOLD:
        # Window._FOLLOWING_THRESHOLD == Long.MaxValue
        end = Window.unboundedFollowing
```
I ran the Python tests, and tests.py failed at
```
with patch("sys.maxsize", 2 ** 127 - 1):
    importlib.reload(window)
    self.assertTrue(rows_frame_match())
    self.assertTrue(range_frame_match())
```
So I guess I will keep `if start <= Window._PRECEDING_THRESHOLD` and `if end >= Window._FOLLOWING_THRESHOLD`.
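A plausible reading of the failure, assuming the test passes `-sys.maxsize` and `sys.maxsize` as bounds and that the thresholds are computed as the docstring describes (`max(-sys.maxsize, -9223372036854775808)` and `min(sys.maxsize, 9223372036854775807)`): once `sys.maxsize` is patched to `2 ** 127 - 1`, the supplied start value lies far below the threshold, which an equality test misses. A small sketch under those assumptions (function names are illustrative):

```python
JAVA_MIN_LONG = -(1 << 63)      # Long.MinValue
JAVA_MAX_LONG = (1 << 63) - 1   # Long.MaxValue


def thresholds(maxsize):
    """Compute the (preceding, following) thresholds for a given sys.maxsize."""
    return max(-maxsize, JAVA_MIN_LONG), min(maxsize, JAVA_MAX_LONG)


def is_unbounded_preceding(start, maxsize, strict_equality):
    """Check whether `start` is treated as unbounded under either comparison."""
    preceding, _ = thresholds(maxsize)
    return start == preceding if strict_equality else start <= preceding


patched = 2 ** 127 - 1
with_eq = is_unbounded_preceding(-patched, patched, strict_equality=True)
with_le = is_unbounded_preceding(-patched, patched, strict_equality=False)
```

Here `with_eq` is `False` while `with_le` is `True`, which matches the decision to keep the `<=` and `>=` comparisons.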
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86795/testReport)** for PR 20400 at commit [`bbf8778`](https://github.com/apache/spark/commit/bbf8778a963a5e0b8de1b5ab1fddf4cafe13c180).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/584/
Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164965040
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,45 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+ """
+ Window function: returns the special frame boundary that represents the first row
+ in the window partition.
+ >>> df = spark.createDataFrame([(5,)])
+ >>> df.select(unboundedPreceding()).show
+ <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
--- End diff --
@HyukjinKwon Thanks for your comment.
Yes, it is intentional. I am trying to print out something that contains "UNBOUNDED PRECEDING" when calling the method unboundedPreceding(), so I will know this method gets executed correctly. I couldn't figure out a better way to do this. Please let me know if you have a better way.
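One possibility, sketched here with a hypothetical stand-in class rather than a real Spark session: without the parentheses, the doctest only compares the repr of the bound method, so `show` itself is never called. Reading the first column name checks the same label more directly. `FakeFrame` and its attributes are assumptions made for illustration only.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame; only the repr and column names matter."""

    def __repr__(self):
        return "DataFrame[UNBOUNDED PRECEDING: null]"

    def show(self):
        # Never called below: the point is that `frame.show` alone does not invoke it.
        return "rendered table"

    @property
    def columns(self):
        return ["UNBOUNDED PRECEDING"]


frame = FakeFrame()
method_repr = repr(frame.show)    # what a doctest sees for `frame.show` without parentheses
first_column = frame.columns[0]   # a direct check of the column label
print(method_repr)
print(first_column)
```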
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/359/
Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165254380
--- Diff: python/pyspark/sql/window.py ---
@@ -129,11 +131,34 @@ def rangeBetween(start, end):
:param end: boundary end, inclusive.
The frame is unbounded if this is ``Window.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
+ >>> from pyspark.sql import functions as F, SparkSession, Window
+ >>> spark = SparkSession.builder.getOrCreate()
+ >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
+ ... (3, "b")], ["id", "category"])
--- End diff --
I think it would be better to format it as:
```python
>>> df = spark.createDataFrame(
... [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
```
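For reference, the doctest under review builds a range frame over this DataFrame, from the current row's `id` to `id + 1` within each category. The expected sums can be checked with a plain-Python sketch that needs no Spark session. The helper `range_frame_sum` is hypothetical and only imitates the frame semantics for this small example.

```python
from collections import defaultdict

rows = [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")]


def range_frame_sum(rows, lower, upper):
    """Sum `id` over rows of the same category whose id lies in [id + lower, id + upper]."""
    by_category = defaultdict(list)
    for value, category in rows:
        by_category[category].append(value)
    result = []
    for value, category in rows:
        total = sum(v for v in by_category[category]
                    if value + lower <= v <= value + upper)
        result.append((value, category, total))
    return result


# lower=0 plays the role of the current row, upper=1 the role of lit(1).
sums = range_frame_sum(rows, 0, 1)
for row in sums:
    print(row)
```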
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164950531
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,45 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+ """
+ Window function: returns the special frame boundary that represents the first row
+ in the window partition.
+ >>> df = spark.createDataFrame([(5,)])
+ >>> df.select(unboundedPreceding()).show
+ <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
+ """
+ sc = SparkContext._active_spark_context
+ return Column(sc._jvm.functions.unboundedPreceding())
+
+
+@since(2.3)
+def unboundedFollowing():
+ """
+ Window function: returns the special frame boundary that represents the last row
+ in the window partition.
+ >>> df = spark.createDataFrame([(5,)])
--- End diff --
I don't think we claim to follow PEP 257 yet, but it would be good to at least have a blank line between the description and the doctest, if you don't mind.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165893442
--- Diff: python/pyspark/sql/window.py ---
@@ -208,20 +236,27 @@ def rangeBetween(self, start, end):
and "5" means the five off after the current row.
We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
- and ``Window.currentRow`` to specify special boundary values, rather than using integral
- values directly.
+ ``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
+ ``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
+ to specify special boundary values, rather than using integral values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
-        if start <= Window._PRECEDING_THRESHOLD:
-            start = Window.unboundedPreceding
-        if end >= Window._FOLLOWING_THRESHOLD:
-            end = Window.unboundedFollowing
+        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
+            if start <= Window._PRECEDING_THRESHOLD:
+                start = Window.unboundedPreceding
+            if end >= Window._FOLLOWING_THRESHOLD:
+                end = Window.unboundedFollowing
+        elif isinstance(start, Column) and isinstance(end, Column):
+            start = start._jc
+            end = end._jc
         return WindowSpec(self._jspec.rangeBetween(start, end))
--- End diff --
@HyukjinKwon Thanks. Will make changes.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/468/
Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164207282
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
-        if start <= Window._PRECEDING_THRESHOLD:
-            start = Window.unboundedPreceding
-        if end >= Window._FOLLOWING_THRESHOLD:
-            end = Window.unboundedFollowing
+        if isinstance(start, int) and isinstance(end, int):
--- End diff --
This differs from the WindowSpec interface, which supports Long values:
```
def rangeBetween(start: Long, end: Long): WindowSpec
```
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165229511
--- Diff: python/pyspark/sql/window.py ---
@@ -212,16 +218,20 @@ def rangeBetween(self, start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
--- End diff --
@HyukjinKwon Thank you very much for your comments. I will make the changes.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test FAILed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86919/
Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164207466
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
-        if start <= Window._PRECEDING_THRESHOLD:
-            start = Window.unboundedPreceding
-        if end >= Window._FOLLOWING_THRESHOLD:
-            end = Window.unboundedFollowing
+        if isinstance(start, int) and isinstance(end, int):
+            if start <= Window._PRECEDING_THRESHOLD:
+                start = Window.unboundedPreceding
--- End diff --
This is not standard behavior; we may want to consider changing it.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165270774
--- Diff: python/pyspark/sql/window.py ---
@@ -129,11 +131,34 @@ def rangeBetween(start, end):
:param end: boundary end, inclusive.
The frame is unbounded if this is ``Window.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
+ >>> from pyspark.sql import functions as F, SparkSession, Window
+ >>> spark = SparkSession.builder.getOrCreate()
+ >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
+ ... (3, "b")], ["id", "category"])
+ >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(F.currentRow(),
+ ... F.lit(1))
+ >>> df.withColumn("sum", F.sum("id").over(window)).show()
+ +---+--------+---+
+ | id|category|sum|
+ +---+--------+---+
+ | 1| b| 3|
+ | 2| b| 5|
+ | 3| b| 3|
+ | 1| a| 4|
+ | 1| a| 4|
+ | 2| a| 2|
+ +---+--------+---+
+ <BLANKLINE>
--- End diff --
Seems to me this <BLANKLINE> is required.
I will change the rest except this one.
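A small sketch of why the marker seems to be needed, using only the standard `doctest` module. It assumes that `show()` prints the table followed by an empty line. `show_like` is a hypothetical stand-in, not the PySpark method.

```python
import doctest

WITH_MARKER = '''
>>> show_like()
+---+
| id|
+---+
<BLANKLINE>
'''

WITHOUT_MARKER = '''
>>> show_like()
+---+
| id|
+---+
'''


def show_like():
    """Hypothetical stand-in for DataFrame.show(): prints a table, then an empty line."""
    print("+---+\n| id|\n+---+\n")


def failures(docstring):
    """Run the examples in `docstring` and return the number of failing examples."""
    test = doctest.DocTestParser().get_doctest(
        docstring, {"show_like": show_like}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None).failed


print(failures(WITH_MARKER), failures(WITHOUT_MARKER))
```

In a docstring, a truly empty line ends the expected output, so the trailing empty line can only be matched through `<BLANKLINE>`.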
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/249/
Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164054210
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
-        if start <= Window._PRECEDING_THRESHOLD:
-            start = Window.unboundedPreceding
-        if end >= Window._FOLLOWING_THRESHOLD:
-            end = Window.unboundedFollowing
+        if isinstance(start, int) and isinstance(end, int):
--- End diff --
Sure, we need some tests ..
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165229664
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +126,20 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+ any value greater than or equal to 9223372036854775807.
"""
-        if start <= Window._PRECEDING_THRESHOLD:
-            start = Window.unboundedPreceding
-        if end >= Window._FOLLOWING_THRESHOLD:
-            end = Window.unboundedFollowing
+        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
+            if start <= Window._PRECEDING_THRESHOLD:
+                start = Window.unboundedPreceding
+            if end >= Window._FOLLOWING_THRESHOLD:
+                end = Window.unboundedFollowing
--- End diff --
will change. Thanks!
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164052902
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,36 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+    """
+    Window function: returns the special frame boundary that represents the first row
+    in the window partition.
+    """
+    sc = SparkContext._active_spark_context
+    return Column(sc._jvm.functions.unboundedPreceding())
+
+
+@since(2.3)
+def unboundedFollowing():
+    """
+    Window function: returns the special frame boundary that represents the last row
+    in the window partition.
+    """
+    sc = SparkContext._active_spark_context
+    return Column(sc._jvm.functions.unboundedFollowing())
+
+
+@since(2.3)
+def currentRow():
+    """
+    Window function: returns the special frame boundary that represents the current row
+    in the window partition.
+    """
+    sc = SparkContext._active_spark_context
+    return Column(sc._jvm.functions.currentRow())
--- End diff --
Yea, let's add doctests ..
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165254262
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,48 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+ """
+ Window function: returns the special frame boundary that represents the first row
+ in the window partition.
+
+ >>> df = spark.createDataFrame([(5,)])
+ >>> df.select(unboundedPreceding()).columns[0]
+ 'UNBOUNDED PRECEDING'
--- End diff --
Shall we just move these into separate tests in `tests.py`? To be honest, I don't think this is very useful as an example in the docs.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164261413
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
-        if start <= Window._PRECEDING_THRESHOLD:
-            start = Window.unboundedPreceding
-        if end >= Window._FOLLOWING_THRESHOLD:
-            end = Window.unboundedFollowing
+        if isinstance(start, int) and isinstance(end, int):
--- End diff --
@HyukjinKwon @jiangxb1987
Thank you very much for your comments.
It seems to me that int and long are "unified" in python 2. I tried the following:
```
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> value = 9223372036854775807
>>> isinstance(value, int)
True
```
So it seems to me that we don't have to handle long separately for Python 2.
I guess I will keep `if isinstance(start, int) and isinstance(end, int)`?
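A hedged sketch of a type check that covers both interpreters. In Python 2, integer literals above `sys.maxint` are of type `long`, and 9223372036854775807 equals `sys.maxint` on a typical 64-bit build, which is why the session above reports `True`. The alias and the helper `is_integral` are illustrative assumptions, not existing PySpark code.

```python
import sys

if sys.version_info[0] >= 3:
    long = int  # Python 3 has a single integer type; alias so the tuple below works


def is_integral(value):
    """Accept both `int` and `long` (the latter only exists separately in Python 2)."""
    return isinstance(value, (int, long))


checks = [
    is_integral(9223372036854775807),   # Long.MaxValue
    is_integral(9223372036854775808),   # one past Long.MaxValue, a `long` in Python 2
    is_integral(1.5),
]
print(checks)
```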
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164973111
--- Diff: python/pyspark/sql/window.py ---
@@ -212,16 +218,20 @@ def rangeBetween(self, start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
--- End diff --
Wait, why do we expose the `org.apache.spark.sql.catalyst` path in the Python doc? Also, unless I'm missing something, this package is meant to be internal.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/251/
Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20400
Merged to master.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164248098
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
-        if start <= Window._PRECEDING_THRESHOLD:
-            start = Window.unboundedPreceding
-        if end >= Window._FOLLOWING_THRESHOLD:
-            end = Window.unboundedFollowing
+        if isinstance(start, int) and isinstance(end, int):
--- End diff --
oh, thanks @HyukjinKwon , I'm not familiar with python :)
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86662/testReport)** for PR 20400 at commit [`0690596`](https://github.com/apache/spark/commit/0690596029d1c061c16564f12416311e7624e5b2).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20400
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164971807
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +126,20 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+ any value greater than or equal to 9223372036854775807.
"""
-        if start <= Window._PRECEDING_THRESHOLD:
-            start = Window.unboundedPreceding
-        if end >= Window._FOLLOWING_THRESHOLD:
-            end = Window.unboundedFollowing
+        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
+            if start <= Window._PRECEDING_THRESHOLD:
+                start = Window.unboundedPreceding
+            if end >= Window._FOLLOWING_THRESHOLD:
+                end = Window.unboundedFollowing
--- End diff --
Shall we also add logic like the following?
```
elif isinstance(start, Column) and isinstance(end, Column):
    start = start._jc
    end = end._jc
```
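The combined dispatch could look roughly like the sketch below. It runs without Spark: `Column` here is a hypothetical stand-in whose `_jc` attribute imitates the wrapped JVM column. The function name `normalize_bounds` and the Long constants are assumptions for illustration.

```python
JAVA_MIN_LONG = -(1 << 63)
JAVA_MAX_LONG = (1 << 63) - 1


class Column:
    """Hypothetical stand-in for pyspark.sql.Column; `_jc` mimics the JVM-side object."""

    def __init__(self, jc):
        self._jc = jc


def normalize_bounds(start, end, preceding_threshold, following_threshold):
    """Map integral bounds to the unbounded markers, or unwrap Column bounds."""
    if isinstance(start, int) and isinstance(end, int):
        if start <= preceding_threshold:
            start = JAVA_MIN_LONG   # assumed value of the unbounded-preceding marker
        if end >= following_threshold:
            end = JAVA_MAX_LONG     # assumed value of the unbounded-following marker
    elif isinstance(start, Column) and isinstance(end, Column):
        start, end = start._jc, end._jc
    return start, end


int_bounds = normalize_bounds(-(2 ** 70), 5, JAVA_MIN_LONG, JAVA_MAX_LONG)
col_bounds = normalize_bounds(Column("jvm-start"), Column("jvm-end"),
                              JAVA_MIN_LONG, JAVA_MAX_LONG)
print(int_bounds, col_bounds)
```

Mixed arguments (one integer, one column) pass through unchanged in this sketch; whether they should be rejected is left open here.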
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86665/testReport)** for PR 20400 at commit [`2c66156`](https://github.com/apache/spark/commit/2c661561df80b551559d0685602e37fa62bf5d1e).
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86795/
Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #87061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87061/testReport)** for PR 20400 at commit [`f82c7d1`](https://github.com/apache/spark/commit/f82c7d11d12611eed5ac5bd9f4b8a4e6516fdf84).
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164083276
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,36 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
--- End diff --
Oh! I am so sorry. I meant https://issues.apache.org/jira/browse/SPARK-10621, and this one seems to be a different case. Please ignore my comment above.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20400
Yup, we could also accept a string as a column, but I was thinking of matching the
signature with the Scala one for now, just for consistency.
On 1 Feb 2018 5:24 pm, "Liang-Chi Hsieh" <no...@github.com> wrote:
*@viirya* commented on this pull request.
------------------------------
In python/pyspark/sql/window.py
<https://github.com/apache/spark/pull/20400#discussion_r165284238>:
> """
- if start <= Window._PRECEDING_THRESHOLD:
- start = Window.unboundedPreceding
- if end >= Window._FOLLOWING_THRESHOLD:
- end = Window.unboundedFollowing
+ if isinstance(start, (int, long)) and isinstance(end, (int, long)):
Is it possible that we mix int and Column in the parameters?
------------------------------
In python/pyspark/sql/window.py
<https://github.com/apache/spark/pull/20400#discussion_r165284328>:
> any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
- if start <= Window._PRECEDING_THRESHOLD:
- start = Window.unboundedPreceding
- if end >= Window._FOLLOWING_THRESHOLD:
- end = Window.unboundedFollowing
+ if isinstance(start, (int, long)) and isinstance(end, (int, long)):
ditto.
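With the `isinstance` checks as quoted, a mixed call (an int start with a Column end, or the reverse) matches neither branch and is passed through unchanged. One possible way to surface that early is sketched below, under the assumption that mixing should be rejected; the thread does not settle this. `Column` and `check_bounds` are hypothetical names, not part of PySpark.

```python
class Column(object):
    """Hypothetical stand-in for pyspark.sql.Column."""

    def __init__(self, jc):
        self._jc = jc


def check_bounds(start, end):
    """Return 'int' or 'column' when both bounds agree, else raise TypeError."""
    if isinstance(start, int) and isinstance(end, int):
        return "int"
    if isinstance(start, Column) and isinstance(end, Column):
        return "column"
    raise TypeError(
        "start and end must both be integers or both be Columns, got %s and %s"
        % (type(start).__name__, type(end).__name__))
```

Calling `check_bounds(0, Column("x"))` raises a `TypeError` that names both argument types, instead of deferring the failure to the JVM call.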
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165256200
--- Diff: python/pyspark/sql/window.py ---
@@ -217,11 +242,16 @@ def rangeBetween(self, start, end):
:param end: boundary end, inclusive.
--- End diff --
I think we should also update the description above, `We recommend user ...`, in a similar way. Linking would be better but is optional; plain ` ``...`` ` is fine with me.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86662/
Test FAILed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86897/testReport)** for PR 20400 at commit [`45545d6`](https://github.com/apache/spark/commit/45545d65ce9ecc077bf842602ecc465ceeeda061).
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87307/
Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86665/
Test FAILed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20400
retest this please
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165284328
--- Diff: python/pyspark/sql/window.py ---
@@ -208,20 +236,27 @@ def rangeBetween(self, start, end):
and "5" means the five off after the current row.
We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
- and ``Window.currentRow`` to specify special boundary values, rather than using integral
- values directly.
+ ``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
+ ``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
+ to specify special boundary values, rather than using integral values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
- if start <= Window._PRECEDING_THRESHOLD:
- start = Window.unboundedPreceding
- if end >= Window._FOLLOWING_THRESHOLD:
- end = Window.unboundedFollowing
+ if isinstance(start, (int, long)) and isinstance(end, (int, long)):
--- End diff --
ditto.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86919/testReport)** for PR 20400 at commit [`25fee39`](https://github.com/apache/spark/commit/25fee3901cfba3599330da394e437c91a9783368).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165254861
--- Diff: python/pyspark/sql/window.py ---
@@ -129,11 +131,34 @@ def rangeBetween(start, end):
:param end: boundary end, inclusive.
The frame is unbounded if this is ``Window.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
+ >>> from pyspark.sql import functions as F, SparkSession, Window
+ >>> spark = SparkSession.builder.getOrCreate()
+ >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
+ ... (3, "b")], ["id", "category"])
+ >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(F.currentRow(),
+ ... F.lit(1))
+ >>> df.withColumn("sum", F.sum("id").over(window)).show()
+ +---+--------+---+
+ | id|category|sum|
+ +---+--------+---+
+ | 1| b| 3|
+ | 2| b| 5|
+ | 3| b| 3|
+ | 1| a| 4|
+ | 1| a| 4|
+ | 2| a| 2|
+ +---+--------+---+
+ <BLANKLINE>
--- End diff --
Hm, do we need `BLANKLINE`?
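As a sanity check on the expected table in the doctest above, the range frame `rangeBetween(currentRow, 1)` can be emulated in plain Python: for each row, sum the `id` values in the same category that lie between the row's own `id` and `id + 1`. The helper name `range_frame_sum` is hypothetical, and the sketch only reproduces the arithmetic, not Spark's row ordering.

```python
from collections import defaultdict

rows = [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")]


def range_frame_sum(rows, lower, upper):
    """Sum ids per row over the value range [id + lower, id + upper],
    restricted to rows in the same category (the partition)."""
    by_category = defaultdict(list)
    for id_, category in rows:
        by_category[category].append(id_)
    return [
        (id_, category,
         sum(v for v in by_category[category]
             if id_ + lower <= v <= id_ + upper))
        for id_, category in rows
    ]


# lower=0 plays the role of currentRow, upper=1 the role of lit(1).
result = range_frame_sum(rows, 0, 1)
```

The sums come out as 4, 4, 2 for category `a` and 3, 5, 3 for category `b`, which agrees with the `sum` column shown in the doctest.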
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86795/testReport)** for PR 20400 at commit [`bbf8778`](https://github.com/apache/spark/commit/bbf8778a963a5e0b8de1b5ab1fddf4cafe13c180).
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #87307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87307/testReport)** for PR 20400 at commit [`f82c7d1`](https://github.com/apache/spark/commit/f82c7d11d12611eed5ac5bd9f4b8a4e6516fdf84).
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165840412
--- Diff: python/pyspark/sql/window.py ---
@@ -208,20 +236,27 @@ def rangeBetween(self, start, end):
and "5" means the five off after the current row.
We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
- and ``Window.currentRow`` to specify special boundary values, rather than using integral
- values directly.
+ ``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
+ ``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
+ to specify special boundary values, rather than using integral values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
- if start <= Window._PRECEDING_THRESHOLD:
- start = Window.unboundedPreceding
- if end >= Window._FOLLOWING_THRESHOLD:
- end = Window.unboundedFollowing
+ if isinstance(start, (int, long)) and isinstance(end, (int, long)):
+ if start <= Window._PRECEDING_THRESHOLD:
+ start = Window.unboundedPreceding
+ if end >= Window._FOLLOWING_THRESHOLD:
+ end = Window.unboundedFollowing
+ elif isinstance(start, Column) and isinstance(end, Column):
+ start = start._jc
+ end = end._jc
return WindowSpec(self._jspec.rangeBetween(start, end))
--- End diff --
Let's use `doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)` and remove `<BLANKLINE>` at https://github.com/apache/spark/pull/20400/files#r165254861
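A small, Spark-free sketch of why the flag makes `<BLANKLINE>` unnecessary: output that ends with an empty line (as `show()` output does) fails a plain doctest comparison but passes once whitespace is normalized. `show_table` and `failures` are hypothetical names used only for this illustration.

```python
import doctest


def show_table():
    """Print a small table followed by an empty line.

    >>> show_table()
    +---+
    | id|
    +---+
    |  1|
    +---+
    """
    print("+---+\n| id|\n+---+\n|  1|\n+---+\n")


def failures(optionflags):
    """Run show_table's doctest and return the number of failed examples."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        show_table.__doc__, {"show_table": show_table}, "show_table", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None).failed
```

Here `failures(0)` reports one failure because of the trailing empty line, while `failures(doctest.NORMALIZE_WHITESPACE)` reports none.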
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164245801
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
any value greater than or equal to min(sys.maxsize, 9223372036854775807).
"""
- if start <= Window._PRECEDING_THRESHOLD:
- start = Window.unboundedPreceding
- if end >= Window._FOLLOWING_THRESHOLD:
- end = Window.unboundedFollowing
+ if isinstance(start, int) and isinstance(end, int):
--- End diff --
FYI, Python 3 doesn't have `long`; it was merged into `int`. We should check for `long` here too and define `long = int` in Python 3.
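The compatibility shim being suggested can be sketched as follows. The helper name `is_integral` is hypothetical, and the exact version check PySpark uses may differ from the one shown.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3 has no separate long type; alias it so that
    # isinstance(x, (int, long)) works on both major versions.
    long = int


def is_integral(value):
    """True for values that would be int or long on Python 2."""
    return isinstance(value, (int, long))
```

With the alias in place the same `isinstance(start, (int, long))` check runs unchanged on Python 2 and Python 3.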
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164968965
--- Diff: python/pyspark/sql/functions.py ---
@@ -809,6 +809,45 @@ def ntile(n):
return Column(sc._jvm.functions.ntile(int(n)))
+@since(2.3)
+def unboundedPreceding():
+ """
+ Window function: returns the special frame boundary that represents the first row
+ in the window partition.
+ >>> df = spark.createDataFrame([(5,)])
+ >>> df.select(unboundedPreceding()).show
+ <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
--- End diff --
I think we should have a working example in the doctests here.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r165256128
--- Diff: python/pyspark/sql/window.py ---
@@ -217,11 +242,16 @@ def rangeBetween(self, start, end):
:param end: boundary end, inclusive.
The frame is unbounded if this is ``Window.unboundedFollowing``, or
--- End diff --
Can we rewrite this to reflect the current change? For example, something like:
```
``Window.unboundedFollowing`` or a column returned by ``pyspark.sql.functions.unboundedFollowing``?
```
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20400
LGTM too except the three comments above.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86665/testReport)** for PR 20400 at commit [`2c66156`](https://github.com/apache/spark/commit/2c661561df80b551559d0685602e37fa62bf5d1e).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #87307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87307/testReport)** for PR 20400 at commit [`f82c7d1`](https://github.com/apache/spark/commit/f82c7d11d12611eed5ac5bd9f4b8a4e6516fdf84).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86676/testReport)** for PR 20400 at commit [`b856860`](https://github.com/apache/spark/commit/b85686099c2effe99319d05422c510658becd8fc).
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20400
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20400#discussion_r164205837
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +124,19 @@ def rangeBetween(start, end):
values directly.
:param start: boundary start, inclusive.
- The frame is unbounded if this is ``Window.unboundedPreceding``, or
+ The frame is unbounded if this is ``Window.unboundedPreceding``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
any value less than or equal to max(-sys.maxsize, -9223372036854775808).
:param end: boundary end, inclusive.
- The frame is unbounded if this is ``Window.unboundedFollowing``, or
+ The frame is unbounded if this is ``Window.unboundedFollowing``,
+ ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
--- End diff --
nit: `UnboundedPFollowing` -> `UnboundedFollowing`
---
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20400
**[Test build #86676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86676/testReport)** for PR 20400 at commit [`b856860`](https://github.com/apache/spark/commit/b85686099c2effe99319d05422c510658becd8fc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---