Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/29 02:42:34 UTC
[GitHub] [spark] HyukjinKwon opened a new pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
HyukjinKwon opened a new pull request #29899:
URL: https://github.com/apache/spark/pull/29899
### What changes were proposed in this pull request?
`nth_value` was added in SPARK-27951. This PR adds the corresponding PySpark API.
### Why are the changes needed?
To keep the API consistent across languages.
### Does this PR introduce _any_ user-facing change?
Yes, it introduces a new PySpark function API.
### How was this patch tested?
A unit test was added.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700430638
Merged to master.
[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700409209
[GitHub] [spark] beliefer commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496355785
##########
File path: python/pyspark/sql/functions.py
##########
@@ -934,6 +934,25 @@ def lead(col, offset=1, default=None):
     return Column(sc._jvm.functions.lead(_to_java_column(col), offset, default))
+@since(3.1)
+def nth_value(col, offset, ignoreNulls=False):
+    """
+    Window function: returns the value that is the `offset`\\th row of the window frame
+    (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+
+    It will return the `offset`\\th non-null value it sees when `ignoreNulls` is set to
+    true. If all values are null, then null is returned.
+
+    This is equivalent to the nth_value function in SQL.
+
+    :param col: name of column or expression
+    :param offset: number of row to use as the value
+    :param ignoreNulls: default value
Review comment:
An optional specification that indicates that `NthValue` should skip null values when determining which row to use.
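To make the behaviour described in the docstring concrete, here is a minimal plain-Python sketch of the semantics for a single window frame. It is not Spark's implementation: `nth_value_in_frame` and its arguments are hypothetical names, `None` stands in for SQL `null`, and rejecting an `offset` below 1 is an assumption made for the sketch.

```python
from typing import Any, Optional, Sequence


def nth_value_in_frame(
    frame: Sequence[Any], offset: int, ignore_nulls: bool = False
) -> Optional[Any]:
    """Emulate the documented nth_value semantics for one window frame.

    `frame` is a hypothetical stand-in for the ordered rows of a window
    frame; None plays the role of SQL null.
    """
    if offset < 1:
        # Assumption for this sketch: offsets are counted from 1.
        raise ValueError("offset must be >= 1")
    # With ignore_nulls set, nulls are skipped before counting rows.
    if ignore_nulls:
        candidates = [value for value in frame if value is not None]
    else:
        candidates = list(frame)
    # A frame with fewer than `offset` candidate rows yields null.
    if len(candidates) < offset:
        return None
    return candidates[offset - 1]
```

For the frame `[None, "a", "b"]` and `offset=2`, the sketch picks `"a"` by default and `"b"` when nulls are ignored.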
##########
File path: python/pyspark/sql/functions.py
##########
@@ -934,6 +934,25 @@ def lead(col, offset=1, default=None):
     return Column(sc._jvm.functions.lead(_to_java_column(col), offset, default))
+@since(3.1)
+def nth_value(col, offset, ignoreNulls=False):
+    """
+    Window function: returns the value that is the `offset`\\th row of the window frame
Review comment:
`\\th`?
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700401132
**[Test build #129204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129204/testReport)** for PR 29899 at commit [`179bdea`](https://github.com/apache/spark/commit/179bdea5475d55c41103170d94811c43c3dc8f48).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700443463
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33825/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496348050
##########
File path: python/pyspark/sql/functions.py
##########
@@ -934,6 +934,25 @@ def lead(col, offset=1, default=None):
     return Column(sc._jvm.functions.lead(_to_java_column(col), offset, default))
+@since(3.1)
+def nth_value(col, offset, ignoreNulls=False):
+    """
+    Window function: returns the value that is the `offset`th row of the window frame
+    (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+
+    It will return the `offset`th non-null value it sees when `ignoreNulls` is set to
+    true. If all values are null, then null is returned.
+
+    This is equivalent to the nth_value function in SQL.
+
+    :param col: name of column or expression
+    :param offset: number of row to use as the value
+    :param ignoreNulls: default value
+    """
+    sc = SparkContext._active_spark_context
+    return Column(sc._jvm.functions.nth_value(_to_java_column(col), offset, ignoreNulls))
Review comment:
Ideally we should also type-check the arguments. For now, let me match the other window functions and add the checks separately, for all window functions together.
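As a rough illustration of what such a type check could look like, here is a minimal sketch. `check_nth_value_args` is a hypothetical helper, not part of PySpark, and the error messages are made up for the example; the real follow-up may validate differently.

```python
def check_nth_value_args(offset, ignoreNulls=False):
    """Hypothetical argument validation for nth_value; not PySpark code."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(offset, bool) or not isinstance(offset, int):
        raise TypeError(
            "offset should be an int, got %s" % type(offset).__name__
        )
    if not isinstance(ignoreNulls, bool):
        raise TypeError(
            "ignoreNulls should be a bool, got %s" % type(ignoreNulls).__name__
        )
    return offset, ignoreNulls
```

Failing early on the Python side would surface a readable `TypeError` instead of an error raised when the arguments reach the JVM call.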
[GitHub] [spark] dongjoon-hyun commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700430638
Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700431700
Thank you @dongjoon-hyun.
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700425862
**[Test build #129210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129210/testReport)** for PR 29899 at commit [`98b04a2`](https://github.com/apache/spark/commit/98b04a2546418be49e2a105d8909043bc3113f7b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700408943
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33819/
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700397766
**[Test build #129207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129207/testReport)** for PR 29899 at commit [`dccf85d`](https://github.com/apache/spark/commit/dccf85dafcf979deb1e008cbb5299e5893404de4).
[GitHub] [spark] HyukjinKwon commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700396896
cc @beliefer and @cloud-fan FYI
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496346470
##########
File path: python/pyspark/sql/functions.pyi
##########
@@ -85,6 +85,9 @@ def lag(
 def lead(
     col: ColumnOrName, offset: int = ..., default: Optional[Any] = ...
 ) -> Column: ...
+def nth_value(
+    col: ColumnOrName, offset: int, ignoreNulls: Optional[bool] = ...
Review comment:
I was about to suggest making it accept `Column` in the original PR, but let's leave it matched for now. I think the window one might have to be treated differently, given that it has a different SQL syntax.
[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700419123
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700404950
**[Test build #129207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129207/testReport)** for PR 29899 at commit [`dccf85d`](https://github.com/apache/spark/commit/dccf85dafcf979deb1e008cbb5299e5893404de4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700418910
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33822/
[GitHub] [spark] SparkQA removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700393912
**[Test build #129204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129204/testReport)** for PR 29899 at commit [`179bdea`](https://github.com/apache/spark/commit/179bdea5475d55c41103170d94811c43c3dc8f48).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700409209
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700412707
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33822/
[GitHub] [spark] SparkQA removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700419470
**[Test build #129210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129210/testReport)** for PR 29899 at commit [`98b04a2`](https://github.com/apache/spark/commit/98b04a2546418be49e2a105d8909043bc3113f7b).
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700404872
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33819/
[GitHub] [spark] dongjoon-hyun closed pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #29899:
URL: https://github.com/apache/spark/pull/29899
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700433263
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33825/
[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700444599
[GitHub] [spark] beliefer commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700403909
@HyukjinKwon Thank you! I learned more about PySpark.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496365651
##########
File path: python/pyspark/sql/functions.py
##########
@@ -934,6 +934,25 @@ def lead(col, offset=1, default=None):
     return Column(sc._jvm.functions.lead(_to_java_column(col), offset, default))
+@since(3.1)
+def nth_value(col, offset, ignoreNulls=False):
+    """
+    Window function: returns the value that is the `offset`\\th row of the window frame
Review comment:
Yeah, it should have `\\` to distinguish `th` from the previous backquote.
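The Python side of this can be checked directly. The sketch below uses a hypothetical function, `nth_value_doc_example`, whose docstring mirrors the line under review. In a regular (non-raw) string literal, `\\` becomes a single backslash at runtime, whereas a lone `\t` would be read as a tab. As far as I understand reStructuredText, that remaining backslash is what lets the closing backquote be followed directly by `th`.

```python
def nth_value_doc_example():
    """Returns the value that is the `offset`\\th row of the window frame."""


# What the documentation tooling sees: one backslash between "`" and "th".
doc = nth_value_doc_example.__doc__

escaped = "`offset`\\th"   # backquote, backslash, 't', 'h'
unescaped = "`offset`\th"  # Python turns \t into a tab character

print(repr(doc))
print(repr(escaped), repr(unescaped))
```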
[GitHub] [spark] HyukjinKwon commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700496796
Thanks @zero323, please go ahead!
[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700426470
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700444599
[GitHub] [spark] SparkQA removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700397766
**[Test build #129207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129207/testReport)** for PR 29899 at commit [`dccf85d`](https://github.com/apache/spark/commit/dccf85dafcf979deb1e008cbb5299e5893404de4).
[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700406036
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700419470
**[Test build #129210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129210/testReport)** for PR 29899 at commit [`98b04a2`](https://github.com/apache/spark/commit/98b04a2546418be49e2a105d8909043bc3113f7b).
[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700401706
[GitHub] [spark] HyukjinKwon commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700394479
cc @zero323 would you mind taking a look when you find some time?
[GitHub] [spark] dongjoon-hyun commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700431300
Merged to master.
[GitHub] [spark] zero323 commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700488786
> cc @zero323 would you mind taking a look when you find some time?
Looks good. I am going to add this to R as well, unless someone is already working on that.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496347604
##########
File path: python/pyspark/sql/functions.pyi
##########
@@ -85,6 +85,9 @@ def lag(
def lead(
    col: ColumnOrName, offset: int = ..., default: Optional[Any] = ...
Review comment:
I'll take a separate look at this primitive types vs `Column` issue.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700426470
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700401706
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700419123
[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700393912
**[Test build #129204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129204/testReport)** for PR 29899 at commit [`179bdea`](https://github.com/apache/spark/commit/179bdea5475d55c41103170d94811c43c3dc8f48).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700406036