Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/29 02:42:34 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

HyukjinKwon opened a new pull request #29899:
URL: https://github.com/apache/spark/pull/29899


   ### What changes were proposed in this pull request?
   
   `nth_value` was added at SPARK-27951. This PR adds the corresponding PySpark API.
   
   ### Why are the changes needed?
   
   To keep the PySpark API consistent with the existing `nth_value` support.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it introduces a new PySpark function API.
   
   ### How was this patch tested?
   
   A unit test was added.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700430638


   Merged to master.




[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700409209








[GitHub] [spark] beliefer commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496355785



##########
File path: python/pyspark/sql/functions.py
##########
@@ -934,6 +934,25 @@ def lead(col, offset=1, default=None):
     return Column(sc._jvm.functions.lead(_to_java_column(col), offset, default))
 
 
+@since(3.1)
+def nth_value(col, offset, ignoreNulls=False):
+    """
+    Window function: returns the value that is the `offset`\\th row of the window frame
+    (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+
+    It will return the `offset`\\th non-null value it sees when `ignoreNulls` is set to
+    true. If all values are null, then null is returned.
+
+    This is equivalent to the nth_value function in SQL.
+
+    :param col: name of column or expression
+    :param offset: number of row to use as the value
+    :param ignoreNulls: default value

Review comment:
       Suggested wording: an optional specification indicating that NthValue should skip null values when determining which row to use.
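The semantics described in the docstring above can be modelled in plain Python. This is only an illustrative sketch, not Spark's implementation: the name `nth_value_model`, the use of a list as the window frame, and the `ValueError` for a non-positive offset are assumptions made for the example.

```python
from typing import Any, Optional, Sequence


def nth_value_model(
    frame: Sequence[Any], offset: int, ignore_nulls: bool = False
) -> Optional[Any]:
    """Hypothetical plain-Python model of the documented nth_value behaviour.

    `frame` stands in for the rows of one window frame, with None as null.
    """
    if offset < 1:
        # Assumption of this sketch: offsets count from 1, so reject others.
        raise ValueError("offset must be a positive integer")
    # With ignore_nulls, only non-null values are counted towards the offset.
    values = [v for v in frame if v is not None] if ignore_nulls else list(frame)
    # A frame with fewer than `offset` (counted) rows yields null.
    return values[offset - 1] if offset <= len(values) else None
```

For example, with the frame `[10, None, 30]` and `offset=2`, the model returns `None` by default and `30` when `ignore_nulls=True`.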

##########
File path: python/pyspark/sql/functions.py
##########
@@ -934,6 +934,25 @@ def lead(col, offset=1, default=None):
     return Column(sc._jvm.functions.lead(_to_java_column(col), offset, default))
 
 
+@since(3.1)
+def nth_value(col, offset, ignoreNulls=False):
+    """
+    Window function: returns the value that is the `offset`\\th row of the window frame

Review comment:
       `\\th`?






[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700401132


   **[Test build #129204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129204/testReport)** for PR 29899 at commit [`179bdea`](https://github.com/apache/spark/commit/179bdea5475d55c41103170d94811c43c3dc8f48).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700443463


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33825/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496348050



##########
File path: python/pyspark/sql/functions.py
##########
@@ -934,6 +934,25 @@ def lead(col, offset=1, default=None):
     return Column(sc._jvm.functions.lead(_to_java_column(col), offset, default))
 
 
+@since(3.1)
+def nth_value(col, offset, ignoreNulls=False):
+    """
+    Window function: returns the value that is the `offset`th row of the window frame
+    (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+
+    It will return the `offset`th non-null value it sees when `ignoreNulls` is set to
+    true. If all values are null, then null is returned.
+
+    This is equivalent to the nth_value function in SQL.
+
+    :param col: name of column or expression
+    :param offset: number of row to use as the value
+    :param ignoreNulls: default value
+    """
+    sc = SparkContext._active_spark_context
+    return Column(sc._jvm.functions.nth_value(_to_java_column(col), offset, ignoreNulls))

Review comment:
       Ideally we should also type-check the arguments here. For now, let me keep it consistent with the other window functions and add the checks for all of them in a separate change.
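As a rough idea of what such a type check might look like, here is a minimal sketch. The helper name `check_nth_value_args` and the exact rules (rejecting `bool` as an offset, requiring a strict `bool` for `ignoreNulls`) are assumptions for illustration; they are not part of this PR or of PySpark.

```python
def check_nth_value_args(offset, ignore_nulls):
    """Hypothetical argument validation for a wrapper like nth_value."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(offset, bool) or not isinstance(offset, int):
        raise TypeError("offset should be an int, got %s" % type(offset).__name__)
    if offset < 1:
        # Assumption of this sketch: offsets count from 1.
        raise ValueError("offset should be a positive integer, got %d" % offset)
    if not isinstance(ignore_nulls, bool):
        raise TypeError(
            "ignoreNulls should be a bool, got %s" % type(ignore_nulls).__name__
        )
```

Calling the helper before delegating to the JVM function would surface a Python-level error early instead of a less readable Py4J error.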






[GitHub] [spark] dongjoon-hyun commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700430638


   Merged to master.




[GitHub] [spark] HyukjinKwon commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700431700


   Thank you @dongjoon-hyun.




[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700425862


   **[Test build #129210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129210/testReport)** for PR 29899 at commit [`98b04a2`](https://github.com/apache/spark/commit/98b04a2546418be49e2a105d8909043bc3113f7b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700408943


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33819/
   




[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700397766


   **[Test build #129207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129207/testReport)** for PR 29899 at commit [`dccf85d`](https://github.com/apache/spark/commit/dccf85dafcf979deb1e008cbb5299e5893404de4).




[GitHub] [spark] HyukjinKwon commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700396896


   cc @beliefer and @cloud-fan FYI




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496346470



##########
File path: python/pyspark/sql/functions.pyi
##########
@@ -85,6 +85,9 @@ def lag(
 def lead(
     col: ColumnOrName, offset: int = ..., default: Optional[Any] = ...
 ) -> Column: ...
+def nth_value(
+    col: ColumnOrName, offset: int, ignoreNulls: Optional[bool] = ...

Review comment:
       I was about to suggest making it accept `Column` in the original PR, but let's leave it matched for now. I think the window functions might have to be treated differently, given that they have a different SQL syntax.






[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700419123








[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700404950


   **[Test build #129207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129207/testReport)** for PR 29899 at commit [`dccf85d`](https://github.com/apache/spark/commit/dccf85dafcf979deb1e008cbb5299e5893404de4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700418910


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33822/
   




[GitHub] [spark] SparkQA removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700393912


   **[Test build #129204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129204/testReport)** for PR 29899 at commit [`179bdea`](https://github.com/apache/spark/commit/179bdea5475d55c41103170d94811c43c3dc8f48).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700409209








[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700412707


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33822/
   




[GitHub] [spark] SparkQA removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700419470


   **[Test build #129210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129210/testReport)** for PR 29899 at commit [`98b04a2`](https://github.com/apache/spark/commit/98b04a2546418be49e2a105d8909043bc3113f7b).




[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700404872


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33819/
   




[GitHub] [spark] dongjoon-hyun closed pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #29899:
URL: https://github.com/apache/spark/pull/29899


   




[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700433263


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33825/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700444599








[GitHub] [spark] beliefer commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700403909


   @HyukjinKwon Thank you! I learned more about PySpark.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496365651



##########
File path: python/pyspark/sql/functions.py
##########
@@ -934,6 +934,25 @@ def lead(col, offset=1, default=None):
     return Column(sc._jvm.functions.lead(_to_java_column(col), offset, default))
 
 
+@since(3.1)
+def nth_value(col, offset, ignoreNulls=False):
+    """
+    Window function: returns the value that is the `offset`\\th row of the window frame

Review comment:
       Yeah, it should have `\\` to distinguish `th` from the previous backquote.
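The effect of the doubled backslash can be checked in plain Python. The sketch below uses a hypothetical function `documented` with a one-line docstring: in a normal (non-raw) string literal, `\\` becomes a single backslash, which the documentation markup can then use to separate `th` from the closing backquote, whereas a single `\t` would be read as a tab character.

```python
def documented():
    """Returns the `offset`\\th row of the window frame."""


# The docstring as stored at runtime contains exactly one backslash before "th".
doc = documented.__doc__

# Without doubling, Python interprets "\t" as a tab, which corrupts the text.
broken = "Returns the `offset`\th row of the window frame."

print(repr(doc))
print(repr(broken))
```

A raw docstring (`r"""..."""`) with a single backslash would be another way to get the same stored text.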






[GitHub] [spark] HyukjinKwon commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700496796


   Thanks @zero323, please go ahead!






[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700426470








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700444599








[GitHub] [spark] SparkQA removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700397766


   **[Test build #129207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129207/testReport)** for PR 29899 at commit [`dccf85d`](https://github.com/apache/spark/commit/dccf85dafcf979deb1e008cbb5299e5893404de4).




[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700406036








[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700419470


   **[Test build #129210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129210/testReport)** for PR 29899 at commit [`98b04a2`](https://github.com/apache/spark/commit/98b04a2546418be49e2a105d8909043bc3113f7b).




[GitHub] [spark] AmplabJenkins commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700401706








[GitHub] [spark] HyukjinKwon commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700394479


   cc @zero323 would you mind taking a look when you find some time?




[GitHub] [spark] dongjoon-hyun commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700431300


   Merged to master.




[GitHub] [spark] zero323 commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700488786


   > cc @zero323 would you mind taking a look when you find some time?
   
   Looks good. I am going to add this to R as well, unless someone is already working on that.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29899:
URL: https://github.com/apache/spark/pull/29899#discussion_r496347604



##########
File path: python/pyspark/sql/functions.pyi
##########
@@ -85,6 +85,9 @@ def lag(
 def lead(
     col: ColumnOrName, offset: int = ..., default: Optional[Any] = ...

Review comment:
       I'll look into this primitive types vs `Column` issue separately.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700426470








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700401706








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700419123








[GitHub] [spark] SparkQA commented on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700393912


   **[Test build #129204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129204/testReport)** for PR 29899 at commit [`179bdea`](https://github.com/apache/spark/commit/179bdea5475d55c41103170d94811c43c3dc8f48).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29899: [SPARK-33020][PYTHON] Add nth_value as a PySpark function

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29899:
URL: https://github.com/apache/spark/pull/29899#issuecomment-700406036





