Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/01/25 22:02:53 UTC

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/20400

    [SPARK-23084][PYTHON]Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark

    
    
    ## What changes were proposed in this pull request?
    
    Added unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark, and updated the rangeBetween API.
    
    ## How was this patch tested?
    
    Ran unit tests locally. Please let me know if I need to add a unit test in tests.py.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark_23084

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20400.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20400
    
----
commit 0690596029d1c061c16564f12416311e7624e5b2
Author: Huaxin Gao <hu...@...>
Date:   2018-01-25T21:50:21Z

    [SPARK-23084][PYTHON]Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86676/
    Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86919/testReport)** for PR 20400 at commit [`25fee39`](https://github.com/apache/spark/commit/25fee3901cfba3599330da394e437c91a9783368).


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86897/testReport)** for PR 20400 at commit [`45545d6`](https://github.com/apache/spark/commit/45545d65ce9ecc077bf842602ecc465ceeeda061).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #87061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87061/testReport)** for PR 20400 at commit [`f82c7d1`](https://github.com/apache/spark/commit/f82c7d11d12611eed5ac5bd9f4b8a4e6516fdf84).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165284238
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -120,20 +122,46 @@ def rangeBetween(start, end):
             and "5" means the five off after the current row.
     
             We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
    -        and ``Window.currentRow`` to specify special boundary values, rather than using integral
    -        values directly.
    +        ``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
    +        ``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
    +        to specify special boundary values, rather than using integral values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +
    +        >>> from pyspark.sql import functions as F, SparkSession, Window
    +        >>> spark = SparkSession.builder.getOrCreate()
    +        >>> df = spark.createDataFrame(
    +        ...     [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
    +        >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(
    +        ...     F.currentRow(), F.lit(1))
    +        >>> df.withColumn("sum", F.sum("id").over(window)).show()
    +        +---+--------+---+
    +        | id|category|sum|
    +        +---+--------+---+
    +        |  1|       b|  3|
    +        |  2|       b|  5|
    +        |  3|       b|  3|
    +        |  1|       a|  4|
    +        |  1|       a|  4|
    +        |  2|       a|  2|
    +        +---+--------+---+
    +        <BLANKLINE>
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    --- End diff --
    
    Is it possible that we mix int and Column in the parameters?
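
On the question of mixing an int and a Column: one way to tolerate a mix is to normalise each bound on its own instead of requiring both to be integers. The sketch below is plain Python under stated assumptions, not the code in this PR. `FakeColumn` and `normalize_bounds` are hypothetical names, the constants are modelled on the thresholds quoted in the rangeBetween docstring, and the Python 2 `long` case is left out. Whether the JVM side accepts a mixed pair is not settled in this thread.

```python
import sys

# Constants modelled on the thresholds quoted in the rangeBetween docstring.
JAVA_MIN_LONG = -(1 << 63)
JAVA_MAX_LONG = (1 << 63) - 1
PRECEDING_THRESHOLD = max(-sys.maxsize, JAVA_MIN_LONG)
FOLLOWING_THRESHOLD = min(sys.maxsize, JAVA_MAX_LONG)


class FakeColumn(object):
    """Hypothetical stand-in for a Column-like boundary expression."""

    def __init__(self, name):
        self.name = name


def normalize_bounds(start, end):
    """Clamp each integer bound independently; pass other objects through."""
    if isinstance(start, int) and start <= PRECEDING_THRESHOLD:
        start = JAVA_MIN_LONG
    if isinstance(end, int) and end >= FOLLOWING_THRESHOLD:
        end = JAVA_MAX_LONG
    return start, end


# An integer start combined with a Column-like end.
mixed = normalize_bounds(-sys.maxsize, FakeColumn("one"))
```

With this shape, an out-of-range integer start is still clamped even when the end is not an integer, and the non-integer bound is returned unchanged.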


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by jiangxb1987 <gi...@git.apache.org>.

Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164950347
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +126,20 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    --- End diff --
    
    remove this line?


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87061/
    Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164966938
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +126,20 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    --- End diff --
    
    @jiangxb1987 Sorry for the extra line. Will remove. 


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86662/testReport)** for PR 20400 at commit [`0690596`](https://github.com/apache/spark/commit/0690596029d1c061c16564f12416311e7624e5b2).


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/260/
    Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/450/
    Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164965129
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,45 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +    >>> df = spark.createDataFrame([(5,)])
    +    >>> df.select(unboundedPreceding()).show
    +    <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
    +    """
    +    sc = SparkContext._active_spark_context
    +    return Column(sc._jvm.functions.unboundedPreceding())
    +
    +
    +@since(2.3)
    +def unboundedFollowing():
    +    """
    +    Window function: returns the special frame boundary that represents the last row
    +    in the window partition.
    +    >>> df = spark.createDataFrame([(5,)])
    --- End diff --
    
    Will add a newline. Thanks. 


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86897/
    Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164972094
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -212,16 +218,20 @@ def rangeBetween(self, start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +
    --- End diff --
    
    Can we have a doctest resembling this -
     https://github.com/jiangxb1987/spark/blob/cec519b8cfbf1ed2a3107056ef5281a5be75ec54/sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala#L214-L240
    ?
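
For readers following this request, here is a plain-Python sketch of what a range frame running from the current row to one following computes. It is not Spark code: `range_sum` and its parameters are illustrative names. The input rows are the ones used in the rangeBetween doctest quoted elsewhere in this thread, and the sums agree with that doctest's table.

```python
rows = [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")]


def range_sum(rows, lower, upper):
    """For each row, sum the ids in the same category whose id lies
    within [id + lower, id + upper], both ends inclusive."""
    out = []
    for row_id, category in rows:
        total = sum(
            other_id
            for other_id, other_category in rows
            if other_category == category
            and row_id + lower <= other_id <= row_id + upper
        )
        out.append((row_id, category, total))
    return out


# lower=0 plays the role of the current row, upper=1 the role of lit(1).
result = range_sum(rows, 0, 1)
```

Note that the frame is defined on the ordering value, not on row positions, so the two rows with id 1 in category "a" both see the same frame and get the same sum.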



---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165258763
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,48 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +
    +    >>> df = spark.createDataFrame([(5,)])
    +    >>> df.select(unboundedPreceding()).columns[0]
    +    'UNBOUNDED PRECEDING'
    --- End diff --
    
    I guess I will just remove these tests since we already have a doc test in window.py. 


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164078885
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,36 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    --- End diff --
    
    Wait, why do we use this naming convention here? I thought we decided to stick to this_naming_convention. It seems the Scala side has this one, which seems to have been added in 2.3 -
    
    https://github.com/apache/spark/blob/d5861aba9d80ca15ad3f22793b79822e470d6913/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L786-L788
    
    @cloud-fan and @rxin, was this just a mistake or did I miss something?


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/791/
    Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    @HyukjinKwon Thanks a lot for your help!


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165254925
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -217,11 +242,16 @@ def rangeBetween(self, start, end):
             :param end: boundary end, inclusive.
                         The frame is unbounded if this is ``Window.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +
    --- End diff --
    
    I think this extra line was requested to be removed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by jiangxb1987 <gi...@git.apache.org>.

Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    LGTM


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test FAILed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165839717
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,36 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +    """
    +    sc = SparkContext._active_spark_context
    +    return Column(sc._jvm.functions.unboundedPreceding())
    +
    +
    +@since(2.3)
    +def unboundedFollowing():
    +    """
    +    Window function: returns the special frame boundary that represents the last row
    +    in the window partition.
    +    """
    +    sc = SparkContext._active_spark_context
    +    return Column(sc._jvm.functions.unboundedFollowing())
    +
    +
    +@since(2.3)
    --- End diff --
    
    Shall we just go with 2.4?
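
For context on what the version argument affects: a decorator of this kind typically stamps the version into the function's docstring, so the generated documentation shows when the API first appeared. The sketch below is a simplified stand-in written for illustration, not PySpark's actual `since` implementation. `since_sketch` and `unbounded_preceding_stub` are hypothetical names.

```python
def since_sketch(version):
    """Return a decorator that records the version an API was added in."""

    def decorator(func):
        # Append a Sphinx-style note to whatever docstring already exists.
        doc = (func.__doc__ or "").rstrip()
        func.__doc__ = doc + "\n\n.. versionadded:: %s" % version
        return func

    return decorator


@since_sketch("2.4")
def unbounded_preceding_stub():
    """Hypothetical stub standing in for a new window-boundary function."""
    return "UNBOUNDED PRECEDING"
```

Under this model, moving from 2.3 to 2.4 changes only the documented version; the decorated function behaves the same.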


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164261678
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, int) and isinstance(end, int):
    --- End diff --
    
    Yup, but I think we can still get a `long` value on Python 2:
    
    ```
    >>> long(1)
    1L
    >>> isinstance(long(1), int)
    False
    ```
    
    You can simply use `isinstance(long(1), (int, long))` together with
    
    ```
    if sys.version >= '3':
        long = int
    ```
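
Put together, the suggestion amounts to the self-contained sketch below. `is_integral` is a hypothetical helper name used only for illustration. It also swaps the string comparison on `sys.version` for `sys.version_info[0]`, which is an editorial substitution that avoids comparing version strings lexically; it is not what the comment proposed.

```python
import sys

# Python 3 has no separate `long` type, so alias it to `int` there.
# On Python 2 the builtin `long` is used unchanged.
if sys.version_info[0] >= 3:
    long = int


def is_integral(value):
    """True for int on Python 3, and for int or long on Python 2."""
    return isinstance(value, (int, long))
```

On Python 3 the tuple collapses to `(int, int)`, so the check costs nothing extra, while on Python 2 it also accepts values such as `long(1)`.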
    



---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    cc @jiangxb1987 


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164950417
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,45 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +    >>> df = spark.createDataFrame([(5,)])
    +    >>> df.select(unboundedPreceding()).show
    +    <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
    --- End diff --
    
    It seems this prints out the bound method `show` rather than calling it. Is this intentional?
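
The difference between referencing a method and calling it can be reproduced without Spark. In the sketch below, `Frame` is a hypothetical stand-in for a DataFrame, and its `show` returns a string only so the result is easy to inspect (the real method prints a table instead).

```python
class Frame(object):
    """Hypothetical stand-in for a DataFrame."""

    def show(self):
        return "rendered table"


frame = Frame()

# Without parentheses the expression evaluates to the bound method object,
# which is what a doctest would then print.
not_called = frame.show

# With parentheses the method actually runs.
called = frame.show()
```

A doctest written as `df.select(...).show` therefore records the method's repr, not the rendered output.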


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165254437
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -129,11 +131,34 @@ def rangeBetween(start, end):
             :param end: boundary end, inclusive.
                         The frame is unbounded if this is ``Window.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +
    +        >>> from pyspark.sql import functions as F, SparkSession, Window
    +        >>> spark = SparkSession.builder.getOrCreate()
    +        >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
    +        ... (3, "b")], ["id", "category"])
    +        >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(F.currentRow(),
    +        ... F.lit(1))
    --- End diff --
    
    ditto:
    
    ```python
    >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(
    ...     F.currentRow(), F.lit(1))
    ```
    
    or a line break, or anything else that complies with PEP 8.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by jiangxb1987 <gi...@git.apache.org>.

Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    LGTM, only one nit


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164261850
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, int) and isinstance(end, int):
    +            if start <= Window._PRECEDING_THRESHOLD:
    +                start = Window.unboundedPreceding
    --- End diff --
    
    @jiangxb1987 
    Do you mean changing it to the following?
    ```
            if isinstance(start, int) and isinstance(end, int):
                if start == Window._PRECEDING_THRESHOLD:  
                    # Window._PRECEDING_THRESHOLD == Long.MinValue
                    start = Window.unboundedPreceding
                if end == Window._FOLLOWING_THRESHOLD:
                    # Window._FOLLOWING_THRESHOLD == Long.MaxValue
                    end = Window.unboundedFollowing
    ```
    I ran the Python tests, and tests.py failed at:
    ```
            with patch("sys.maxsize", 2 ** 127 - 1):
                importlib.reload(window)
                self.assertTrue(rows_frame_match())
                self.assertTrue(range_frame_match())
    ```
    So I guess I will keep `if start <= Window._PRECEDING_THRESHOLD` and `if end >= Window._FOLLOWING_THRESHOLD`.
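
A minimal sketch of the clamping under discussion, in plain Python with no Spark dependency. The constant names and the `make_clamp` helper are assumptions made for illustration; the thresholds follow the docstring's `max(-sys.maxsize, -9223372036854775808)` and `min(sys.maxsize, 9223372036854775807)`. It suggests why `<=` and `>=` are kept: a bound such as `-sys.maxsize` would not be equal to the threshold when `sys.maxsize` is patched to something larger than a Java long, so an `==` check would leave it unclamped.

```python
import sys

JAVA_MIN_LONG = -(1 << 63)      # assumed value of the "unbounded preceding" sentinel
JAVA_MAX_LONG = (1 << 63) - 1   # assumed value of the "unbounded following" sentinel


def make_clamp(maxsize=sys.maxsize):
    """Build a clamp function for a given (possibly patched) sys.maxsize."""
    preceding_threshold = max(-maxsize, JAVA_MIN_LONG)
    following_threshold = min(maxsize, JAVA_MAX_LONG)

    def clamp(start, end):
        # Anything at or beyond a threshold is treated as unbounded.
        if start <= preceding_threshold:
            start = JAVA_MIN_LONG
        if end >= following_threshold:
            end = JAVA_MAX_LONG
        return start, end

    return clamp


clamp = make_clamp(maxsize=2 ** 127 - 1)
print(clamp(-(2 ** 127 - 1), 2 ** 127 - 1))
# (-9223372036854775808, 9223372036854775807)
```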


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86795/testReport)** for PR 20400 at commit [`bbf8778`](https://github.com/apache/spark/commit/bbf8778a963a5e0b8de1b5ab1fddf4cafe13c180).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/584/
    Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164965040
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,45 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +    >>> df = spark.createDataFrame([(5,)])
    +    >>> df.select(unboundedPreceding()).show
    +    <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
    --- End diff --
    
    @HyukjinKwon Thanks for your comment.
    Yes, it is intentional. I am trying to print something that contains "UNBOUNDED PRECEDING" when unboundedPreceding() is called, so that I know the method executed correctly. I couldn't figure out a better way to do this. Please let me know if you have one.
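
A small plain-Python illustration of what the doctest line above relies on. `FakeFrame` is a hypothetical stand-in, not a PySpark class: referencing `show` without parentheses only builds a bound-method object whose repr embeds the repr of the frame, and the method body never runs.

```python
class FakeFrame(object):
    """Hypothetical stand-in for a DataFrame; counts calls to show()."""

    def __init__(self, schema):
        self.schema = schema
        self.show_calls = 0

    def __repr__(self):
        return "DataFrame[%s]" % self.schema

    def show(self):
        self.show_calls += 1
        print(self.schema)


frame = FakeFrame("UNBOUNDED PRECEDING: null")
text = repr(frame.show)   # attribute access only; show() is not called
print(text)
print(frame.show_calls)   # still 0
```

The check therefore exercises the column name that ends up in the schema, not the output of `show()` itself.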


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/359/
    Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165254380
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -129,11 +131,34 @@ def rangeBetween(start, end):
             :param end: boundary end, inclusive.
                         The frame is unbounded if this is ``Window.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +
    +        >>> from pyspark.sql import functions as F, SparkSession, Window
    +        >>> spark = SparkSession.builder.getOrCreate()
    +        >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
    +        ... (3, "b")], ["id", "category"])
    --- End diff --
    
    I think we had better format it as:
    
    ```python
    >>> df = spark.createDataFrame(
    ...     [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
    ```


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164950531
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,45 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +    >>> df = spark.createDataFrame([(5,)])
    +    >>> df.select(unboundedPreceding()).show
    +    <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
    +    """
    +    sc = SparkContext._active_spark_context
    +    return Column(sc._jvm.functions.unboundedPreceding())
    +
    +
    +@since(2.3)
    +def unboundedFollowing():
    +    """
    +    Window function: returns the special frame boundary that represents the last row
    +    in the window partition.
    +    >>> df = spark.createDataFrame([(5,)])
    --- End diff --
    
    I don't believe we claim to follow PEP 257 yet, but it would be good to have at least a blank line between the description and the doctest, if you don't mind.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165893442
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -208,20 +236,27 @@ def rangeBetween(self, start, end):
             and "5" means the five off after the current row.
     
             We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
    -        and ``Window.currentRow`` to specify special boundary values, rather than using integral
    -        values directly.
    +        ``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
    +        ``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
    +        to specify special boundary values, rather than using integral values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    +            if start <= Window._PRECEDING_THRESHOLD:
    +                start = Window.unboundedPreceding
    +            if end >= Window._FOLLOWING_THRESHOLD:
    +                end = Window.unboundedFollowing
    +        elif isinstance(start, Column) and isinstance(end, Column):
    +            start = start._jc
    +            end = end._jc
             return WindowSpec(self._jspec.rangeBetween(start, end))
     
     
    --- End diff --
    
    @HyukjinKwon Thanks. Will make changes. 
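
The dispatch in the diff above handles two cases: both bounds are integers, or both are columns. A hedged sketch of that shape in plain Python follows. `FakeColumn` and `normalize_bounds` are hypothetical names, threshold clamping is left out for brevity, and raising `TypeError` on mixed input is an assumption of this sketch rather than something the patch does.

```python
class FakeColumn(object):
    """Hypothetical stand-in for pyspark.sql.Column; only ``_jc`` is modelled."""

    def __init__(self, jc):
        self._jc = jc


def normalize_bounds(start, end):
    """Return a tag plus the values that would be handed to the JVM side."""
    if isinstance(start, int) and isinstance(end, int):
        # Integer bounds are passed through as long values.
        return ("long", start, end)
    if isinstance(start, FakeColumn) and isinstance(end, FakeColumn):
        # Column bounds are unwrapped to their underlying Java objects.
        return ("column", start._jc, end._jc)
    # Assumption of this sketch: refuse a mix of integer and column bounds.
    raise TypeError("start and end must both be integers or both be columns")
```

Making the mixed case explicit avoids silently forwarding one wrapped and one unwrapped bound.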


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/468/
    Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by jiangxb1987 <gi...@git.apache.org>.

Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164207282
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, int) and isinstance(end, int):
    --- End diff --
    
    This differs from the WindowSpec interface, where we support Long values:
    ```
    def rangeBetween(start: Long, end: Long): WindowSpec
    ```


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165229511
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -212,16 +218,20 @@ def rangeBetween(self, start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +
    --- End diff --
    
    @HyukjinKwon Thank you very much for your comments. I will make the changes.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86919/
    Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by jiangxb1987 <gi...@git.apache.org>.

Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164207466
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, int) and isinstance(end, int):
    +            if start <= Window._PRECEDING_THRESHOLD:
    +                start = Window.unboundedPreceding
    --- End diff --
    
    This is not standard behavior; we may consider changing it.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165270774
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -129,11 +131,34 @@ def rangeBetween(start, end):
             :param end: boundary end, inclusive.
                         The frame is unbounded if this is ``Window.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +
    +        >>> from pyspark.sql import functions as F, SparkSession, Window
    +        >>> spark = SparkSession.builder.getOrCreate()
    +        >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
    +        ... (3, "b")], ["id", "category"])
    +        >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(F.currentRow(),
    +        ... F.lit(1))
    +        >>> df.withColumn("sum", F.sum("id").over(window)).show()
    +        +---+--------+---+
    +        | id|category|sum|
    +        +---+--------+---+
    +        |  1|       b|  3|
    +        |  2|       b|  5|
    +        |  3|       b|  3|
    +        |  1|       a|  4|
    +        |  1|       a|  4|
    +        |  2|       a|  2|
    +        +---+--------+---+
    +        <BLANKLINE>
    --- End diff --
    
    It seems to me that this <BLANKLINE> is required.
    I will change everything else except this one.
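
A self-contained sketch of why the marker is needed, assuming (as I understand it) that `show()` prints the table followed by an empty line. `show_table` and `run_examples` are hypothetical helpers; only the standard `doctest` module is used. A truly empty line would end the expected output, so doctest needs `<BLANKLINE>` to match a printed blank line.

```python
import doctest


def show_table():
    """
    Print a small table followed by an empty line.

    >>> show_table()
    +---+
    | id|
    +---+
    |  1|
    +---+
    <BLANKLINE>
    """
    print("+---+\n| id|\n+---+\n|  1|\n+---+\n")


def run_examples(func):
    """Run the doctest examples found in ``func`` and return the results."""
    runner = doctest.DocTestRunner(verbose=False)
    finder = doctest.DocTestFinder()
    # Pass globs explicitly so the example can resolve the function name.
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples(show_table)
print(results.failed, results.attempted)
```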


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/249/
    Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164054210
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, int) and isinstance(end, int):
    --- End diff --
    
    Sure, we need some tests .. 


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165229664
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +126,20 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +                    any value greater than or equal to 9223372036854775807.
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    +            if start <= Window._PRECEDING_THRESHOLD:
    +                start = Window.unboundedPreceding
    +            if end >= Window._FOLLOWING_THRESHOLD:
    +                end = Window.unboundedFollowing
    --- End diff --
    
    Will change. Thanks!


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164052902
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,36 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +    """
    +    sc = SparkContext._active_spark_context
    +    return Column(sc._jvm.functions.unboundedPreceding())
    +
    +
    +@since(2.3)
    +def unboundedFollowing():
    +    """
    +    Window function: returns the special frame boundary that represents the last row
    +    in the window partition.
    +    """
    +    sc = SparkContext._active_spark_context
    +    return Column(sc._jvm.functions.unboundedFollowing())
    +
    +
    +@since(2.3)
    +def currentRow():
    +    """
    +    Window function: returns the special frame boundary that represents the current row
    +    in the window partition.
    +    """
    +    sc = SparkContext._active_spark_context
    +    return Column(sc._jvm.functions.currentRow())
    --- End diff --
    
    Yea, let's add doctests .. 


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165254262
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,48 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +
    +    >>> df = spark.createDataFrame([(5,)])
    +    >>> df.select(unboundedPreceding()).columns[0]
    +    'UNBOUNDED PRECEDING'
    --- End diff --
    
    Shall we just put these in `tests.py` as separate tests? To be honest, I don't think they are very useful as examples in the doc.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164261413
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, int) and isinstance(end, int):
    --- End diff --
    
    @HyukjinKwon @jiangxb1987 
    Thank you very much for your comments. 
    It seems to me that int and long are "unified" in Python 2. I tried the following:
    ```
    Python 2.7.10 (default, Oct 23 2015, 19:19:21) 
    [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> value = 9223372036854775807
    >>> isinstance(value, int)
    True
    ```
    It seems to me that we don't have to handle long for Python 2.
    I guess I will keep `if isinstance(start, int) and isinstance(end, int)`?


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164973111
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -212,16 +218,20 @@ def rangeBetween(self, start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
    --- End diff --
    
    Wait, why do we expose the `org.apache.spark.sql.catalyst` path in the Python docs? Also, unless I've missed something, this package is meant to be internal.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/251/
    Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged to master.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by jiangxb1987 <gi...@git.apache.org>.

Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164248098
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, int) and isinstance(end, int):
    --- End diff --
    
    Oh, thanks @HyukjinKwon, I'm not familiar with Python :)


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86662/testReport)** for PR 20400 at commit [`0690596`](https://github.com/apache/spark/commit/0690596029d1c061c16564f12416311e7624e5b2).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20400


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164971807
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +126,20 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +                    any value greater than or equal to 9223372036854775807.
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    +            if start <= Window._PRECEDING_THRESHOLD:
    +                start = Window.unboundedPreceding
    +            if end >= Window._FOLLOWING_THRESHOLD:
    +                end = Window.unboundedFollowing
    --- End diff --
    
    Shall we add logic like:
    
    ```
    elif isinstance(start, Column) and isinstance(end, Column):
        start = start._jc
        end = end._jc
    ```
    
    too?
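
A minimal sketch of the two-branch normalization suggested above, runnable without Spark. `FakeColumn` is a hypothetical stand-in for `pyspark.sql.Column`, `normalize_bounds` is an illustrative name, and the sketch assumes the unbounded markers are the Java `long` extremes. On Python 3, `int` alone covers what `(int, long)` covers on Python 2:

```python
import sys

# Assumed values: the Java long extremes quoted in the docstring above.
JAVA_MIN_LONG = -(1 << 63)
JAVA_MAX_LONG = (1 << 63) - 1
PRECEDING_THRESHOLD = max(-sys.maxsize, JAVA_MIN_LONG)
FOLLOWING_THRESHOLD = min(sys.maxsize, JAVA_MAX_LONG)


class FakeColumn(object):
    """Hypothetical stand-in for pyspark.sql.Column; only carries a `_jc` handle."""

    def __init__(self, jc):
        self._jc = jc


def normalize_bounds(start, end):
    """Clamp integer bounds to the unbounded markers, or unwrap Column bounds."""
    if isinstance(start, int) and isinstance(end, int):
        if start <= PRECEDING_THRESHOLD:
            start = JAVA_MIN_LONG  # treated as unbounded preceding
        if end >= FOLLOWING_THRESHOLD:
            end = JAVA_MAX_LONG  # treated as unbounded following
    elif isinstance(start, FakeColumn) and isinstance(end, FakeColumn):
        start, end = start._jc, end._jc
    return start, end
```

Ordinary offsets such as `normalize_bounds(-1, 1)` pass through unchanged; only values at or beyond the thresholds are replaced.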


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86665/testReport)** for PR 20400 at commit [`2c66156`](https://github.com/apache/spark/commit/2c661561df80b551559d0685602e37fa62bf5d1e).


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86795/
    Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #87061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87061/testReport)** for PR 20400 at commit [`f82c7d1`](https://github.com/apache/spark/commit/f82c7d11d12611eed5ac5bd9f4b8a4e6516fdf84).


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164083276
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,36 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    --- End diff --
    
    Oh! I am so sorry. I meant https://issues.apache.org/jira/browse/SPARK-10621 and this one seems a different case. Please ignore my comment above.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Yup, we could also accept a string as a column, but I was thinking of matching the
    signature with the Scala one for now, just for consistency.
    
    On 1 Feb 2018 5:24 pm, "Liang-Chi Hsieh" <no...@github.com> wrote:
    
    *@viirya* commented on this pull request.
    ------------------------------
    
    In python/pyspark/sql/window.py
    <https://github.com/apache/spark/pull/20400#discussion_r165284238>:
    
    >          """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    
    Is it possible that we mix int and Column in the parameters?
    ------------------------------
    
    In python/pyspark/sql/window.py
    <https://github.com/apache/spark/pull/20400#discussion_r165284328>:
    
    >                      any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    
    ditto.
    



---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165256200
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -217,11 +242,16 @@ def rangeBetween(self, start, end):
             :param end: boundary end, inclusive.
    --- End diff --
    
    I think we should also update the description above (`We recommend users ...`) in a similar way. Linking would be better, but plain ` ``...`` ` is fine with me.
    



---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86662/
    Test FAILed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86897/testReport)** for PR 20400 at commit [`45545d6`](https://github.com/apache/spark/commit/45545d65ce9ecc077bf842602ecc465ceeeda061).


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87307/
    Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86665/
    Test FAILed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    retest this please


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165284328
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -208,20 +236,27 @@ def rangeBetween(self, start, end):
             and "5" means the five off after the current row.
     
             We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
    -        and ``Window.currentRow`` to specify special boundary values, rather than using integral
    -        values directly.
    +        ``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
    +        ``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
    +        to specify special boundary values, rather than using integral values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    --- End diff --
    
    ditto.
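
On the mixed-argument question: with a both-integers check like the one in the diff above, a call that passes one integer and one `Column` skips the clamping entirely. One option, sketched here as an assumption and not taken from the patch, is to reject mixed bounds up front. `FakeColumn` is a hypothetical stand-in for `pyspark.sql.Column`, and `bounds_kind` is an illustrative name:

```python
class FakeColumn(object):
    """Hypothetical stand-in for pyspark.sql.Column."""


def bounds_kind(start, end):
    """Return "int" or "column" when both bounds share a kind; raise otherwise."""
    if isinstance(start, int) and isinstance(end, int):
        return "int"
    if isinstance(start, FakeColumn) and isinstance(end, FakeColumn):
        return "column"
    raise TypeError(
        "start and end must both be integers or both be Columns, got %s and %s"
        % (type(start).__name__, type(end).__name__)
    )
```

A caller would branch on the returned kind, so a mixed pair fails early with a readable message.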


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86919/testReport)** for PR 20400 at commit [`25fee39`](https://github.com/apache/spark/commit/25fee3901cfba3599330da394e437c91a9783368).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165254861
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -129,11 +131,34 @@ def rangeBetween(start, end):
             :param end: boundary end, inclusive.
                         The frame is unbounded if this is ``Window.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
    +
    +        >>> from pyspark.sql import functions as F, SparkSession, Window
    +        >>> spark = SparkSession.builder.getOrCreate()
    +        >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
    +        ... (3, "b")], ["id", "category"])
    +        >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(F.currentRow(),
    +        ... F.lit(1))
    +        >>> df.withColumn("sum", F.sum("id").over(window)).show()
    +        +---+--------+---+
    +        | id|category|sum|
    +        +---+--------+---+
    +        |  1|       b|  3|
    +        |  2|       b|  5|
    +        |  3|       b|  3|
    +        |  1|       a|  4|
    +        |  1|       a|  4|
    +        |  2|       a|  2|
    +        +---+--------+---+
    +        <BLANKLINE>
    --- End diff --
    
    Hm, do we need `BLANKLINE`? 
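
For readers checking the expected table in the doctest above by hand, here is a plain-Python sketch of what a range frame from the current row to `+1` computes. It does not use Spark; `range_sum` is a hypothetical helper that sums, for each row, the `id` values in the same category that fall within the inclusive range `[id + lower, id + upper]`:

```python
from collections import defaultdict


def range_sum(rows, lower, upper):
    """Sum ids per row over same-category rows with id in [id + lower, id + upper]."""
    by_category = defaultdict(list)
    for id_, category in rows:
        by_category[category].append(id_)
    return [
        (
            id_,
            category,
            sum(v for v in by_category[category] if id_ + lower <= v <= id_ + upper),
        )
        for id_, category in rows
    ]


rows = [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")]
result = range_sum(rows, 0, 1)
```

The sums agree with the doctest table: 4, 4 and 2 for category `a`, and 3, 5 and 3 for category `b`. Row order differs because this sketch keeps the input order.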


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86795/testReport)** for PR 20400 at commit [`bbf8778`](https://github.com/apache/spark/commit/bbf8778a963a5e0b8de1b5ab1fddf4cafe13c180).


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #87307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87307/testReport)** for PR 20400 at commit [`f82c7d1`](https://github.com/apache/spark/commit/f82c7d11d12611eed5ac5bd9f4b8a4e6516fdf84).


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165840412
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -208,20 +236,27 @@ def rangeBetween(self, start, end):
             and "5" means the five off after the current row.
     
             We recommend users use ``Window.unboundedPreceding``, ``Window.unboundedFollowing``,
    -        and ``Window.currentRow`` to specify special boundary values, rather than using integral
    -        values directly.
    +        ``Window.currentRow``, ``pyspark.sql.functions.unboundedPreceding``,
    +        ``pyspark.sql.functions.unboundedFollowing`` and ``pyspark.sql.functions.currentRow``
    +        to specify special boundary values, rather than using integral values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      a column returned by ``pyspark.sql.functions.unboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    a column returned by ``pyspark.sql.functions.unboundedFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, (int, long)) and isinstance(end, (int, long)):
    +            if start <= Window._PRECEDING_THRESHOLD:
    +                start = Window.unboundedPreceding
    +            if end >= Window._FOLLOWING_THRESHOLD:
    +                end = Window.unboundedFollowing
    +        elif isinstance(start, Column) and isinstance(end, Column):
    +            start = start._jc
    +            end = end._jc
             return WindowSpec(self._jspec.rangeBetween(start, end))
     
     
    --- End diff --
    
    Let's use `doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)` and remove `<BLANKLINE>` at https://github.com/apache/spark/pull/20400/files#r165254861
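
A small standard-library sketch of why `NORMALIZE_WHITESPACE` makes the `<BLANKLINE>` marker unnecessary. `show_table` is a hypothetical function that, like `DataFrame.show()`, prints a table followed by an empty line; the same doctest is then run with and without the flag:

```python
import doctest


def show_table():
    # Like DataFrame.show(), print a table followed by one empty line.
    print("+---+\n| id|\n+---+\n|  1|\n+---+\n")


# Expected output written without a trailing <BLANKLINE> marker.
DOC = """
>>> show_table()
+---+
| id|
+---+
|  1|
+---+
"""


def count_failures(optionflags):
    """Run DOC as a doctest with the given flags; return the number of failed examples."""
    test = doctest.DocTestParser().get_doctest(
        DOC, {"show_table": show_table}, "show_table_doc", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the count matters here.
    return runner.run(test, out=lambda text: None).failed
```

With no flags the trailing empty line is a mismatch and the example fails. With `doctest.NORMALIZE_WHITESPACE` all runs of whitespace compare as equal, so the example passes.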


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164245801
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
                         any value greater than or equal to min(sys.maxsize, 9223372036854775807).
             """
    -        if start <= Window._PRECEDING_THRESHOLD:
    -            start = Window.unboundedPreceding
    -        if end >= Window._FOLLOWING_THRESHOLD:
    -            end = Window.unboundedFollowing
    +        if isinstance(start, int) and isinstance(end, int):
    --- End diff --
    
    FYI, Python 3 doesn't have `long`; it was merged into `int`. We should check for `long` here as well, and alias `long` to `int` on Python 3.
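
A minimal sketch of the compatibility pattern described above, assuming a module-level alias is acceptable. `is_integral` is a hypothetical helper name, not part of the patch:

```python
import sys

if sys.version_info[0] >= 3:
    long = int  # Python 3 has a single arbitrary-precision int type


def is_integral(value):
    """True for Python 2 int/long values and for Python 3 int values."""
    return isinstance(value, (int, long))
```

With the alias in place, the same `isinstance(value, (int, long))` check runs on both major versions, including for values beyond the 64-bit range.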


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164968965
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -809,6 +809,45 @@ def ntile(n):
         return Column(sc._jvm.functions.ntile(int(n)))
     
     
    +@since(2.3)
    +def unboundedPreceding():
    +    """
    +    Window function: returns the special frame boundary that represents the first row
    +    in the window partition.
    +    >>> df = spark.createDataFrame([(5,)])
    +    >>> df.select(unboundedPreceding()).show
    +    <bound method DataFrame.show of DataFrame[UNBOUNDED PRECEDING: null]>
    --- End diff --
    
    I think we should have a working example in the doctests here.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r165256128
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -217,11 +242,16 @@ def rangeBetween(self, start, end):
             :param end: boundary end, inclusive.
                         The frame is unbounded if this is ``Window.unboundedFollowing``, or
    --- End diff --
    
    Can we rewrite this to reflect the current change? For example, something like:
    
    ```
    ``Window.unboundedFollowing`` or a column returned by ``pyspark.sql.functions.unboundedFollowing``?
    ```


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    LGTM too, except for the three comments above.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86665/testReport)** for PR 20400 at commit [`2c66156`](https://github.com/apache/spark/commit/2c661561df80b551559d0685602e37fa62bf5d1e).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #87307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87307/testReport)** for PR 20400 at commit [`f82c7d1`](https://github.com/apache/spark/commit/f82c7d11d12611eed5ac5bd9f4b8a4e6516fdf84).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86676/testReport)** for PR 20400 at commit [`b856860`](https://github.com/apache/spark/commit/b85686099c2effe99319d05422c510658becd8fc).


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

Posted by jiangxb1987 <gi...@git.apache.org>.

Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20400#discussion_r164205837
  
    --- Diff: python/pyspark/sql/window.py ---
    @@ -124,16 +124,19 @@ def rangeBetween(start, end):
             values directly.
     
             :param start: boundary start, inclusive.
    -                      The frame is unbounded if this is ``Window.unboundedPreceding``, or
    +                      The frame is unbounded if this is ``Window.unboundedPreceding``,
    +                      ``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
                           any value less than or equal to max(-sys.maxsize, -9223372036854775808).
             :param end: boundary end, inclusive.
    -                    The frame is unbounded if this is ``Window.unboundedFollowing``, or
    +                    The frame is unbounded if this is ``Window.unboundedFollowing``,
    +                    ``org.apache.spark.sql.catalyst.expressions.UnboundedPFollowing``, or
    --- End diff --
    
    nit: `UnboundedPFollowing` -> `UnboundedFollowing`
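
As a side note on the docstring quoted in this diff, the "any value less than or equal to max(-sys.maxsize, -9223372036854775808)" rule can be sketched in plain Python as below. This is only an illustration under assumptions: the function `normalize_bounds` and the constant names are hypothetical, and the symmetric treatment of the end boundary is assumed rather than taken from the quoted text.

```python
import sys

# Limits of a signed 64-bit Java long.
JAVA_MIN_LONG = -(1 << 63)
JAVA_MAX_LONG = (1 << 63) - 1

# Threshold from the quoted docstring; the following-side one is assumed
# to mirror it.
PRECEDING_THRESHOLD = max(-sys.maxsize, JAVA_MIN_LONG)
FOLLOWING_THRESHOLD = min(sys.maxsize, JAVA_MAX_LONG)


def normalize_bounds(start, end):
    """Map very small or very large integer bounds to 'unbounded' sentinels."""
    if start <= PRECEDING_THRESHOLD:
        start = JAVA_MIN_LONG  # treated as unbounded preceding
    if end >= FOLLOWING_THRESHOLD:
        end = JAVA_MAX_LONG  # treated as unbounded following
    return start, end
```

With this sketch, `normalize_bounds(-sys.maxsize, 0)` yields an unbounded start, while ordinary offsets such as `(-1, 1)` pass through unchanged.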


---


[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20400
  
    **[Test build #86676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86676/testReport)** for PR 20400 at commit [`b856860`](https://github.com/apache/spark/commit/b85686099c2effe99319d05422c510658becd8fc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
