Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/06 15:03:29 UTC

[GitHub] [spark] srowen commented on a change in pull request #23946: [SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation

srowen commented on a change in pull request #23946: [SPARK-26860][PySpark] [SparkR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation
URL: https://github.com/apache/spark/pull/23946#discussion_r262980228
 
 

 ##########
 File path: python/pyspark/sql/window.py
 ##########
 @@ -97,6 +97,32 @@ def rowsBetween(start, end):
         and ``Window.currentRow`` to specify special boundary values, rather than using integral
         values directly.
 
+        A row based boundary is based on the position of the row within the partition.
+        An offset indicates the number of rows above or below the current row at which the frame
+        for the current row starts or ends. For instance, given a row based sliding frame with a
+        lower bound offset of -1 and an upper bound offset of +2, the frame for the row with
+        index 5 would range from index 4 to index 7.
+
+        >>> from pyspark.sql import Window
 
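The row based frame arithmetic in the docstring above can be checked with a minimal sketch in plain Python. This is not PySpark code: the helper `row_frame` and its `partition_size` parameter are hypothetical names introduced only for illustration, and the sketch assumes zero-based row indices clipped to the partition.

```python
def row_frame(index, lower, upper, partition_size):
    """Return (first, last) row indices of a row based frame.

    `lower` and `upper` are offsets relative to the current row, in the
    spirit of rowsBetween(start, end); the result is clipped so that it
    never leaves the partition.
    """
    first = max(0, index + lower)
    last = min(partition_size - 1, index + upper)
    return first, last


# Lower bound offset -1 and upper bound offset +2, current row at index 5.
print(row_frame(5, -1, 2, partition_size=10))  # (4, 7)
```

Near the edges of a partition the frame simply shrinks; for example, under the same assumptions the frame for index 0 covers indices 0 through 2.
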
 Review comment:
   These tests seem to fail:
   ```
   **********************************************************************
   File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/sql/window.py", line 173, in __main__.Window.rangeBetween
   Failed example:
       df.withColumn("sum", func.sum("id").over(window)).show()
   Expected nothing
   Got:
       +---+--------+---+
       | id|category|sum|
       +---+--------+---+
       |  1|       b|  3|
       |  2|       b|  5|
       |  3|       b|  3|
       |  1|       a|  4|
       |  1|       a|  4|
       |  2|       a|  2|
       +---+--------+---+
       <BLANKLINE>
   **********************************************************************
   File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/sql/window.py", line 114, in __main__.Window.rowsBetween
   Failed example:
       df.withColumn("sum", func.sum("id").over(window)).show()
   Expected nothing
   Got:
       +---+--------+---+
       | id|category|sum|
       +---+--------+---+
       |  1|       b|  3|
       |  2|       b|  5|
       |  3|       b|  3|
       |  1|       a|  2|
       |  1|       a|  3|
       |  2|       a|  2|
       +---+--------+---+
       <BLANKLINE>
   **********************************************************************
      1 of   9 in __main__.Window.rangeBetween
      1 of   9 in __main__.Window.rowsBetween
   ***Test Failed*** 2 failures.
   ```
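
The "Expected nothing" lines suggest that doctest found no expected output listed under the `.show()` calls, so whatever they print counts as a failure. A minimal sketch of that behavior in plain Python, without Spark, is below. The functions `show_without_expected`, `show_with_expected`, and `count_failures` are hypothetical stand-ins, not code from the PR, and this is only one reading of the log.

```python
import doctest


def show_without_expected():
    """
    >>> print("| id|category|sum|")
    """


def show_with_expected():
    """
    >>> print("| id|category|sum|")
    | id|category|sum|
    """


def count_failures(func):
    """Run the doctest examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the textual failure report; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted


print(count_failures(show_without_expected))  # (1, 1)
print(count_failures(show_with_expected))     # (0, 1)
```

If that reading is right, listing the printed table under each `.show()` call in the docstring would be one way to make the examples pass.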
