Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/11/02 18:37:59 UTC

[jira] [Resolved] (SPARK-17895) Improve documentation of "rowsBetween" and "rangeBetween"

     [ https://issues.apache.org/jira/browse/SPARK-17895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-17895.
---------------------------------
       Resolution: Fixed
         Assignee: Weiluo Ren
    Fix Version/s: 2.1.0

> Improve documentation of "rowsBetween" and "rangeBetween"
> ---------------------------------------------------------
>
>                 Key: SPARK-17895
>                 URL: https://issues.apache.org/jira/browse/SPARK-17895
>             Project: Spark
>          Issue Type: Documentation
>          Components: PySpark, SparkR, SQL
>            Reporter: Weiluo Ren
>            Assignee: Weiluo Ren
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> This is an issue found by [~junyangq] when he was fixing SparkR docs.
> WindowSpec has two methods, "rangeBetween" and "rowsBetween" (see [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala#L82]). However, the description of "rangeBetween" does not clearly differentiate it from "rowsBetween". [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L109] has clear descriptions of "RangeFrame" and "RowFrame", which are used by "rangeBetween" and "rowsBetween", but I cannot find those descriptions in the online Spark Scala API docs.
> We could add small examples to the descriptions of "rangeBetween" and "rowsBetween", such as:
> {code}
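> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions.sum
> // Assumes a SparkSession named `spark` (as in spark-shell) for toDF and the 'id syntax.
> import spark.implicits._
>
> val df = Seq(1,1,2).toDF("id")
> // Range frame: the bounds are offsets on the value of "id", so tied rows share one frame.
> df.withColumn("sum", sum('id) over Window.orderBy('id).rangeBetween(0,1)).show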
> /**
>  * It shows
>  * +---+---+
>  * | id|sum|
>  * +---+---+
>  * |  1|  4|
>  * |  1|  4|
>  * |  2|  2|
>  * +---+---+
>  */
> // Row frame: the bounds are offsets on the row position, here the current row and the next one.
> df.withColumn("sum", sum('id) over Window.orderBy('id).rowsBetween(0,1)).show
> /**
>  * It shows
>  * +---+---+
>  * | id|sum|
>  * +---+---+
>  * |  1|  2|
>  * |  1|  3|
>  * |  2|  2|
>  * +---+---+
>  */
> {code}
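
The difference between the two outputs quoted above can also be modeled without Spark. The following is only a minimal sketch in plain Scala, assuming a single partition, integer ordering values, and input that is already sorted by the ordering column. The names FrameSketch, rowsBetweenSum and rangeBetweenSum are hypothetical and are not part of any Spark API.

```scala
// Plain-Scala model of the two frame types; illustrative only, no Spark dependency.
object FrameSketch {

  // Row frame: start/end are offsets on the row position.
  // The frame for row i is rows (i + start) .. (i + end), clipped to the data.
  def rowsBetweenSum(sorted: Seq[Int], start: Int, end: Int): Seq[Int] =
    sorted.indices.map { i =>
      val lo = math.max(i + start, 0)
      val hi = math.min(i + end, sorted.length - 1)
      (lo to hi).map(j => sorted(j)).sum
    }

  // Range frame: start/end are offsets on the ordering value.
  // The frame for a row with value v is every row whose value lies in
  // [v + start, v + end], so rows with equal values get the same frame.
  def rangeBetweenSum(sorted: Seq[Int], start: Int, end: Int): Seq[Int] =
    sorted.map { v =>
      sorted.filter(x => x >= v + start && x <= v + end).sum
    }

  def main(args: Array[String]): Unit = {
    val ids = Seq(1, 1, 2) // same data as the example in the issue
    println(rangeBetweenSum(ids, 0, 1).mkString(", ")) // prints "4, 4, 2"
    println(rowsBetweenSum(ids, 0, 1).mkString(", "))  // prints "2, 3, 2"
  }
}
```

Both rows with id 1 get the same range frame (values 1 through 2), which is why they share the sum 4, whereas their row frames differ because each starts at a different position.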


