Posted to user@spark.apache.org by Mike Trienis <mi...@orcsol.com> on 2015/08/21 00:32:12 UTC

Spark SQL window functions (RowsBetween)

Hi All,

I would like some clarification regarding window functions in Apache Spark
1.4.0:

   - https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

In particular, the "rowsBetween" method:

    val w = Window.partitionBy("name").orderBy("id")
    df.select(
      sum("price").over(w.rangeBetween(Long.MinValue, 2)),
      avg("price").over(w.rowsBetween(0, 4))
    )


Are any of the window functions available without a HiveContext? If the
answer is no, is there any other way to accomplish this without using
Hive?
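
For reference, a minimal sketch of the HiveContext setup I am referring to
(untested; "sc" is the usual SparkContext):

    import org.apache.spark.sql.hive.HiveContext

    // As far as I can tell, window functions in 1.4 are exposed through
    // a HiveContext; this is the dependency I am hoping to avoid:
    val sqlContext = new HiveContext(sc)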

I need to compare the i-th row of col2 with the (i-1)-th row (sorted by
col1). If the item in the i-th row and the item in the (i-1)-th row are
different, then I need to increment the count of the (i-1)-th row's item by 1.


col1 | col2
-----------
1    | item_1
2    | item_1
3    | item_2
4    | item_1
5    | item_2
6    | item_1

In the above example, if we scan downwards two rows at a time, we see that
row 2 and row 3 are different, so we add one to the count for item_1. Next,
row 3 is different from row 4, so we add one to the count for item_2.
Continuing to the end, we arrive at:

col2   | col3
-------------
item_1 | 2
item_2 | 2
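
For reference, here is a rough, untested sketch of what I am hoping to
achieve with the "lead" window function (assuming a HiveContext named
sqlContext is available; "next_item" and "counts" are just illustrative
names):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    // Sample data matching the first table above.
    val df = sqlContext.createDataFrame(Seq(
      (1, "item_1"), (2, "item_1"), (3, "item_2"),
      (4, "item_1"), (5, "item_2"), (6, "item_1")
    )).toDF("col1", "col2")

    // For each row (ordered by col1), look one row ahead and keep the
    // rows whose following item differs, then count per item. The last
    // row has no following row, so its lead is null and it drops out.
    val w = Window.orderBy("col1")
    val counts = df
      .withColumn("next_item", lead("col2", 1).over(w))
      .where(col("next_item").isNotNull && col("col2") !== col("next_item"))
      .groupBy("col2")
      .count()
      .withColumnRenamed("count", "col3")

    counts.show()
    // col2   | col3
    // item_1 | 2
    // item_2 | 2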

Thanks, Mike.