Posted to user@spark.apache.org by Mike Trienis <mi...@orcsol.com> on 2015/08/21 00:32:12 UTC
Spark SQL window functions (RowsBetween)
Hi All,
I would like some clarification regarding window functions for Apache Spark
1.4.0
-
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
In particular, the "rowsBetween"
    val w = Window.partitionBy("name").orderBy("id")
    df.select(
      sum("price").over(w.rangeBetween(Long.MinValue, 2)),
      avg("price").over(w.rowsBetween(0, 4))
    )
Are any of the window functions available without a HiveContext? If the
answer is no, is there any other way to accomplish this without using
Hive?
I need to compare the i-th row of col2 with the (i-1)-th row (sorted by
col1). If item_i in the i-th row and item_(i-1) in the (i-1)-th row are
different, then I need to increment the count of item_(i-1) by 1.
col1 | col2
-----|-------
  1  | item_1
  2  | item_1
  3  | item_2
  4  | item_1
  5  | item_2
  6  | item_1
In the above example, if we scan two rows at a time downwards, we see that
rows 2 and 3 differ, so we add one to item_1's count. Next, rows 3 and 4
differ, so we add one to item_2's count. Continuing until the end, we end
up with:
col2   | col3
-------|-----
item_1 | 2
item_2 | 2
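For concreteness, here is a minimal plain-Python sketch of the pairwise
comparison I'm describing (count_transitions is just an illustrative helper,
not Spark code); in Spark terms it would correspond to comparing each row of
col2 with the previous row over a window ordered by col1:

```python
from collections import Counter

def count_transitions(rows):
    """Sort (col1, col2) pairs by col1, compare each row's item with the
    next row's item, and increment the earlier item's count on a mismatch."""
    ordered = [item for _, item in sorted(rows)]  # order by col1
    counts = Counter()
    for prev, curr in zip(ordered, ordered[1:]):
        if prev != curr:
            counts[prev] += 1
    return dict(counts)

rows = [(1, "item_1"), (2, "item_1"), (3, "item_2"),
        (4, "item_1"), (5, "item_2"), (6, "item_1")]
print(count_transitions(rows))  # {'item_1': 2, 'item_2': 2}
```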
Thanks, Mike.