Posted to user@spark.apache.org by Anton Okolnychyi <an...@gmail.com> on 2016/06/27 08:50:02 UTC

Last() Window Function

Hi all!

I am learning Spark SQL and window functions.
The behavior of the last() window function was unexpected to me in one
case (as someone without any previous experience with window functions).

I define my window specification as follows:
Window.partitionBy('transportType, 'route).orderBy('eventTime).

So, I have neither rowsBetween nor rangeBetween boundaries.
In this scenario, I expect to get the latest event (by time) in each group
if I apply the last('eventTime) window function over this window
specification.
However, this does not happen.

Looking at the code, I was able to figure out that if there are no
range/rows boundaries, UnspecifiedFrame is assigned. Later, in
ResolveWindowFrame, Spark assigns a default window frame for the last()
function. The default frame depends on the presence of an order
specification (if there is one, the default frame is RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW). That's why the last() window function
does not work as I expected in my case. There is a very helpful comment in
SpecifiedWindowFrame. I wish I could find it in the documentation.
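The same default-frame rule comes from standard SQL, so it can be
demonstrated without a Spark cluster, e.g. with SQLite's window functions
(SQLite >= 3.25). This is a small sketch with a hypothetical events table;
with the default frame, last_value() just returns the current row's value,
while an explicit frame that extends to UNBOUNDED FOLLOWING gives the true
last event of the partition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (route TEXT, event_time INTEGER);
    INSERT INTO events VALUES ('A', 1), ('A', 2), ('A', 3);
""")

# Default frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND
# CURRENT ROW, so last_value() echoes each row's own event_time.
default_frame = conn.execute("""
    SELECT event_time,
           last_value(event_time) OVER (PARTITION BY route ORDER BY event_time)
    FROM events
""").fetchall()
print(default_frame)  # [(1, 1), (2, 2), (3, 3)]

# An explicit frame covering the whole partition yields the latest event.
full_frame = conn.execute("""
    SELECT event_time,
           last_value(event_time) OVER (
               PARTITION BY route ORDER BY event_time
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM events
""").fetchall()
print(full_frame)  # [(1, 3), (2, 3), (3, 3)]
```

In Spark the equivalent fix would be to add
rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) to the
window specification above.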

That's why I have two questions:
- Did I miss the place in the documentation where this behavior is
described? If not, would it be appropriate for me to try to find where
this can be documented?
- Would it be appropriate/useful to add some window function examples to
spark/examples? There are none so far.

Sincerely,
Anton Okolnychyi