Posted to user@spark.apache.org by Tobias Pfeiffer <tg...@preferred.jp> on 2014/12/08 09:56:46 UTC

Count-based windows

Hi,

I am interested in building an application that uses sliding windows not
based on the time when the item was received, but on either
* a timestamp embedded in the data, or
* a count (like: every 10 items, look at the last 100 items).

Also, I want to do this on stream data received from Kafka as well as on
HDFS data (where there is clearly no notion of a "received at" time). I found
<http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-windowing-Driven-by-absolutely-time-td1733.html#a1843>
which explains how to use the timestamp, but does anyone have a
suggestion on how to use an item count as the window size constraint?
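
To make the count-based case concrete, here is a minimal sketch in plain
Python of the semantics I have in mind. It does not use any Spark API; the
name count_windows and its size/slide parameters are made up for
illustration, and I am assuming that windows emitted before `size` items
have been seen may simply be shorter.

```python
from collections import deque


def count_windows(items, size=100, slide=10):
    """Every `slide` items, yield the most recent `size` items.

    Hypothetical helper for illustration only, not part of Spark.
    """
    # A bounded deque silently drops the oldest item once it is full.
    window = deque(maxlen=size)
    for n, item in enumerate(items, start=1):
        window.append(item)
        # Emit a snapshot each time another `slide` items have arrived.
        if n % slide == 0:
            yield list(window)


if __name__ == "__main__":
    # Small numbers so the behaviour is easy to follow by hand.
    for w in count_windows(range(25), size=10, slide=5):
        print(len(w), w[0], w[-1])
```

This only works on a single ordered sequence, so it says nothing about how
the items would be ordered or partitioned in a distributed setting, which
is the part I am unsure about.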

Thanks
Tobias