Posted to user@spark.apache.org by Hemant Bhanawat <he...@gmail.com> on 2019/12/24 05:05:31 UTC

Regarding structured streaming windows on older data

For demonstration purposes, I was running Structured Streaming on data with
older timestamps. The data was from 2018, the window was 24 hours, and the
watermark was 0 seconds. A few things I saw and could not explain:
1. The initial batch contained around 60 windows. All but the last one were
processed.
2. The data for a window is not sent to the writer immediately.
3. If I ingest data for 2019 midway through the run, it is not processed. In
fact, Spark didn't output the 2019 data at all.
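
To make the setup concrete, here is a toy model in plain Python (not Spark
code) of how I would guess 24-hour windows and a 0-second watermark interact.
The names window_start and run_batches are made up for illustration, and the
rules it encodes are my assumptions rather than confirmed Spark behaviour: the
watermark is the maximum event time seen minus the delay, it only advances
after a batch finishes, a window is emitted once the watermark reaches its
end, and rows for already-closed windows are dropped.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)   # window length from the setup above
DELAY = timedelta(seconds=0)   # watermark delay from the setup above


def window_start(ts):
    # Assumption: tumbling 24-hour windows aligned to midnight.
    return datetime(ts.year, ts.month, ts.day)


def run_batches(batches):
    """Toy model only. Returns, per batch, the window starts emitted in it."""
    watermark = datetime.min   # no event seen yet
    open_windows = {}          # window start -> row count
    emitted_per_batch = []
    for batch in batches:
        for ts in batch:
            start = window_start(ts)
            if start + WINDOW <= watermark:
                continue       # window already closed: row treated as late
            open_windows[start] = open_windows.get(start, 0) + 1
        # Emit windows whose end is at or before the watermark that was
        # known when this batch started (assumed append-style output).
        closed = sorted(s for s in open_windows if s + WINDOW <= watermark)
        for s in closed:
            del open_windows[s]
        emitted_per_batch.append(closed)
        # Assumption: the watermark advances only after the batch is done.
        if batch:
            watermark = max(watermark, max(batch) - DELAY)
    return emitted_per_batch


batches = [
    [datetime(2018, 1, 1, 10), datetime(2018, 1, 2, 9), datetime(2018, 1, 3, 8)],
    [datetime(2018, 1, 3, 12)],
    [datetime(2019, 1, 1, 0, 30)],   # 2019 data ingested midway
    [datetime(2018, 1, 4)],          # 2018 data arriving after the 2019 row
]
for i, emitted in enumerate(run_batches(batches)):
    print(i, [s.date().isoformat() for s in emitted])
```

In this model the latest window is never emitted because nothing pushes the
watermark past its end, windows only come out one batch after the data that
closes them, and a 2019 row moves the watermark so far ahead that later 2018
rows are dropped. That resembles what I saw, but I do not know whether it is
what Spark actually does.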

Can someone point me to some documentation or an explanation of how
Structured Streaming works with data that has non-current timestamps?

Thanks in advance,
Hemant