You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Jacek Laskowski <ja...@japila.pl> on 2019/10/08 09:02:45 UTC

[SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

Hi,

I haven't spent much time on it, but the following DEBUG message
from WatermarkTracker sparked my interest :)

I ran a streaming aggregation in Append mode and got the messages:

19/10/08 10:48:56 DEBUG WatermarkTracker: Observed event time stats 0:
EventTimeStats(15000,1000,8000.0,2)
19/10/08 10:48:56 INFO WatermarkTracker: Updating event-time watermark from
0 to 5000 ms
19/10/08 10:48:56 DEBUG WatermarkTracker: Event time watermark didn't move:
5000 < 5000

I think the DEBUG message "Event time watermark didn't move" seems
incorrect given that the query has just started and "Observed event time
stats". It's true that the event-time watermark didn't move if it was 5000
before, but it was not as it has just started from scratch (no checkpointed
state).

Can anyone shed some light on this? I'll be digging deeper in a bit, but am
hoping to get some more info before. Thanks!

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming
https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Follow me at https://twitter.com/jaceklaskowski

Re: [SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi HeartSaVioR,

> It might be due to empty batch

Yeah...that's my understanding too. It's for a streaming aggregation in
Append output mode so that's possible. I'll have a closer look at it.

Thanks much for keeping up with this and the other questions. Much
appreciated!

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming
https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Follow me at https://twitter.com/jaceklaskowski



On Mon, Oct 14, 2019 at 12:42 AM Jungtaek Lim <ka...@gmail.com>
wrote:

> It might be due to empty batch (activated when there're stateful
> operator(s) and the previous batch advances watermark), which has no input
> so no moving watermark.
>
> Did you only turn on DEBUG for WatermarkTracker? If you turn on DEBUG for
> MicroBatchExecution as well, it would log "Completed batch <batchId>" so if
> I'm not missing, it should be logged between updating event-time watermark
> and watermark didn't move. You can attach streaming query listener and get
> more information about batches.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Tue, Oct 8, 2019 at 6:12 PM Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> I haven't spent much time on it, but the following DEBUG message
>> from WatermarkTracker sparked my interest :)
>>
>> I ran a streaming aggregation in Append mode and got the messages:
>>
>> 19/10/08 10:48:56 DEBUG WatermarkTracker: Observed event time stats 0:
>> EventTimeStats(15000,1000,8000.0,2)
>> 19/10/08 10:48:56 INFO WatermarkTracker: Updating event-time watermark
>> from 0 to 5000 ms
>> 19/10/08 10:48:56 DEBUG WatermarkTracker: Event time watermark didn't
>> move: 5000 < 5000
>>
>> I think the DEBUG message "Event time watermark didn't move" seems
>> incorrect given that the query has just started and "Observed event time
>> stats". It's true that the event-time watermark didn't move if it was 5000
>> before, but it was not as it has just started from scratch (no checkpointed
>> state).
>>
>> Can anyone shed some light on this? I'll be digging deeper in a bit, but
>> am hoping to get some more info before. Thanks!
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://about.me/JacekLaskowski
>> The Internals of Spark SQL https://bit.ly/spark-sql-internals
>> The Internals of Spark Structured Streaming
>> https://bit.ly/spark-structured-streaming
>> The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>

Re: [SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

Posted by Jungtaek Lim <ka...@gmail.com>.
It might be due to empty batch (activated when there're stateful
operator(s) and the previous batch advances watermark), which has no input
so no moving watermark.

Did you only turn on DEBUG for WatermarkTracker? If you turn on DEBUG for
MicroBatchExecution as well, it would log "Completed batch <batchId>" so if
I'm not missing, it should be logged between updating event-time watermark
and watermark didn't move. You can attach streaming query listener and get
more information about batches.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Oct 8, 2019 at 6:12 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I haven't spent much time on it, but the following DEBUG message
> from WatermarkTracker sparked my interest :)
>
> I ran a streaming aggregation in Append mode and got the messages:
>
> 19/10/08 10:48:56 DEBUG WatermarkTracker: Observed event time stats 0:
> EventTimeStats(15000,1000,8000.0,2)
> 19/10/08 10:48:56 INFO WatermarkTracker: Updating event-time watermark
> from 0 to 5000 ms
> 19/10/08 10:48:56 DEBUG WatermarkTracker: Event time watermark didn't
> move: 5000 < 5000
>
> I think the DEBUG message "Event time watermark didn't move" seems
> incorrect given that the query has just started and "Observed event time
> stats". It's true that the event-time watermark didn't move if it was 5000
> before, but it was not as it has just started from scratch (no checkpointed
> state).
>
> Can anyone shed some light on this? I'll be digging deeper in a bit, but
> am hoping to get some more info before. Thanks!
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> The Internals of Spark SQL https://bit.ly/spark-sql-internals
> The Internals of Spark Structured Streaming
> https://bit.ly/spark-structured-streaming
> The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
> Follow me at https://twitter.com/jaceklaskowski
>
>