You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Kostas Kloudas (JIRA)" <ji...@apache.org> on 2018/03/16 09:14:00 UTC

[jira] [Comment Edited] (FLINK-5601) Window operator does not checkpoint watermarks

    [ https://issues.apache.org/jira/browse/FLINK-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401627#comment-16401627 ] 

Kostas Kloudas edited comment on FLINK-5601 at 3/16/18 9:13 AM:
----------------------------------------------------------------

This is an important and known issue.

It was also encountered during the implementation of the triggerDSL (which was not finally merged).

Back then we did not have union state. I think that now, the {{TimestampsAndPeriodicWatermarksOperator}} (and not the {{windowOperator}} as users may play with the watermark in many places in their pipeline) or the source (whoever has the watermark emitter) should checkpoint its watermark in a union state, and upon restoring/rescaling, its task should take its previously checkpointed watermark, and in case of scaling up, the new tasks should take the minimum of the previously checkpointed watermarks.

 

What do you think?


was (Author: kkl0u):
This is an important and known issue.

It was also encountered during the implementation of the triggerDSL (which was not finally merged).

Back then we did not have union state. I think that now, the TimestampsAndPeriodicWatermarksOperator or the source (whoever has the watermark emitter) should checkpoint its watermark in a union state, and upon restoring/rescaling, its task should take its previously checkpointed watermark, and in case of scaling up, the new tasks should take the minimum of the previously checkpointed watermarks.

 

What do you think?

> Window operator does not checkpoint watermarks
> ----------------------------------------------
>
>                 Key: FLINK-5601
>                 URL: https://issues.apache.org/jira/browse/FLINK-5601
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>            Reporter: Ufuk Celebi
>            Priority: Major
>
> During release testing [~stefanrichter83@gmail.com] and I noticed that watermarks are not checkpointed in the window operator.
> This can lead to non determinism when restoring checkpoints. I was running an adjusted {{SessionWindowITCase}} via Kafka for testing migration and rescaling and ran into failures, because the data generator required determinisitic behaviour.
> What happened was that on restore it could happen that late elements were not dropped, because the watermarks needed to be re-established after restore first.
> [~aljoscha] Do you know whether there is a special reason for explicitly not checkpointing watermarks?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)