You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Lorenzo Nicora <lo...@gmail.com> on 2020/07/22 07:27:45 UTC

Changing watermark in the middle of a flow

Hi

I have a linear streaming flow with a single source and multiple sinks to
publish intermediate results.
The time characteristic is Event Time and I am adding
one AssignerWithPeriodicWatermarks immediately after the source.
I need to add a different assigner, in the middle of the flow, to change
the event time - i.e. extracting a different field from the record as event
time.

I am not sure I completely understand the implications of changing
event time and watermark in the middle of a flow.

Can anybody give me a hint or direct me to any relevant documentation?

Lorenzo

Re: Changing watermark in the middle of a flow

Posted by Kostas Kloudas <kk...@gmail.com>.
Hi Lorenzo,

If you want to learn how Flink uses watermarks, it is worth checking [1].

Now in a nutshell, what a watermark will do in a pipeline is that it
may fire timers that you may have registered, or windows that you may
have accumulated.
If you have no time-sensitive operations between the first and the
second watermark generators, then I do not think you have to worry
(although it would help if you could share a bit more about your
pipeline in order to have a more educated estimation). If you have
windows, then your windows will fire and the emitted elements will
have the timestamp of the end of the window.

After the second watermark assigner, the watermarks coming from the
previous one are discarded and they are not flowing in the pipeline
anymore. You will only have the new watermarks.

A problem may arise if, for example, the second watermark generator
emits watermarks with smaller values than the first but the timestamps
of the elements are assigned based on the progress of the first
generator (e.g. windows fired) and now all your elements are
considered "late".

I hope that the above paint the big picture of what is happening in
your pipeline.

Again, I may be missing something so feel free to send more details
about your pipeline so that we can help a bit more.

Cheers,
Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/timely-stream-processing.html

On Wed, Jul 22, 2020 at 9:35 AM Lorenzo Nicora <lo...@gmail.com> wrote:
>
> Hi
>
> I have a linear streaming flow with a single source and multiple sinks to publish intermediate results.
> The time characteristic is Event Time and I am adding one AssignerWithPeriodicWatermarks immediately after the source.
> I need to add a different assigner, in the middle of the flow, to change the event time - i.e. extracting a different field from the record as event time.
>
> I am not sure I completely understand the implications of changing event time and watermark in the middle of a flow.
>
> Can anybody give me a hint or direct me to any relevant documentation?
>
> Lorenzo