You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by John O <so...@samsung.com> on 2018/08/15 01:44:10 UTC

watermark does not progress

I am noticing that watermark does not progress as expected when running locally in IDE. It just stays at Long.MIN

I am using EventTime processing and have tried both these time extractors.

*         assignAscendingTimestamps ...

*         assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor) ...

Also, configured the environment as so
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

If I run the job on a flink cluster, I do see the watermark progress.

Is watermarking not supported in local mode?

Thanks
Jo

Re: watermark does not progress

Posted by Hequn Cheng <ch...@gmail.com>.
Hi Jo,

Thanks for letting us know.

Best, Hequn

On Fri, Aug 17, 2018 at 2:12 AM, John O <so...@samsung.com> wrote:

> Just wanted to post an update on this.
>
>
>
> Problem was my dataset. I was using a kafka topic with multiple partitions
> but only generated data for a single key. This meant that in a
> parallelism>1 environment, some sources will never get any data and
> watermark. After “keyby”, the next processor will have to choose which
> watermark to use from the multiple sources(the lowest value) thus never
> progressing the watermark.
>
>
>
>
>
> Jo
>
>
>
>
>
>
>
> *From:* Hequn Cheng <ch...@gmail.com>
> *Sent:* Wednesday, August 15, 2018 6:38 AM
> *To:* John O <so...@samsung.com>
> *Cc:* Fabian Hueske <fh...@gmail.com>; vino yang <ya...@gmail.com>;
> user <us...@flink.apache.org>
>
> *Subject:* Re: watermark does not progress
>
>
>
> Hi John,
>
>
>
> I guess the source data of local are different from the cluster. And as
> Fabian said, it is probably that some partitions don't carry data.
> As a choice, you can set job parallelism to 1 and check the result.
>
>
>
> Best, Hequn
>
>
>
> On Wed, Aug 15, 2018 at 5:22 PM, John O <so...@samsung.com> wrote:
>
> I did some more testing.
>
> Below is a pseudo version of by setup.
>
> kafkaconsumer->
> assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)->
> process(print1 ctx.timerService().currentWatermark()) ->
> keyBy(_.someProp) ->
> process(print2 ctx.timerService().currentWatermark()) ->
>
> I am manually sending monotonically increasing (eventtime ) records to
> kafka topic.
>
> What I see is in print1 I see expected watermark
>
> But print2 is always Long.MIN
>
> It looks like keyBy wipes out the watermark.
>
>
>
> Now, if I run the exact same code on a flink cluster, print2 outputs
> expected watermark.
>
>
>
> Jo
>
>
>
> *From:* Fabian Hueske <fh...@gmail.com>
> *Sent:* Wednesday, August 15, 2018 2:07 AM
> *To:* vino yang <ya...@gmail.com>
> *Cc:* John O <so...@samsung.com>; user <us...@flink.apache.org>
> *Subject:* Re: watermark does not progress
>
>
>
> Hi John,
>
>
>
> Watermarks cannot make progress if you have stream partitions that do not
> carry any data.
>
> What kind of source are you using?
>
>
>
> Best,
>
> Fabian
>
>
>
> 2018-08-15 4:25 GMT+02:00 vino yang <ya...@gmail.com>:
>
> Hi Johe,
>
>
>
> In local mode, it should also work.
>
> When you debug, you can set a breakpoint in the getCurrentWatermark method
> to see if you can enter the method and if the behavior is what you expect.
>
> What is your source? If you post your code, it might be easier to locate.
>
> In addition, for positioning watermark, you can also refer to this
> email[1].
>
>
>
> [1]: http://apache-flink-user-mailing-list-archive.2336050.
> n4.nabble.com/Debugging-watermarks-td7065.html
>
>
>
> Thanks, vino.
>
>
>
> John O <so...@samsung.com> 于2018年8月15日周三 上午9:44写道:
>
> I am noticing that watermark does not progress as expected when running
> locally in IDE. It just stays at Long.MIN
>
>
>
> I am using EventTime processing and have tried both these time extractors.
>
> ·         assignAscendingTimestamps ...
>
> ·         assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)
> ...
>
>
>
> Also, configured the environment as so
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>
>
>
> If I run the job on a flink cluster, I do see the watermark progress.
>
>
>
> Is watermarking not supported in local mode?
>
>
>
> Thanks
>
> Jo
>
>
>
>
>

RE: watermark does not progress

Posted by John O <so...@samsung.com>.
Just wanted to post an update on this.

Problem was my dataset. I was using a kafka topic with multiple partitions but only generated data for a single key. This meant that in a parallelism>1 environment, some sources will never get any data and watermark. After “keyby”, the next processor will have to choose which watermark to use from the multiple sources(the lowest value) thus never progressing the watermark.


Jo



From: Hequn Cheng <ch...@gmail.com>
Sent: Wednesday, August 15, 2018 6:38 AM
To: John O <so...@samsung.com>
Cc: Fabian Hueske <fh...@gmail.com>; vino yang <ya...@gmail.com>; user <us...@flink.apache.org>
Subject: Re: watermark does not progress

Hi John,

I guess the source data of local are different from the cluster. And as Fabian said, it is probably that some partitions don't carry data.
As a choice, you can set job parallelism to 1 and check the result.

Best, Hequn

On Wed, Aug 15, 2018 at 5:22 PM, John O <so...@samsung.com>> wrote:
I did some more testing.
Below is a pseudo version of by setup.

kafkaconsumer->
assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)->
process(print1 ctx.timerService().currentWatermark()) ->
keyBy(_.someProp) ->
process(print2 ctx.timerService().currentWatermark()) ->

I am manually sending monotonically increasing (eventtime ) records to kafka topic.

What I see is in print1 I see expected watermark
But print2 is always Long.MIN

It looks like keyBy wipes out the watermark.

Now, if I run the exact same code on a flink cluster, print2 outputs expected watermark.

Jo

From: Fabian Hueske <fh...@gmail.com>>
Sent: Wednesday, August 15, 2018 2:07 AM
To: vino yang <ya...@gmail.com>>
Cc: John O <so...@samsung.com>>; user <us...@flink.apache.org>>
Subject: Re: watermark does not progress

Hi John,

Watermarks cannot make progress if you have stream partitions that do not carry any data.
What kind of source are you using?

Best,
Fabian

2018-08-15 4:25 GMT+02:00 vino yang <ya...@gmail.com>>:
Hi Johe,

In local mode, it should also work.
When you debug, you can set a breakpoint in the getCurrentWatermark method to see if you can enter the method and if the behavior is what you expect.
What is your source? If you post your code, it might be easier to locate.
In addition, for positioning watermark, you can also refer to this email[1].

[1]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Debugging-watermarks-td7065.html

Thanks, vino.

John O <so...@samsung.com>> 于2018年8月15日周三 上午9:44写道:
I am noticing that watermark does not progress as expected when running locally in IDE. It just stays at Long.MIN

I am using EventTime processing and have tried both these time extractors.

•         assignAscendingTimestamps ...

•         assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor) ...

Also, configured the environment as so
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

If I run the job on a flink cluster, I do see the watermark progress.

Is watermarking not supported in local mode?

Thanks
Jo



Re: watermark does not progress

Posted by Hequn Cheng <ch...@gmail.com>.
Hi John,

I guess the source data of local are different from the cluster. And as
Fabian said, it is probably that some partitions don't carry data.
As a choice, you can set job parallelism to 1 and check the result.

Best, Hequn

On Wed, Aug 15, 2018 at 5:22 PM, John O <so...@samsung.com> wrote:

> I did some more testing.
>
> Below is a pseudo version of by setup.
>
> kafkaconsumer->
> assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)->
> process(print1 ctx.timerService().currentWatermark()) ->
> keyBy(_.someProp) ->
> process(print2 ctx.timerService().currentWatermark()) ->
>
> I am manually sending monotonically increasing (eventtime ) records to
> kafka topic.
>
> What I see is in print1 I see expected watermark
>
> But print2 is always Long.MIN
>
> It looks like keyBy wipes out the watermark.
>
>
>
> Now, if I run the exact same code on a flink cluster, print2 outputs
> expected watermark.
>
>
>
> Jo
>
>
>
> *From:* Fabian Hueske <fh...@gmail.com>
> *Sent:* Wednesday, August 15, 2018 2:07 AM
> *To:* vino yang <ya...@gmail.com>
> *Cc:* John O <so...@samsung.com>; user <us...@flink.apache.org>
> *Subject:* Re: watermark does not progress
>
>
>
> Hi John,
>
>
>
> Watermarks cannot make progress if you have stream partitions that do not
> carry any data.
>
> What kind of source are you using?
>
>
>
> Best,
>
> Fabian
>
>
>
> 2018-08-15 4:25 GMT+02:00 vino yang <ya...@gmail.com>:
>
> Hi Johe,
>
>
>
> In local mode, it should also work.
>
> When you debug, you can set a breakpoint in the getCurrentWatermark method
> to see if you can enter the method and if the behavior is what you expect.
>
> What is your source? If you post your code, it might be easier to locate.
>
> In addition, for positioning watermark, you can also refer to this
> email[1].
>
>
>
> [1]: http://apache-flink-user-mailing-list-archive.2336050.
> n4.nabble.com/Debugging-watermarks-td7065.html
>
>
>
> Thanks, vino.
>
>
>
> John O <so...@samsung.com> 于2018年8月15日周三 上午9:44写道:
>
> I am noticing that watermark does not progress as expected when running
> locally in IDE. It just stays at Long.MIN
>
>
>
> I am using EventTime processing and have tried both these time extractors.
>
> ·         assignAscendingTimestamps ...
>
> ·         assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)
> ...
>
>
>
> Also, configured the environment as so
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>
>
>
> If I run the job on a flink cluster, I do see the watermark progress.
>
>
>
> Is watermarking not supported in local mode?
>
>
>
> Thanks
>
> Jo
>
>
>

RE: watermark does not progress

Posted by John O <so...@samsung.com>.
I did some more testing.

Below is a pseudo version of by setup.

kafkaconsumer->
assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)->
process(print1 ctx.timerService().currentWatermark()) ->
keyBy(_.someProp) ->
process(print2 ctx.timerService().currentWatermark()) ->

I am manually sending monotonically increasing (eventtime ) records to kafka topic.

What I see is in print1 I see expected watermark
But print2 is always Long.MIN

It looks like keyBy wipes out the watermark.

Now, if I run the exact same code on a flink cluster, print2 outputs expected watermark.

Jo


From: Fabian Hueske <fh...@gmail.com>
Sent: Wednesday, August 15, 2018 2:07 AM
To: vino yang <ya...@gmail.com>
Cc: John O <so...@samsung.com>; user <us...@flink.apache.org>
Subject: Re: watermark does not progress

Hi John,

Watermarks cannot make progress if you have stream partitions that do not carry any data.
What kind of source are you using?

Best,
Fabian

2018-08-15 4:25 GMT+02:00 vino yang <ya...@gmail.com>>:
Hi Johe,

In local mode, it should also work.
When you debug, you can set a breakpoint in the getCurrentWatermark method to see if you can enter the method and if the behavior is what you expect.
What is your source? If you post your code, it might be easier to locate.
In addition, for positioning watermark, you can also refer to this email[1].

[1]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Debugging-watermarks-td7065.html

Thanks, vino.

John O <so...@samsung.com>> 于2018年8月15日周三 上午9:44写道:
I am noticing that watermark does not progress as expected when running locally in IDE. It just stays at Long.MIN

I am using EventTime processing and have tried both these time extractors.

•         assignAscendingTimestamps ...

•         assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor) ...

Also, configured the environment as so
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

If I run the job on a flink cluster, I do see the watermark progress.

Is watermarking not supported in local mode?

Thanks
Jo


Re: watermark does not progress

Posted by Fabian Hueske <fh...@gmail.com>.
Hi John,

Watermarks cannot make progress if you have stream partitions that do not
carry any data.
What kind of source are you using?

Best,
Fabian

2018-08-15 4:25 GMT+02:00 vino yang <ya...@gmail.com>:

> Hi Johe,
>
> In local mode, it should also work.
> When you debug, you can set a breakpoint in the getCurrentWatermark method
> to see if you can enter the method and if the behavior is what you expect.
> What is your source? If you post your code, it might be easier to locate.
> In addition, for positioning watermark, you can also refer to this
> email[1].
>
> [1]: http://apache-flink-user-mailing-list-archive.2336050.
> n4.nabble.com/Debugging-watermarks-td7065.html
>
> Thanks, vino.
>
> John O <so...@samsung.com> 于2018年8月15日周三 上午9:44写道:
>
>> I am noticing that watermark does not progress as expected when running
>> locally in IDE. It just stays at Long.MIN
>>
>>
>>
>> I am using EventTime processing and have tried both these time
>> extractors.
>>
>> ·         assignAscendingTimestamps ...
>>
>> ·         assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)
>> ...
>>
>>
>>
>> Also, configured the environment as so
>> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>>
>>
>>
>> If I run the job on a flink cluster, I do see the watermark progress.
>>
>>
>>
>> Is watermarking not supported in local mode?
>>
>>
>>
>> Thanks
>>
>> Jo
>>
>

Re: watermark does not progress

Posted by vino yang <ya...@gmail.com>.
Hi Johe,

In local mode, it should also work.
When you debug, you can set a breakpoint in the getCurrentWatermark method
to see if you can enter the method and if the behavior is what you expect.
What is your source? If you post your code, it might be easier to locate.
In addition, for positioning watermark, you can also refer to this email[1].

[1]:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Debugging-watermarks-td7065.html

Thanks, vino.

John O <so...@samsung.com> 于2018年8月15日周三 上午9:44写道:

> I am noticing that watermark does not progress as expected when running
> locally in IDE. It just stays at Long.MIN
>
>
>
> I am using EventTime processing and have tried both these time extractors.
>
> ·         assignAscendingTimestamps ...
>
> ·         assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)
> ...
>
>
>
> Also, configured the environment as so
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>
>
>
> If I run the job on a flink cluster, I do see the watermark progress.
>
>
>
> Is watermarking not supported in local mode?
>
>
>
> Thanks
>
> Jo
>