You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user-zh@flink.apache.org by 邢瑞斌 <xi...@gmail.com> on 2019/12/25 12:26:45 UTC

Rewind offset to a previous position and ensure certainty.

Hi,

I'm trying to use Kafka as an event store and I want to create several
partitions to improve read/write throughput. Occasionally I need to rewind
offset to a previous position for recomputing. Since order isn't guaranteed
among partitions in Kafka, does this mean that Flink won't produce the same
results as before when rewind even if it uses event time? For example,
consumer for a partition progresses extremely fast and raises watermark, so
events from other partitions are discarded. Is there any ways to prevent
this from happening?

Thanks in advance!

Ruibin

Re: Rewind offset to a previous position and ensure certainty.

Posted by Zhijiang <wa...@aliyun.com.INVALID>.
If I understood correctly, different partitions of Kafka would be emitted by different source tasks with different watermark progress.  And the Flink framework would align the different watermarks to only output the smallest watermark among them, so the events from slow partitions would not be discarded because the downstream operator would only see the watermark based on the slow partition atm. You can refer to [1] for some details.

As for rewinding the offset of partition position, I guess it only happens in failure recovery case or you manually restart the job. Anyway all the topology tasks would be restarted and previous received watermarks are cleared.
So it would also not discard the events in this case.  Unless you can only rewind some source task to previous positions and keep other downstream tasks still running, it might have the issues you concern. But Flink can not support such operation/function atm. :) 

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/event_timestamps_watermarks.html

Best,
Zhijiang
------------------------------------------------------------------
From:邢瑞斌 <xi...@gmail.com>
Send Time:2019 Dec. 25 (Wed.) 20:27
To:user-zh <us...@flink.apache.org>; user <us...@flink.apache.org>
Subject:Rewind offset to a previous position and ensure certainty.

Hi,

I'm trying to use Kafka as an event store and I want to create several partitions to improve read/write throughput. Occasionally I need to rewind offset to a previous position for recomputing. Since order isn't guaranteed among partitions in Kafka, does this mean that Flink won't produce the same results as before when rewind even if it uses event time? For example, consumer for a partition progresses extremely fast and raises watermark, so events from other partitions are discarded. Is there any ways to prevent this from happening?

Thanks in advance!

Ruibin


Re: Rewind offset to a previous position and ensure certainty.

Posted by vino yang <ya...@gmail.com>.
Hi Ruibin,

Are you finding how to generate watermark pre Kafka partition?
Flink provides Kafka-partition-aware watermark generation. [1]

Best,
Vino

[1]:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_timestamps_watermarks.html#timestamps-per-kafka-partition

邢瑞斌 <xi...@gmail.com> 于2019年12月25日周三 下午8:27写道:

> Hi,
>
> I'm trying to use Kafka as an event store and I want to create several
> partitions to improve read/write throughput. Occasionally I need to rewind
> offset to a previous position for recomputing. Since order isn't guaranteed
> among partitions in Kafka, does this mean that Flink won't produce the same
> results as before when rewind even if it uses event time? For example,
> consumer for a partition progresses extremely fast and raises watermark, so
> events from other partitions are discarded. Is there any ways to prevent
> this from happening?
>
> Thanks in advance!
>
> Ruibin
>

Re: Rewind offset to a previous position and ensure certainty.

Posted by vino yang <ya...@gmail.com>.
Hi Ruibin,

Are you finding how to generate watermark pre Kafka partition?
Flink provides Kafka-partition-aware watermark generation. [1]

Best,
Vino

[1]:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_timestamps_watermarks.html#timestamps-per-kafka-partition

邢瑞斌 <xi...@gmail.com> 于2019年12月25日周三 下午8:27写道:

> Hi,
>
> I'm trying to use Kafka as an event store and I want to create several
> partitions to improve read/write throughput. Occasionally I need to rewind
> offset to a previous position for recomputing. Since order isn't guaranteed
> among partitions in Kafka, does this mean that Flink won't produce the same
> results as before when rewind even if it uses event time? For example,
> consumer for a partition progresses extremely fast and raises watermark, so
> events from other partitions are discarded. Is there any ways to prevent
> this from happening?
>
> Thanks in advance!
>
> Ruibin
>

Re: Rewind offset to a previous position and ensure certainty.

Posted by Zhijiang <wa...@aliyun.com>.
If I understood correctly, different partitions of Kafka would be emitted by different source tasks with different watermark progress.  And the Flink framework would align the different watermarks to only output the smallest watermark among them, so the events from slow partitions would not be discarded because the downstream operator would only see the watermark based on the slow partition atm. You can refer to [1] for some details.

As for rewinding the offset of partition position, I guess it only happens in failure recovery case or you manually restart the job. Anyway all the topology tasks would be restarted and previous received watermarks are cleared.
So it would also not discard the events in this case.  Unless you can only rewind some source task to previous positions and keep other downstream tasks still running, it might have the issues you concern. But Flink can not support such operation/function atm. :) 

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/event_timestamps_watermarks.html

Best,
Zhijiang
------------------------------------------------------------------
From:邢瑞斌 <xi...@gmail.com>
Send Time:2019 Dec. 25 (Wed.) 20:27
To:user-zh <us...@flink.apache.org>; user <us...@flink.apache.org>
Subject:Rewind offset to a previous position and ensure certainty.

Hi,

I'm trying to use Kafka as an event store and I want to create several partitions to improve read/write throughput. Occasionally I need to rewind offset to a previous position for recomputing. Since order isn't guaranteed among partitions in Kafka, does this mean that Flink won't produce the same results as before when rewind even if it uses event time? For example, consumer for a partition progresses extremely fast and raises watermark, so events from other partitions are discarded. Is there any ways to prevent this from happening?

Thanks in advance!

Ruibin