Posted to user-zh@flink.apache.org by Eleanore Jin <el...@gmail.com> on 2020/03/10 16:43:04 UTC

Is incremental checkpoints needed?

Hi All,

I am using Apache Beam to construct the pipeline, and this pipeline is
running with the Flink Runner.

Both the source and sink are Kafka topics, and I have enabled Beam's
exactly-once semantics.

My understanding of how it works in Beam is: messages are cached and not
processed by the KafkaExactlyOnceSink until the checkpoint completes and
all the cached messages are checkpointed; only then does the sink start
processing those messages.

So is there any benefit to enabling incremental checkpointing when using
RocksDB as the backend? As far as I can see, the state consists only of
consumer offsets and the messages cached between checkpoints, so the delta
would effectively be the complete new checkpointed state.
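For context, this is roughly how incremental checkpointing is toggled on
the Flink side when RocksDB is the state backend (a sketch; the checkpoint
directory below is a placeholder):

```yaml
# flink-conf.yaml -- use RocksDB and enable incremental checkpoints
state.backend: rocksdb
state.backend.incremental: true
# placeholder checkpoint location
state.checkpoints.dir: hdfs:///flink/checkpoints
```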

Thanks a lot!
Eleanore

Re: Is incremental checkpoints needed?

Posted by Eleanore Jin <el...@gmail.com>.
Hi Arvid,

Thank you for the clarification!

Best,
Eleanore


On Tue, Mar 10, 2020 at 12:32 PM Arvid Heise <ar...@ververica.com> wrote:

> Hi Eleanore,
>
> incremental checkpointing is needed when you have a large state (GBs to
> TBs) but only small changes (KBs to MBs) happen between two checkpoints.
>
> There are two reasons for large state: large user state, or large operator
> state coming from joins, windows, or grouping. In the end, you will see the
> total size in the web UI. If it's small and the checkpointing duration is
> low, there is no need to go incremental.
>
> On Tue, Mar 10, 2020 at 5:43 PM Eleanore Jin <el...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am using Apache Beam to construct the pipeline, and this pipeline is
>> running with the Flink Runner.
>>
>> Both the source and sink are Kafka topics, and I have enabled Beam's
>> exactly-once semantics.
>>
>> My understanding of how it works in Beam is: messages are cached and not
>> processed by the KafkaExactlyOnceSink until the checkpoint completes and
>> all the cached messages are checkpointed; only then does the sink start
>> processing those messages.
>>
>> So is there any benefit to enabling incremental checkpointing when using
>> RocksDB as the backend? As far as I can see, the state consists only of
>> consumer offsets and the messages cached between checkpoints, so the
>> delta would effectively be the complete new checkpointed state.
>>
>> Thanks a lot!
>> Eleanore
>>
>

Re: Is incremental checkpoints needed?

Posted by Arvid Heise <ar...@ververica.com>.
Hi Eleanore,

incremental checkpointing is needed when you have a large state (GBs to
TBs) but only small changes (KBs to MBs) happen between two checkpoints.

There are two reasons for large state: large user state, or large operator
state coming from joins, windows, or grouping. In the end, you will see the
total size in the web UI. If it's small and the checkpointing duration is
low, there is no need to go incremental.
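A back-of-the-envelope way to read this rule of thumb (a hypothetical
sketch; the sizes are made-up numbers and compaction overhead is ignored):

```python
def upload_per_checkpoint(total_state_bytes, changed_bytes, incremental):
    """Approximate bytes persisted per checkpoint: a full checkpoint
    snapshots the entire state, while an incremental one uploads only
    what changed since the last checkpoint."""
    return changed_bytes if incremental else total_state_bytes

# Large state, small churn: incremental checkpointing pays off.
big_state, small_delta = 500 * 10**9, 50 * 10**6  # 500 GB state, 50 MB delta
assert upload_per_checkpoint(big_state, small_delta, incremental=True) == small_delta

# Offsets plus cached messages that are fully rewritten every checkpoint:
# the delta equals the whole state, so incremental saves nothing.
cached = 20 * 10**6
assert (upload_per_checkpoint(cached, cached, True)
        == upload_per_checkpoint(cached, cached, False))
```

In the second case (which matches the state described in this thread) both
strategies upload the same amount, which is exactly why incremental
checkpointing brings no benefit there.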

On Tue, Mar 10, 2020 at 5:43 PM Eleanore Jin <el...@gmail.com> wrote:

> Hi All,
>
> I am using Apache Beam to construct the pipeline, and this pipeline is
> running with the Flink Runner.
>
> Both the source and sink are Kafka topics, and I have enabled Beam's
> exactly-once semantics.
>
> My understanding of how it works in Beam is: messages are cached and not
> processed by the KafkaExactlyOnceSink until the checkpoint completes and
> all the cached messages are checkpointed; only then does the sink start
> processing those messages.
>
> So is there any benefit to enabling incremental checkpointing when using
> RocksDB as the backend? As far as I can see, the state consists only of
> consumer offsets and the messages cached between checkpoints, so the delta
> would effectively be the complete new checkpointed state.
>
> Thanks a lot!
> Eleanore
>