Posted to user@flink.apache.org by "Muzzammil Ameen (muameen)" <mu...@cisco.com> on 2022/09/22 07:53:32 UTC

Difference between Checkpoint and Savepoint

Hi folks,
I would like to understand the difference between an externalized checkpoint and a savepoint. Is an externalized checkpoint written to persistent storage only if the job fails or is suspended? What about when the job is cancelled or killed by the user? What information is written to persistent storage when an externalized checkpoint is created, and how does that differ from a savepoint?
Consider this scenario: I have killed a Flink job and would like to restart it while restoring its previous state. I have two options: restore from the latest checkpoint or from a savepoint. Which one should I choose for the restore, and why?



Regards,
Muzzammil Ameen
muameen@cisco.com

Re: Difference between Checkpoint and Savepoint

Posted by Hangxiang Yu <ma...@gmail.com>.

Hi,
> Regarding externalized checkpoint, is the checkpoint written to
> persistent storage only if the job is failed or suspended? What about
> cancelled or killed by the user?
If you configure RETAIN_ON_CANCELLATION, the checkpoint is retained both
when the job is cancelled and when it fails.
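
To make the retention rule above concrete, here is a small stand-alone Java
sketch. It is a toy model, not Flink code: the enum constants mirror the names
of Flink's cleanup modes, but the class RetentionSketch and the method
isRetained are hypothetical. Only cancellation and failure are modelled,
because those are the two cases covered above. It also assumes that
DELETE_ON_CANCELLATION keeps the checkpoint when the job fails, which is my
understanding of the documented behaviour. In a real job the mode is set
through the checkpoint configuration (as far as I know, the key
execution.checkpointing.externalized-checkpoint-retention in Flink 1.15).

```java
public class RetentionSketch {
    // Names mirror Flink's cleanup modes; this enum is a stand-in, not the Flink class.
    enum Cleanup { NO_EXTERNALIZED_CHECKPOINTS, DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION }

    // Only the two ways of stopping that are discussed in this thread.
    enum Termination { CANCELED, FAILED }

    // Returns true if the latest checkpoint is assumed to survive the job stopping this way.
    static boolean isRetained(Cleanup cleanup, Termination termination) {
        switch (cleanup) {
            case RETAIN_ON_CANCELLATION:
                return true; // kept on both cancellation and failure
            case DELETE_ON_CANCELLATION:
                return termination == Termination.FAILED; // assumption: kept on failure only
            default:
                return false; // checkpoints are not externalized at all
        }
    }

    public static void main(String[] args) {
        for (Cleanup cleanup : Cleanup.values()) {
            for (Termination termination : Termination.values()) {
                System.out.println(cleanup + " / " + termination
                        + " -> retained=" + isRetained(cleanup, termination));
            }
        }
    }
}
```

With RETAIN_ON_CANCELLATION both rows come out as retained, which is the case
described above.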

> What information is written to persistent storage when externalized checkpoint
> is created?
An externalized checkpoint is simply the latest checkpoint, retained after
the job stops. Its content is therefore the same as that of a checkpoint
taken while the job is running.

> How is it different when compared to savepoint?
An externalized checkpoint is just a regular checkpoint whose life cycle
extends beyond the running job, so the basic differences between it and a
savepoint are the ones on the page Martijn pointed to.

In my opinion, both let you reuse the state. But the different formats
have their limitations, as you can see in the link (the canonical
savepoint is the slowest).
You could choose by:
1. Speed (an externalized checkpoint requires no extra operation)
2. Compatibility (the canonical savepoint has the best compatibility)
3. Operability (a savepoint can be triggered manually at any time, whereas
an externalized checkpoint is taken according to the checkpoint interval)
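
One possible way to read the three criteria above as a decision rule is
sketched below in plain Java. Everything in it is hypothetical and for
illustration only: the class RestoreChoiceSketch, the method choose, and the
rule itself (prefer the canonical savepoint when compatibility matters,
otherwise prefer the newer snapshot, and prefer the checkpoint on a tie
because it needs no extra operation) are my assumptions, not Flink behaviour
or an official recommendation.

```java
public class RestoreChoiceSketch {
    // The two restore options from the question.
    enum Snapshot { RETAINED_CHECKPOINT, CANONICAL_SAVEPOINT }

    /**
     * Hypothetical rule of thumb.
     *
     * @param needsCompatibility true if the restart changes something where the
     *                           canonical format's compatibility matters
     * @param checkpointAgeSec   seconds since the retained checkpoint was taken
     * @param savepointAgeSec    seconds since the savepoint was taken
     */
    static Snapshot choose(boolean needsCompatibility, long checkpointAgeSec, long savepointAgeSec) {
        if (needsCompatibility) {
            return Snapshot.CANONICAL_SAVEPOINT; // criterion 2: compatibility
        }
        // Otherwise restore from the newer snapshot so that less input is reprocessed;
        // on a tie the checkpoint wins because it needed no extra operation (criterion 1).
        return checkpointAgeSec <= savepointAgeSec
                ? Snapshot.RETAINED_CHECKPOINT
                : Snapshot.CANONICAL_SAVEPOINT;
    }

    public static void main(String[] args) {
        // Job killed 30 s after its last checkpoint; a savepoint was taken an hour ago.
        System.out.println(choose(false, 30, 3600));
        // Same snapshots, but the restart needs the most compatible format.
        System.out.println(choose(true, 30, 3600));
    }
}
```

Criterion 3 shows up only indirectly here: because a savepoint can be
triggered manually at any time, it may well be the newer of the two snapshots.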


On Thu, Sep 22, 2022 at 11:05 PM Martijn Visser <ma...@apache.org>
wrote:

> Hi,
>
> There's a specific page on the Flink documentation which explains the
> difference between checkpoints and savepoints at
> https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/state/checkpoints_vs_savepoints/
> - Have you read this? If so, what's missing so that we can improve the page?
>
> Best regards,
>
> Martijn

-- 
Best,
Hangxiang.

Re: Difference between Checkpoint and Savepoint

Posted by Martijn Visser <ma...@apache.org>.

Hi,

There's a specific page in the Flink documentation that explains the
difference between checkpoints and savepoints:
https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/state/checkpoints_vs_savepoints/
Have you read this? If so, what's missing so that we can improve the page?

Best regards,

Martijn
