Posted to user@flink.apache.org by Fanbin Bu <fa...@coinbase.com> on 2020/03/25 18:53:29 UTC

savepoint - checkpoint - directory

Hi,

For savepoint, the dir looks like
s3://bucket/savepoint-jobid/*

To resume, I run:
flink run -s s3://bucket/savepoint-jobid/
perfect!


For checkpoint, the dir looks like
s3://bucket/jobid/chk-100
s3://bucket/jobid/shared   <-- what is this for?

To resume, which one should I use:
flink run -s s3://bucket/jobid
or
flink run -s s3://bucket/jobid/chk-100


Another question: I saw that `flink cancel` is deprecated and that `flink stop`
is recommended instead. But doesn't this cause production downtime? To avoid
downtime, is it recommended to just run `flink savepoint`?

Thanks,
Fanbin

Re: savepoint - checkpoint - directory

Posted by Yun Tang <my...@live.com>.
Hi Fanbin

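To resume from a checkpoint, the path must point at least as deep as the checkpoint directory itself, i.e. /path/chk-x or /path/chk-x/_metadata. In your example that is s3://bucket/jobid/chk-100, not s3://bucket/jobid. The sub-directory named “shared” stores incremental checkpoint content, i.e. state that may be referenced by more than one checkpoint. You could refer to [1] for more information.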
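As a rough illustration of the rule above, here is a minimal sketch that picks the highest-numbered chk-N entry from a list of directory names and prints the path to hand to `flink run -s`. The function name `latest_checkpoint` is hypothetical, the entries are typed in by hand rather than listed from S3, and `<your-job.jar>` is a placeholder; it also assumes the highest-numbered checkpoint is a completed one that was retained.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flink): given a base directory and a list
# of entry names, print the path of the highest-numbered chk-N directory.
latest_checkpoint() {
  local base="$1"; shift
  local best=-1 name n
  for name in "$@"; do
    case "$name" in
      chk-*)
        n="${name#chk-}"
        # Ignore anything after "chk-" that is not a plain number.
        case "$n" in ''|*[!0-9]*) continue ;; esac
        if [ "$n" -gt "$best" ]; then best="$n"; fi
        ;;
    esac
  done
  # Fail if no chk-N entry was found (e.g. only "shared" is present).
  [ "$best" -ge 0 ] || return 1
  printf '%s/chk-%s\n' "${base%/}" "$best"
}

# Layout from this thread; the entry names are written out by hand here and
# would normally come from listing the bucket.
restore_path="$(latest_checkpoint s3://bucket/jobid chk-98 chk-100 shared taskowned)"

# Only prints the command; it does not start a job.
echo "flink run -s $restore_path <your-job.jar>"
```

Entries such as "shared" are skipped because they are never a valid restore target on their own; only a chk-N directory (or its _metadata file) is.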

BTW, stopping with a savepoint can help reduce source rewind time when the job is restarted.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#directory-structure

