
Posted to user@flink.apache.org by Jeffrey Martin <je...@gmail.com> on 2020/09/15 05:30:12 UTC

restoring from externalized incremental rocksdb checkpoint?

Hi,

My job on Flink 1.10 uses RocksDB with incremental checkpointing enabled.
The checkpoints are retained on cancellation.

How do I resume from the retained checkpoint after cancellation (e.g., when
upgrading the job binary)? Docs say to use the checkpoint or savepoint
metadata file, but AFAICT there's no metadata file in HDFS in the various
directories under "$checkpointsDir/snapshots/$jobID".

Thanks,

Jeff Martin

Re: restoring from externalized incremental rocksdb checkpoint?

Posted by Congxian Qiu <qc...@gmail.com>.

Hi Jeff,
   Sorry for the late reply. You can only restore from a checkpoint whose
chk-xxx directory contains a _metadata file. If a chk-xxx directory has no
_metadata file, that checkpoint is not complete and you can't restore from it.
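A minimal Python sketch of that completeness check. It assumes the job's checkpoint directory (e.g. "$checkpointsDir/snapshots/$jobID") is reachable as a local path, for example through a mount or after copying it out of HDFS; on HDFS itself, running `hdfs dfs -ls` on each chk-N directory gives the same information. The function name `latest_complete_checkpoint` is hypothetical. It returns the highest-numbered chk-N directory that contains `_metadata` and skips the incomplete ones.

```python
import os
import re
from typing import Optional

# Matches Flink's per-checkpoint directory names, e.g. "chk-42".
_CHK_DIR = re.compile(r"^chk-(\d+)$")


def latest_complete_checkpoint(job_dir: str) -> Optional[str]:
    """Hypothetical helper: return the path of the highest-numbered chk-N
    directory under job_dir that contains a _metadata file, or None.

    Assumes job_dir is a locally accessible copy/mount of the job's
    checkpoint directory (not an hdfs:// URI).
    """
    best_n = -1
    best_path = None
    for name in os.listdir(job_dir):
        match = _CHK_DIR.match(name)
        if match is None:
            continue  # skip shared/, taskowned/ and anything else
        path = os.path.join(job_dir, name)
        # A chk-N directory without _metadata is an incomplete checkpoint.
        if not os.path.isfile(os.path.join(path, "_metadata")):
            continue
        n = int(match.group(1))
        if n > best_n:
            best_n, best_path = n, path
    return best_path
```

Comparing the checkpoint numbers as integers (rather than sorting names as strings) keeps chk-10 ahead of chk-9.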

Best,
Congxian


Jeffrey Martin <je...@gmail.com> wrote on Tue, Sep 15, 2020 at 2:18 PM:

> Thanks for the quick reply Congxian.
>
> The non-empty chk-N directories I looked at contained only files whose
> names are UUIDs. Nothing named _metadata (unless HDFS hides files that
> start with an underscore?).
>
> Just to be clear though -- I should expect a metadata file when using
> incremental checkpoints?
>

Re: restoring from externalized incremental rocksdb checkpoint?

Posted by Jeffrey Martin <je...@gmail.com>.

Thanks for the quick reply Congxian.

The non-empty chk-N directories I looked at contained only files whose
names are UUIDs. Nothing named _metadata (unless HDFS hides files that
start with an underscore?).

Just to be clear though -- I should expect a metadata file when using
incremental checkpoints?

On Mon, Sep 14, 2020 at 10:46 PM Congxian Qiu <qc...@gmail.com>
wrote:

> Hi Jeff
>    You can restore from a retained checkpoint with `bin/flink run -s
> :checkpointMetaDataPath [:runArgs]` [1]. You can find the metadata file in
> the `chk-xxx` directory [2].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/checkpoints.html#resuming-from-a-retained-checkpoint
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/checkpoints.html#directory-structure
> Best,
> Congxian

Re: restoring from externalized incremental rocksdb checkpoint?

Posted by Congxian Qiu <qc...@gmail.com>.

Hi Jeff
   You can restore from a retained checkpoint with `bin/flink run -s
:checkpointMetaDataPath [:runArgs]` [1]. You can find the metadata file in
the `chk-xxx` directory [2].
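A minimal Python sketch of assembling that command, assuming `bin/flink` is invoked from the Flink distribution directory. `build_restore_command`, the jar name, the job arguments and the checkpoint path below are all hypothetical. Flink's `run` action expects options such as `-s` before the jar file; anything after the jar is passed to the job's `main()`, so the order matters.

```python
import shlex
from typing import List, Sequence


def build_restore_command(flink_bin: str, checkpoint_path: str,
                          job_jar: str,
                          run_args: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: argv for
    `flink run -s <checkpoint_path> <job_jar> [run_args...]`."""
    # -s must come before the jar; arguments after the jar go to the job.
    return [flink_bin, "run", "-s", checkpoint_path, job_jar, *run_args]


# Hypothetical values: a retained checkpoint directory that contains _metadata.
cmd = build_restore_command(
    "bin/flink",
    "hdfs:///flink/checkpoints/snapshots/JOB_ID/chk-7",
    "my-job.jar",
    ["--env", "prod"],
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`; printing it first makes it easy to double-check the checkpoint path before resubmitting the upgraded job.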

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/checkpoints.html#resuming-from-a-retained-checkpoint
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/checkpoints.html#directory-structure
Best,
Congxian


Jeffrey Martin <je...@gmail.com> wrote on Tue, Sep 15, 2020 at 1:30 PM:

> Hi,
>
> My job on Flink 1.10 uses RocksDB with incremental checkpointing enabled.
> The checkpoints are retained on cancellation.
>
> How do I resume from the retained checkpoint after cancellation (e.g.,
> when upgrading the job binary)? Docs say to use the checkpoint or savepoint
> metadata file, but AFAICT there's no metadata file in HDFS in the various
> directories under "$checkpointsDir/snapshots/$jobID".
>
> Thanks,
>
> Jeff Martin
>