Posted to user@flink.apache.org by Rakshit Ramesh <ra...@datakaveri.org> on 2021/09/17 09:20:10 UTC

Re: Savepoints with bootstrapping a datastream function

Hi Arvid.
I went through the code, confluence and jira on FLIP-147.
I couldn't determine whether it's possible to manually trigger a
savepoint/checkpoint, as I couldn't find any Javadoc APIs for it.
Also, would the setting "ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH" still
create a checkpoint if my entire bounded job finishes before the
checkpoint interval elapses?
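For what it's worth, a minimal sketch of setting that flag programmatically, assuming the Flink 1.14 option name (ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH; worth double-checking against the docs for your release):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Flip147FlagSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // FLIP-147 (Flink 1.14+): keep checkpointing running after some tasks finish.
        conf.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, true);

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.enableCheckpointing(10_000); // checkpoint interval: 10s

        // ... build the bounded pipeline here ...
        env.execute("bounded-job-with-flip-147");
    }
}
```

Note that, as discussed later in the thread, this flag alone does not force a final checkpoint before a short bounded job finishes.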

On Tue, 13 Jul 2021 at 17:58, Rakshit Ramesh <ra...@datakaveri.org>
wrote:

> That's great news!
> Thanks.
>
> On Tue, 13 Jul 2021 at 14:17, Arvid Heise <ar...@apache.org> wrote:
>
>> The only known workaround is to provide your own source(function) that
>> doesn't finish until all of the source subtasks finish and a final
>> checkpoint is completed. However, coordinating the sources with the old
>> SourceFunction interface requires some external tool.
>>
>> FLIP-147 is targeted for 1.14 in August.
>>
>> On Sat, Jul 10, 2021 at 7:46 PM Rakshit Ramesh <
>> rakshit.ramesh@datakaveri.org> wrote:
>>
>>> Hi Arvid,
>>> Since I'm trying to save checkpoints for a bounded process,
>>> the checkpoint isn't being written in time, because the job finishes
>>> before that can happen.
>>>
>>> Looks like one major feature that would be required for this to work is
>>> discussed in FLIP-147
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished
>>>
>>> Is there any userland workaround for this?
>>>
>>> Thanks!
>>>
>>> On Thu, 8 Jul 2021 at 11:52, Rakshit Ramesh <
>>> rakshit.ramesh@datakaveri.org> wrote:
>>>
>>>> Yes! I was only worried about the jobid changing and the checkpoint
>>>> being un-referenceable.
>>>> But since I can pass a path to the checkpoint that will not be an issue.
>>>>
>>>>
>>>> Thanks a lot for your suggestions!
>>>>
>>>> On Thu, 8 Jul 2021 at 11:26, Arvid Heise <ar...@apache.org> wrote:
>>>>
>>>>> Hi Rakshit,
>>>>>
>>>>> It sounds to me as if you don't need the Savepoint API at all. You can
>>>>> (re)start all applications with the previous state (be it retained
>>>>> checkpoint or savepoint). You just need to provide the path to that in your
>>>>> application invocation [1] (every entry point has such a parameter, you
>>>>> might need to check the respective documentation if you are not using CLI).
>>>>> Note that although it only says savepoint, starting from a checkpoint is
>>>>> fine as well (just not recommended in the beginning).
>>>>>
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/cli/#starting-a-job-from-a-savepoint
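Concretely, the CLI form from [1] looks like this (the bucket and paths below are placeholders):

```shell
# Resume a job from a savepoint; -s / --fromSavepoint takes the path.
flink run -s s3://my-bucket/savepoints/savepoint-ab12cd my-job.jar

# A retained checkpoint directory works the same way:
flink run -s s3://my-bucket/checkpoints/<job-id>/chk-42 my-job.jar
```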
>>>>>
>>>>> On Thu, Jul 8, 2021 at 6:31 AM Rakshit Ramesh <
>>>>> rakshit.ramesh@datakaveri.org> wrote:
>>>>>
>>>>>> Sorry for being a little vague there.
>>>>>> I want to create a Savepoint from a DataStream right before the job
>>>>>> is finished or cancelled.
>>>>>> What you have shown in the IT case is how a datastream can be
>>>>>> bootstrapped with state that is
>>>>>> formed by means of a DataSet.
>>>>>> My jobs are triggered by a scheduler periodically (every day) using
>>>>>> the api and I would like
>>>>>> to bootstrap each day's job with the state of the previous day.
>>>>>>
>>>>>> But thanks for the input on the Checkpoint behaviour wrt a FINISHED
>>>>>> state,
>>>>>> I think that will work for me.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On Thu, 8 Jul 2021 at 02:03, Arvid Heise <ar...@apache.org> wrote:
>>>>>>
>>>>>>> I don't quite understand your question. You use Savepoint API to
>>>>>>> create a savepoint with a batch job (that's why it's DataSet Transform
>>>>>>> currently). That savepoint can only be restored through a datastream
>>>>>>> application. Dataset applications cannot start from a savepoint.
>>>>>>>
>>>>>>> So I don't understand why you see a difference between "restoring a
>>>>>>> savepoint to a datastream" and "create a NewSavepoint for a datastream".
>>>>>>> It's ultimately the same thing for me. Just to be very clear: the main
>>>>>>> purpose of Savepoint API is to create the initial state of a datastream
>>>>>>> application.
>>>>>>>
>>>>>>> For your second question, yes retained checkpoints outlive the job
>>>>>>> in all regards. It's the user's responsibility to eventually clean them up.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 7, 2021 at 6:56 PM Rakshit Ramesh <
>>>>>>> rakshit.ramesh@datakaveri.org> wrote:
>>>>>>>
>>>>>>>> Yes I could understand restoring a savepoint to a datastream.
>>>>>>>> What I couldn't figure out is to create a NewSavepoint for a
>>>>>>>> datastream.
>>>>>>>> What I understand is that NewSavepoints only take in Bootstrap
>>>>>>>> transformation for Dataset Transform functions.
>>>>>>>>
>>>>>>>>
>>>>>>>> About the checkpoints, does
>>>>>>>>  CheckpointConfig.ExternalizedCheckpointCleanup =
>>>>>>>> RETAIN_ON_CANCELLATION
>>>>>>>> offer the same behaviour when the job is "FINISHED" and not
>>>>>>>> "CANCELLED" ?
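For reference, a sketch of the setting being asked about, using the 1.13-era CheckpointConfig API (this method was later deprecated in favor of a renamed setter, so verify against your release):

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every 60s
        // Keep the externalized checkpoint around instead of deleting it when
        // the job is cancelled; cleanup then becomes the user's responsibility.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... build the pipeline here ...
        env.execute("job-with-retained-checkpoints");
    }
}
```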
>>>>>>>>
>>>>>>>> What I'm looking for is a way to retain the state for a bounded job
>>>>>>>> so that the state is reloaded on the next job run (through api).
>>>>>>>>
>>>>>>>> On Wed, 7 Jul 2021 at 14:18, Arvid Heise <ar...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Rakshit,
>>>>>>>>>
>>>>>>>>> The example is valid. The state processor API is kinda working
>>>>>>>>> like a DataSet application but the state is meant to be read in DataStream.
>>>>>>>>> Please check out the SavepointWriterITCase [1] for a full example. There is
>>>>>>>>> no checkpoint/savepoint in DataSet applications.
>>>>>>>>>
>>>>>>>>> Checkpoints can be stored on different checkpoint storages, such
>>>>>>>>> as S3 or HDFS. If you use the RocksDB state backend, Flink pretty much just
>>>>>>>>> copies the SST files of RocksDB to S3. Checkpoints are usually bound to the
>>>>>>>>> life of an application. So they are created by the application and deleted
>>>>>>>>> on termination.
>>>>>>>>> However, you can resume an application both from savepoint and
>>>>>>>>> checkpoints. Checkpoints can be retained [2] to avoid them being deleted by
>>>>>>>>> the application during termination. But that's considered an advanced
>>>>>>>>> feature and you should first try it with savepoints.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/apache/flink/blob/release-1.13.0/flink-libraries/flink-state-processing-api/src/test/java/org/apache/flink/state/api/SavepointWriterITCase.java#L141-L141
>>>>>>>>> [2]
>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/checkpoints/#retained-checkpoints
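To make [1] concrete, here is a condensed sketch of bootstrapping keyed state with the 1.13 State Processor API (the Account type, bootstrapper, uid, and paths below are made-up placeholders, not taken from the IT case):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapSketch {

    // Placeholder record type.
    public static class Account {
        public int id;
        public double balance;
    }

    // Writes each account's balance into keyed state under the key acc.id.
    public static class AccountBootstrapper
            extends KeyedStateBootstrapFunction<Integer, Account> {
        private transient ValueState<Double> balance;

        @Override
        public void open(Configuration parameters) {
            balance = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("balance", Double.class));
        }

        @Override
        public void processElement(Account acc, Context ctx) throws Exception {
            balance.update(acc.balance);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Account> accounts = env.fromElements(new Account()); // stand-in data

        BootstrapTransformation<Account> transformation = OperatorTransformation
                .bootstrapWith(accounts)
                .keyBy(acc -> acc.id)
                .transform(new AccountBootstrapper());

        Savepoint.create(new MemoryStateBackend(), 128)
                .withOperator("accounts-uid", transformation)
                .write("s3://my-bucket/bootstrapped-savepoint");
        env.execute("bootstrap");
    }
}
```

The DataStream job then restores from that savepoint path and reads the state in an operator carrying uid("accounts-uid").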
>>>>>>>>>
>>>>>>>>> On Mon, Jul 5, 2021 at 5:56 PM Rakshit Ramesh <
>>>>>>>>> rakshit.ramesh@datakaveri.org> wrote:
>>>>>>>>>
>>>>>>>>>> I'm trying to bootstrap state into a KeyedProcessFunction
>>>>>>>>>> equivalent that takes in
>>>>>>>>>> a DataStream but I'm unable to find a reference for the same.
>>>>>>>>>> I found this gist
>>>>>>>>>>
>>>>>>>>>> https://gist.github.com/alpinegizmo/ff3d2e748287853c88f21259830b29cf
>>>>>>>>>> But it seems to only apply for DataSet.
>>>>>>>>>> My use case is to manually trigger a Savepoint into S3 for later
>>>>>>>>>> reuse.
>>>>>>>>>> I'm also guessing that checkpoints can't be stored in RocksDB or
>>>>>>>>>> S3 for later reuse.
>>>>>>>>>>
>>>>>>>>>

Re: Re: Savepoints with bootstrapping a datastream function

Posted by Rakshit Ramesh <ra...@datakaveri.org>.
Thanks for the information Yun!
I will go with the workaround then.

On Fri, 17 Sept 2021 at 15:22, Yun Gao <yu...@aliyun.com> wrote:

> Hi Rakshit,
>
> I think FLIP-147 might still not be able to support this case: for
> bounded jobs it lets each task exit after a checkpoint so that the
> remaining data is committed, but it cannot ensure that all the tasks
> exit after the same checkpoint; as for savepoints, it does not support
> taking a savepoint after all the tasks have finished.
>
> We are also thinking about how we could support this case.
> For now, I think we still need the workaround Arvid pointed out:
> coordinate all the sources to wait until a final checkpoint is
> completed before exiting.
>
> Best,
> yun
>
>
> ------------------Original Mail ------------------
> *Sender:*Rakshit Ramesh <ra...@datakaveri.org>
> *Send Date:*Fri Sep 17 17:20:40 2021
> *Recipients:*Arvid Heise <ar...@apache.org>
> *CC:*user <us...@flink.apache.org>
> *Subject:*Re: Savepoints with bootstrapping a datastream function

Re: Re: Savepoints with bootstrapping a datastream function

Posted by Yun Gao <yu...@aliyun.com>.
Hi Rakshit,

I think FLIP-147 might still not be able to support this case: for
bounded jobs it lets each task exit after a checkpoint so that the
remaining data is committed, but it cannot ensure that all the tasks
exit after the same checkpoint; as for savepoints, it does not support
taking a savepoint after all the tasks have finished.

We are also thinking about how we could support this case.
For now, I think we still need the workaround Arvid pointed out:
coordinate all the sources to wait until a final checkpoint is completed
before exiting.
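A rough sketch of that workaround for a single-subtask source (class and field names are made up; this illustrates the idea, not a vetted implementation): emit the bounded data, remember the id of the last checkpoint triggered so far, and only return from run() once a checkpoint with a higher id has completed.

```java
import org.apache.flink.api.common.state.CheckpointListener;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class HoldOpenSource extends RichSourceFunction<Long>
        implements CheckpointedFunction, CheckpointListener {

    private volatile boolean running = true;
    private volatile long lastTriggeredCheckpoint = -1;
    private volatile long doneAtCheckpoint = Long.MAX_VALUE;
    private volatile boolean finalCheckpointComplete = false;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        // 1. Emit the bounded input.
        for (long i = 0; i < 1000 && running; i++) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(i);
            }
        }
        // 2. All data emitted: only a checkpoint triggered from now on
        //    is guaranteed to cover the last records.
        doneAtCheckpoint = lastTriggeredCheckpoint + 1;
        // 3. Idle until such a checkpoint completes, then exit.
        while (running && !finalCheckpointComplete) {
            Thread.sleep(100);
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        lastTriggeredCheckpoint = context.getCheckpointId();
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        if (checkpointId >= doneAtCheckpoint) {
            finalCheckpointComplete = true;
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext context) {}

    @Override
    public void cancel() {
        running = false;
    }
}
```

With parallel sources, each subtask would additionally need external coordination so that none of them exits early; that is the part Arvid mentioned requiring an external tool.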

Best,
yun


