Posted to user@flink.apache.org by Gwenhael Pasquiers <gw...@ericsson.com> on 2017/08/29 14:46:13 UTC

yarn and checkpointing

Hi,

Is it possible to use checkpointing to restore the state of an app after a restart on YARN?

From what I've seen, checkpointing only seems to work within the lifetime of a Flink cluster. However, YARN mode runs one cluster per app, and (unless the app crashes and is automatically restarted by the restart strategy) the cluster on YARN has the same lifetime as the app. So when we stop the app, we also stop the cluster, which cleans up its checkpoints.

So when the app is stopped, the cluster dies and cleans the checkpoints folder. Then of course it won't be able to restore the state at the next run.

When running Flink on YARN, are we supposed to cancel with a savepoint and then restore from that savepoint?

Re: yarn and checkpointing

Posted by Chesnay Schepler <ch...@apache.org>.

Checkpoints are only used for recovery during the job execution.

If the entire cluster is shut down and restarted, you will need to take a
savepoint and restore from that.
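
For illustration, the cancel-with-savepoint and restore cycle could look roughly like the sketch below. It only prints the two CLI invocations rather than running them. The savepoint directory, job ID, YARN application ID and jar name are placeholders, not values from this thread, and the exact flags (in particular -yid and whatever YARN options you pass to "flink run") may differ between Flink versions, so please check them against the CLI help of your release.

```shell
#!/bin/sh
# Sketch only: prints the Flink CLI calls for "cancel with savepoint"
# and "resume from savepoint". Every argument value is a placeholder.

# Build the command that cancels a job and writes a savepoint first.
# $1 = savepoint target directory, $2 = job ID, $3 = YARN application ID
cancel_with_savepoint_cmd() {
    printf 'flink cancel -s %s %s -yid %s\n' "$1" "$2" "$3"
}

# Build the command that resubmits the job, restoring from a savepoint.
# $1 = savepoint path reported by the cancel command, $2 = job jar
# (add your usual YARN submission options here; they are omitted on purpose)
resume_from_savepoint_cmd() {
    printf 'flink run -s %s %s\n' "$1" "$2"
}

# Hypothetical values, shown only to make the command shape visible.
cancel_with_savepoint_cmd "hdfs:///flink/savepoints" "<jobId>" "<yarnAppId>"
resume_from_savepoint_cmd "<savepointPath>" "my-job.jar"
```

The savepoint path to pass to "flink run -s" is the one reported when the cancel command completes; it is not the same as the target directory given to "-s" on cancel.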
