Posted to user@flink.apache.org by Astrac <al...@gmail.com> on 2016/09/30 13:17:42 UTC

Controlling savepoints from inside an application

In the project I am working on we are versioning all our Flink operators in
order to be able to re-build the state from external sources (i.e. Kafka) by
bumping that version number. This works pretty nicely so far, except that we
need to be aware of whether or not we need to load the savepoint before
deploying, as we need to provide different command line arguments via our
deployment script.

The thing I'd like to do instead is to version the savepoint storage
mechanism and store the savepoints in different folders depending on the
version of the application we are running. This way, when we bump the version
number we really start from scratch and don't risk any exception due to
state deserialisation; when we don't bump the number, we instead keep the
state from the previous version of the application and start from there.
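
For illustration, the kind of per-version directory we have in mind, using the
same config object as in the snippet further down (the "app.version" key is
hypothetical, our real key may differ):

    val appVersion = config.getString("app.version") // hypothetical version key
    val savepointDir =
      s"hdfs://${config.getString("utd.hdfs.namenode.host")}:8020/user/hadoop/flink/savepoints/v$appVersion"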

To do this I would need to control the storage path of the savepoints from
within our application code, but I couldn't find a way to do it. In case it's
relevant: we run on YARN, use the FsStateBackend for checkpoints, and keep
both savepoints and checkpoints on HDFS. Our main class looks something like
this:

    val flinkEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    val stateBackend = new FsStateBackend(
      s"hdfs://${config.getString("utd.hdfs.namenode.host")}:8020/user/hadoop/flink/checkpoints")
    flinkEnvironment.setStateBackend(stateBackend)
    // ... define more configurations and the streaming jobs
    flinkEnvironment.enableCheckpointing(600000).execute()
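
Versioning the checkpoint directory is already possible in this code, since the
FsStateBackend takes its path as a constructor argument; a rough sketch of what
that would look like (the "app.version" key and the /vN suffix are hypothetical):

    val appVersion = config.getString("app.version") // hypothetical version key, as above
    val hdfsBase = s"hdfs://${config.getString("utd.hdfs.namenode.host")}:8020/user/hadoop/flink"
    // The checkpoint directory can be versioned today via the FsStateBackend constructor:
    flinkEnvironment.setStateBackend(new FsStateBackend(s"$hdfsBase/checkpoints/v$appVersion"))
    // There is no equivalent call here for a versioned savepoint directory, nor for
    // choosing a savepoint to restore from; that is the part we are missing.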

Is there a way in this initialisation code to achieve the following?

* Configure the savepoint path while we build the StreamExecutionEnvironment
rather than in flink-conf.yml
* Manually read a savepoint rather than passing it via the CLI

Many Thanks!




Re: Controlling savepoints from inside an application

Posted by Ufuk Celebi <uc...@apache.org>.
On Fri, Sep 30, 2016 at 4:19 PM, Astrac <al...@gmail.com> wrote:
> What I mean by "manually reading the savepoint" is that, rather than
> providing the savepoint path via the "--fromSavepoint
> hdfs://some-path/to/savepoint" CLI option, I'd like to provide it in the code
> that initialises the StreamExecutionEnvironment. This way I can use my
> versioning strategy to load a savepoint that is compatible with the current
> version of the application (or none if this is a new version of the state,
> effectively rebuilding everything from Kafka).
>
> On the other side, i.e. writing the savepoint somewhere, at the moment I
> would be happy with triggering savepoints via the CLI if it were possible to
> configure the path where they are stored via the initialisation code where
> we build the StreamExecutionEnvironment rather than via flink-conf.yml;
> since I don't see any mention of this in the FLIP, is this something you
> would be happy to add as well?

I think what you describe makes sense. I will update FLIP-10
accordingly. I'm finishing the initial PR for FLIP-10, which will not
include this yet though. Thanks for your input!

Re: Controlling savepoints from inside an application

Posted by Astrac <al...@gmail.com>.
Thanks for the answer!

The changes in the FLIP are quite interesting; are they coming in 1.2?

What I mean by "manually reading the savepoint" is that, rather than
providing the savepoint path via the "--fromSavepoint
hdfs://some-path/to/savepoint" CLI option, I'd like to provide it in the code
that initialises the StreamExecutionEnvironment. This way I can use my
versioning strategy to load a savepoint that is compatible with the current
version of the application (or none if this is a new version of the state,
effectively rebuilding everything from Kafka).
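
As a rough, purely hypothetical sketch of that decision (these names are made
up, this is not an existing Flink API):

    // Hypothetical sketch only: decide which savepoint, if any, to restore from.
    def chooseSavepoint(stateVersionIsNew: Boolean, versionedSavepointPath: String): Option[String] =
      if (stateVersionIsNew) None // new state version: rebuild everything from Kafka
      else Some(versionedSavepointPath) // restore the savepoint of the previous deploy
    // Today the result of this choice can only be applied via the CLI
    // ("--fromSavepoint <path>"), not when building the StreamExecutionEnvironment.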

On the other side, i.e. writing the savepoint somewhere, at the moment I
would be happy with triggering savepoints via the CLI if it were possible to
configure the path where they are stored via the initialisation code where
we build the StreamExecutionEnvironment rather than via flink-conf.yml;
since I don't see any mention of this in the FLIP, is this something you
would be happy to add as well?




Re: Controlling savepoints from inside an application

Posted by Ufuk Celebi <uc...@apache.org>.
Hey Aldo,

On Fri, Sep 30, 2016 at 3:17 PM, Astrac <al...@gmail.com> wrote:
> * Configure the savepoint path while we build the StreamExecutionEnvironment
> rather than in flink-conf.yml
> * Manually read a savepoint rather than passing it via the CLI

What you describe is not possible right now, but I'm working on
unifying savepoints and checkpoints [1]. With the upcoming changes for
this, it will be possible to provide the paths in the environment.
What do you mean by manually reading a savepoint?

I would really appreciate some feedback on the FLIP, too. There is a
corresponding [DISCUSS] thread for comments.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-10%3A+Unify+Checkpoints+and+Savepoints