Posted to user@flink.apache.org by Sergei Poganshev <s....@slice.com> on 2019/03/27 23:21:03 UTC

What are savepoint state manipulation support plans

What are the plans to support savepoint state manipulation with batch jobs
natively in core Flink?

I've tried using the Bravo tool [1]. It's pretty good at reading
savepoints, but writing seems hacky. For example, I wonder what exactly
happens in the following lines:

val newOpState = writer.writeAll()
val newSavepoint = StateMetadataUtils.createNewSavepoint(savepoint, newOpState)
StateMetadataUtils.writeSavepointMetadata(savepointDir, newSavepoint)

Does it actually wait for the batch job to finish all its tasks and only
then write the metadata file? I'm asking because this code didn't execute at
all when I tried to run it in a k8s environment with a standalone-job.sh
setup (i.e. the _metadata file was not created).


[1] https://github.com/king/bravo

Re: What are savepoint state manipulation support plans

Posted by Vishal Santoshi <vi...@gmail.com>.
+1

On Thu, Mar 28, 2019, 5:01 AM Ufuk Celebi <uc...@apache.org> wrote:

> I think such a tool would be really valuable to users.
>
> @Gordon: What do you think about creating an umbrella ticket for this
> and linking it in this thread? That way, it's easier to follow this
> effort. You could also link Bravo and Seth's tool in the ticket as
> starting points.
>
> – Ufuk
>

Re: What are savepoint state manipulation support plans

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
FYI: Seth is starting a FLIP for adding a savepoint connector that addresses
this:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-FLIP-43-Savepoint-Connector-td29233.html

Please join the discussion there if you are interested!

On Thu, Mar 28, 2019 at 5:23 PM Tzu-Li (Gordon) Tai <tz...@apache.org>
wrote:

> @Ufuk
>
> Yes, creating a JIRA now to track this makes sense.
>
> I've proceeded to open one:
> https://issues.apache.org/jira/browse/FLINK-12047
> Let's move any further discussions there.
>
> Cheers,
> Gordon
>
> On Thu, Mar 28, 2019 at 5:01 PM Ufuk Celebi <uc...@apache.org> wrote:
>
>> I think such a tool would be really valuable to users.
>>
>> @Gordon: What do you think about creating an umbrella ticket for this
>> and linking it in this thread? That way, it's easier to follow this
>> effort. You could also link Bravo and Seth's tool in the ticket as
>> starting points.
>>
>> – Ufuk
>>
>

Re: What are savepoint state manipulation support plans

Posted by Ufuk Celebi <uc...@apache.org>.
Thanks Gordon. We already have 5 people watching it. :-)

On Thu, Mar 28, 2019 at 10:23 AM Tzu-Li (Gordon) Tai
<tz...@apache.org> wrote:
>
> @Ufuk
>
> Yes, creating a JIRA now to track this makes sense.
>
> I've proceeded to open one:  https://issues.apache.org/jira/browse/FLINK-12047
> Let's move any further discussions there.
>
> Cheers,
> Gordon
>
> On Thu, Mar 28, 2019 at 5:01 PM Ufuk Celebi <uc...@apache.org> wrote:
>>
>> I think such a tool would be really valuable to users.
>>
>> @Gordon: What do you think about creating an umbrella ticket for this
>> and linking it in this thread? That way, it's easier to follow this
>> effort. You could also link Bravo and Seth's tool in the ticket as
>> starting points.
>>
>> – Ufuk

Re: What are savepoint state manipulation support plans

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
@Ufuk

Yes, creating a JIRA now to track this makes sense.

I've proceeded to open one:
https://issues.apache.org/jira/browse/FLINK-12047
Let's move any further discussions there.

Cheers,
Gordon

On Thu, Mar 28, 2019 at 5:01 PM Ufuk Celebi <uc...@apache.org> wrote:

> I think such a tool would be really valuable to users.
>
> @Gordon: What do you think about creating an umbrella ticket for this
> and linking it in this thread? That way, it's easier to follow this
> effort. You could also link Bravo and Seth's tool in the ticket as
> starting points.
>
> – Ufuk
>

Re: What are savepoint state manipulation support plans

Posted by Ufuk Celebi <uc...@apache.org>.
I think such a tool would be really valuable to users.

@Gordon: What do you think about creating an umbrella ticket for this
and linking it in this thread? That way, it's easier to follow this
effort. You could also link Bravo and Seth's tool in the ticket as
starting points.

– Ufuk

Re: What are savepoint state manipulation support plans

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi!

Regarding the support for savepoint reading / writing / processing directly
in core Flink, we've been thinking about that lately and might push a bit
for adding the functionality to Flink in the next release.
For example, besides Bravo, Seth (CC'ed) has also implemented something [1]
for this. We should start thinking about converging the efforts of similar
tools and supporting it in Flink soon.
There's no official JIRA / feature proposal for this yet, but if you're
interested, please keep an eye on the dev mailing list for it in the future.

Cheers,
Gordon

[1] https://github.com/sjwiesman/flink/tree/savepoint-connector

On Thu, Mar 28, 2019 at 4:26 PM Gyula Fóra <gy...@gmail.com> wrote:

> Hi!
>
> I don't think there is any ongoing effort in core Flink other than this
> library we created.
>
> You are probably right that it is pretty hacky at the moment. I would say
> this is one way we could do it, and it seemed convenient to me at the time
> I wrote the code.
>
> If you have ideas for how to structure it better or improve it, you know
> where to find the code, so feel free to open a PR :) That might actually
> take us closer to having this properly in Flink one day soon.
>
> Just to clarify the code you are showing:
> writer.writeAll() -> Runs the batch job that writes the checkpoint files
> for the changed operator states, returns the reference to the OperatorState
> metadata object
> StateMetadataUtils.createNewSavepoint() -> Replaces the metadata for the
> operator states you have just written in the previous savepoint
> StateMetadataUtils.writeSavepointMetadata() -> Writes a new metadata file
>
> So metadata writing happens as the very last step, after the batch job has
> run. This is similar to how it works in streaming jobs, in the sense that
> there the JobManager writes the metadata file after checkpointing is done.
> The downside of this approach is that the client might not have the access
> needed to write the metadata file here.
>
> Gyula
>
>
>

Re: What are savepoint state manipulation support plans

Posted by Gyula Fóra <gy...@gmail.com>.
Hi!

I don't think there is any ongoing effort in core Flink other than this
library we created.

You are probably right that it is pretty hacky at the moment. I would say
this is one way we could do it, and it seemed convenient to me at the time
I wrote the code.

If you have ideas for how to structure it better or improve it, you know
where to find the code, so feel free to open a PR :) That might actually
take us closer to having this properly in Flink one day soon.

Just to clarify the code you are showing:
writer.writeAll() -> Runs the batch job that writes the checkpoint files
for the changed operator states, returns the reference to the OperatorState
metadata object
StateMetadataUtils.createNewSavepoint() -> Replaces the metadata for the
operator states you have just written in the previous savepoint
StateMetadataUtils.writeSavepointMetadata() -> Writes a new metadata file

So metadata writing happens as the very last step, after the batch job has
run. This is similar to how it works in streaming jobs, in the sense that
there the JobManager writes the metadata file after checkpointing is done.
The downside of this approach is that the client might not have the access
needed to write the metadata file here.

Gyula
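
To make the ordering described above concrete, here is a minimal,
self-contained sketch of the three steps. It is not Bravo's implementation:
SavepointWriteOrderSketch, OpStateRef and the three helper functions are
hypothetical stand-ins that only mirror the names in the snippet, and the
"batch job" is reduced to writing one local file. The point it illustrates
is that the _metadata file is written by the client process, and only after
the job step has returned.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical stand-ins; none of these are Bravo or Flink classes.
object SavepointWriteOrderSketch {

  // Stand-in for the OperatorState metadata reference returned by the job.
  final case class OpStateRef(operatorId: String, stateFile: Path)

  // Stand-in for writer.writeAll(): blocks until the "batch job" has written
  // its state file, then returns a reference to what was written.
  def writeAll(dir: Path, operatorId: String): OpStateRef = {
    val file = dir.resolve(s"$operatorId.state")
    Files.write(file, "rewritten-state".getBytes(UTF_8))
    OpStateRef(operatorId, file)
  }

  // Stand-in for createNewSavepoint(): swaps in the rewritten operator's
  // entry and leaves the other operators' entries untouched (in memory only).
  def createNewSavepoint(old: Map[String, Path], updated: OpStateRef): Map[String, Path] =
    old.updated(updated.operatorId, updated.stateFile)

  // Stand-in for writeSavepointMetadata(): runs in the client process, so the
  // client itself needs write access to the savepoint directory.
  def writeSavepointMetadata(dir: Path, savepoint: Map[String, Path]): Path = {
    val lines = savepoint.toSeq.sortBy(_._1).map { case (id, p) => s"$id=${p.getFileName}" }
    Files.write(dir.resolve("_metadata"), lines.mkString("\n").getBytes(UTF_8))
  }

  def run(dir: Path): Path = {
    val oldSavepoint = Map("op-a" -> dir.resolve("op-a.old"), "op-b" -> dir.resolve("op-b.old"))
    val newOpState = writeAll(dir, "op-a")                          // 1. job runs to completion
    val newSavepoint = createNewSavepoint(oldSavepoint, newOpState) // 2. metadata swapped in memory
    writeSavepointMetadata(dir, newSavepoint)                       // 3. client writes _metadata last
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("savepoint-sketch")
    println(s"metadata written to ${run(dir)}")
  }
}
```

If step 3 never runs, or runs in a process that cannot write to the
savepoint directory, the state files exist but no _metadata file appears,
which matches the symptom reported in the original question.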