You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Stefan Richter (JIRA)" <ji...@apache.org> on 2017/05/05 13:31:04 UTC

[jira] [Closed] (FLINK-5051) Backwards compatibility for serializers in backend state

     [ https://issues.apache.org/jira/browse/FLINK-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Richter closed FLINK-5051.
---------------------------------
    Resolution: Implemented

> Backwards compatibility for serializers in backend state
> --------------------------------------------------------
>
>                 Key: FLINK-5051
>                 URL: https://issues.apache.org/jira/browse/FLINK-5051
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>
> When a new state is register, e.g. in a keyed backend via `getPartitionedState`, the caller has to provide all type serializers required for the persistence of state components. Explicitly passing the serializers on state creation already allows for potentiall version upgrades of serializers.
> However, those serializers are currently not part of any snapshot and are only provided at runtime, when the state is registered newly or restored. For backwards compatibility, this has strong implications: checkpoints are not self contained in that state is currently a blackbox without knowledge about it's corresponding serializers. Most cases where we would need to restructure the state are basically lost. We could only convert them lazily at runtime and only once the user is registering the concrete state, which might happen at unpredictable points.
> I suggest to adapt our solution as follows:
> - As now, all states are registered with their set of serializers.
> - Unlike now, all serializers are written to the snapshot. This makes savepoints self-contained and also allows to create inspection tools for savepoints at some point in the future.
> - Introduce an interface {{Versioned}} with {{long getVersion()}} and {{boolean isCompatible(Versioned v)}} which is then implemented by serializers. Compatible serializers must ensure that they can deserialize older versions, and can then serialize them in their new format. This is how we upgrade.
> We need to find the right tradeoff in how many places we need to store the serializers. I suggest to write them once per parallel operator instance for each state, i.e. we have a map with state_name -> tuple3<serializer<KEY>, serializer<NAMESPACE>, serializer<STATE>>. This could go before all key-groups are written, right at the head of the file. Then, for each file we see on restore, we can first read the serializer map from the head of the stream, then go through the key groups by offset.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)