Posted to issues@flink.apache.org by "Tzu-Li (Gordon) Tai (JIRA)" <ji...@apache.org> on 2018/05/16 09:47:00 UTC

[jira] [Updated] (FLINK-9377) Remove writing serializers as part of the checkpoint meta information

     [ https://issues.apache.org/jira/browse/FLINK-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tzu-Li (Gordon) Tai updated FLINK-9377:
---------------------------------------
    Description: 
When writing the meta information of a state in savepoints, we currently write both the state serializer and the state serializer's configuration snapshot.

Writing both is redundant, as most of the time they contain identical information.
Moreover, we use Java serialization to write the serializer and rely on it being readable again on the restore run. This already makes it problematic for serializers such as the {{AvroSerializer}} (see discussion in FLINK-9202) to perform even a compatible upgrade.

The proposal here is to keep only the config snapshot as meta information, and use it as the single source of truth for the schema of serialized state.
The config snapshot should be treated as a factory (or provided to a factory) to re-create serializers capable of reading old, serialized state.
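
To make the "config snapshot as a factory" idea concrete, here is a minimal sketch. It is not Flink's actual API: {{SimpleSerializer}}, {{ConfigSnapshot}}, {{NumberSerializer}}, {{NumberConfigSnapshot}} and the helper methods are all hypothetical names invented for illustration. The point is only that the savepoint stores the snapshot (here, a byte width) rather than a Java-serialized serializer, and that the restore path asks the snapshot for a serializer able to read the old bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ConfigSnapshotSketch {

    /** Hypothetical, simplified serializer interface (an assumption, not Flink's TypeSerializer). */
    interface SimpleSerializer<T> {
        void serialize(T value, DataOutputStream out) throws IOException;
        T deserialize(DataInputStream in) throws IOException;
        ConfigSnapshot<T> snapshotConfiguration();
    }

    /** Hypothetical config snapshot: describes the written format and acts as a serializer factory. */
    interface ConfigSnapshot<T> {
        void write(DataOutputStream out) throws IOException;
        SimpleSerializer<T> restoreSerializer();
    }

    /** Writes numbers as 4 or 8 bytes; the width stands in for "schema" information. */
    static final class NumberSerializer implements SimpleSerializer<Long> {
        private final int width;

        NumberSerializer(int width) {
            if (width != 4 && width != 8) {
                throw new IllegalArgumentException("width must be 4 or 8");
            }
            this.width = width;
        }

        @Override
        public void serialize(Long value, DataOutputStream out) throws IOException {
            if (width == 4) {
                out.writeInt(Math.toIntExact(value));
            } else {
                out.writeLong(value);
            }
        }

        @Override
        public Long deserialize(DataInputStream in) throws IOException {
            return width == 4 ? (long) in.readInt() : in.readLong();
        }

        @Override
        public ConfigSnapshot<Long> snapshotConfiguration() {
            return new NumberConfigSnapshot(width);
        }
    }

    static final class NumberConfigSnapshot implements ConfigSnapshot<Long> {
        private final int width;

        NumberConfigSnapshot(int width) {
            this.width = width;
        }

        @Override
        public void write(DataOutputStream out) throws IOException {
            out.writeInt(width);
        }

        static NumberConfigSnapshot read(DataInputStream in) throws IOException {
            return new NumberConfigSnapshot(in.readInt());
        }

        @Override
        public SimpleSerializer<Long> restoreSerializer() {
            return new NumberSerializer(width);
        }
    }

    /** Writes only the config snapshot as meta information, followed by the serialized state. */
    static byte[] writeSavepoint(SimpleSerializer<Long> serializer, long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            serializer.snapshotConfiguration().write(out);
            serializer.serialize(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the snapshot first, then lets it re-create a serializer for the old bytes. */
    static long restore(byte[] savepoint) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(savepoint))) {
            ConfigSnapshot<Long> snapshot = NumberConfigSnapshot.read(in);
            return snapshot.restoreSerializer().deserialize(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // State written with the "old" 4-byte format is readable without the old serializer object.
        byte[] oldSavepoint = writeSavepoint(new NumberSerializer(4), 42L);
        System.out.println(restore(oldSavepoint)); // prints 42
    }
}
```

In this sketch the restore path never deserializes a serializer instance, so changes to the serializer class itself cannot break reading of the meta information. Only the snapshot's own written format has to stay readable.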

  was:
When writing meta information of a state in savepoints, we currently write both the state serializer as well as the state serializer's configuration snapshot.

Writing both is actually redundant, as most of the time they have identical information.
Moreover, the fact that we use Java serialization to write the serializer and rely on it to be re-readable on the restore run, already poses problems for serializers such as the {{AvroSerializer}} (see discussion in FLINK-9202).

The proposal here is to leave only the config snapshot as meta information, and use that as the single source of truth of information about the schema of serialized state.
The config snapshot should be treated as a factory (or provided to a factory) to re-create serializers capable of reading old, serialized state.


> Remove writing serializers as part of the checkpoint meta information
> ---------------------------------------------------------------------
>
>                 Key: FLINK-9377
>                 URL: https://issues.apache.org/jira/browse/FLINK-9377
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> When writing the meta information of a state in savepoints, we currently write both the state serializer and the state serializer's configuration snapshot.
> Writing both is redundant, as most of the time they contain identical information.
> Moreover, we use Java serialization to write the serializer and rely on it being readable again on the restore run. This already makes it problematic for serializers such as the {{AvroSerializer}} (see discussion in FLINK-9202) to perform even a compatible upgrade.
> The proposal here is to keep only the config snapshot as meta information, and use it as the single source of truth for the schema of serialized state.
> The config snapshot should be treated as a factory (or provided to a factory) to re-create serializers capable of reading old, serialized state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)