You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2018/09/18 08:02:09 UTC

[GitHub] tzulitai opened a new pull request #6711: [FLINK-9377] [core, state backends] Remove serializers from checkpoints

tzulitai opened a new pull request #6711: [FLINK-9377] [core, state backends] Remove serializers from checkpoints
URL: https://github.com/apache/flink/pull/6711
 
 
   ## What is the purpose of the change
   
   The goal of this PR is the following:
   
   - Extend the `TypeSerializerConfigSnapshot` to also be the factory for the originating serializer. This introduces a new `restoreSerializer()` method on the interface (API-breaking change).
   - Since interface would inevitably be broken, we also use the opportunity to move the responsibility of serializer schema compatibility checks from `TypeSerializer` to the `TypeSerializerConfigSnapshot`. This is an improvement to move away from the previous design of having to reconfigure serializers, leading to mutable serializers as well as a ill-defined separation of concerns (serialization v.s. compatibility).
   - Without the need to touch all `TypeSerializerConfigSnapshot` subclasses at once while still letting the codebase to build and pass tests, the new methods on the `TypeSerializerConfigSnapshot` interface should have "workaround" base implementations. These base implementations should eventually be removed once all subclasses have the proper implementations in place.
   
   The next steps after this PR is merged would be the following:
   - Introduce a `SerializerUpgradeTestBase` that covers test concerns for upgrading a serializer and its interplay with different state backends. Ultimately, a check needs to be added that ALL Flink-shipped serializers have a corresponding test subclass.
   - Incrementally implement the new methods for all serializers (will be several PRs).
   - Use the new interfaces to implement state migration in the state backends.
   
   ## Brief change log
   
   #### b0151cd Add the `TypeSerializerConfigSnapshot#restoreSerializer()` method
   
   The workaround base implementation for this is to just return the config's originating serializer.
   There are two scenarios in our current tests that needs to be covered by this: 1) taking a snapshot of current serializers, and then trying to restore the serializer, and 2) migration tests which read from savepoint files and try to restore serializers.
   
   For 1), the workaround is to inject the originating serializer just before the config is written, and then write / read that serializer as part of writing / reading the config.
   For 2), that is already covered by our backwards compatibility path for Flink <= 1.6.x, which still contained both serializer and config. The backwards compatibility resolution would be to wrap the serializer and config inside a `BackwardsCompatibleConfigSnapshot`, introduced in 80c3043.
   
   #### 6073a0d Add the `TypeSerializerConfigSnapshot#resolveSchemaCompatibility(TypeSerializer)` method
   
   The workaround base implementation for this is to forward the compatibility check to the to-be-removed `TypeSerializer#ensureCompatibility(TypeSerializerConfigSnapshot)` method.
   
   We also define a new set of compatibility results, defined in `TypeSerializerSchemaCompatibility`, which will eventually completely replace `CompatibilityResult`.
   
   #### 48124d7 Deprecate `TypeSerializerSerializationUtil`
   
   This is a simple code cleanup.
   
   #### 80c3043 Introduce `BackwardsCompatibleConfigSnapshot`
   
   This is our backwards compatibility resolution path when reading from savepoint versions which still contained both the serializer and their config.
   
   A `BackwardsCompatibleConfigSnapshot` wraps the serializer and config together, and on `restoreSerializer`, instead of directly calling `restoreSerializer` on the wrapped config (which for previous versions may not be capable of actually instantiating the serializer), return the wrapped serializer instead.
   
   #### f9a0dee Remove serializers from checkpoints
   
   This commit glues everything together to remove serializers from checkpoints.
   This commit changes the serialization formats for state backends (while remaining backwards compatibility).
   
   ## Verifying this change
   
   All current state migration tests, state backend tests, serializer tests etc. should still be working.
   No new tests have been added yet.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**yes** / no)
     - The serializers: (**yes** / no / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (**yes** / no)
     - If yes, how is the feature documented? (not applicable / docs / **JavaDocs** / not documented)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services