Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/02/20 06:19:59 UTC

[GitHub] tzulitai opened a new pull request #7759: [FLINK-11485][FLINK-10897] POJO state schema evolution / migrate PojoSerializer to use new compatibility APIs

URL: https://github.com/apache/flink/pull/7759
 
 
   ## What is the purpose of the change
   
   This PR mainly resolves FLINK-11485 (migrate `PojoSerializer` to use the new serialization compatibility APIs), and in doing so also indirectly resolves FLINK-10897 (support POJO state schema evolution).
   
   The new snapshot class for the `PojoSerializer` is now the `PojoSerializerSnapshot`.
   This new snapshot class has the following features:
   - Avoids Java serialization completely, including serialization of POJO fields and classes.
   - Properly supports schema evolution of POJO types by signaling `compatibleAfterMigration` when fields are added to or removed from the target POJO type. During migration, the old POJO serializer obtained from `PojoSerializerSnapshot#restoreSerializer` is used to read old data into Java objects, and it is capable of reading and dropping the values of fields that no longer exist.
   - Properly reconfigures the `PojoSerializer` by providing a new reconfigured instance via `compatibleWithReconfiguredSerializer`. This allows the implementation of the `PojoSerializer` to be immutable.
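
To make the field-level signaling above concrete, here is a minimal, self-contained sketch. It is not Flink's implementation: the class `PojoCompatibilitySketch`, its enum, and the `resolve` method are hypothetical names, and the sketch assumes that comparing field names alone is enough. Field types, registered subclasses, and serializer reconfiguration are deliberately left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model for illustration only; these are not Flink classes.
final class PojoCompatibilitySketch {

    // Simplified stand-in for two of the outcomes named in the PR description.
    enum Compatibility { COMPATIBLE_AS_IS, COMPATIBLE_AFTER_MIGRATION }

    // Compares the field names recorded in the snapshot with the field names
    // of the current POJO type. Any added or removed field means the old
    // state has to be migrated before the new serializer can use it.
    static Compatibility resolve(Set<String> snapshotFields, Set<String> currentFields) {
        return snapshotFields.equals(currentFields)
            ? Compatibility.COMPATIBLE_AS_IS
            : Compatibility.COMPATIBLE_AFTER_MIGRATION;
    }

    public static void main(String[] args) {
        Set<String> snapshot = new HashSet<>(Arrays.asList("id", "name"));
        Set<String> evolved = new HashSet<>(Arrays.asList("id", "email"));
        System.out.println(resolve(snapshot, snapshot));
        System.out.println(resolve(snapshot, evolved));
    }
}
```

In this sketch an unchanged field set is usable as is, while any difference in field names asks for migration.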
   
   Please see below for the detailed list of changes.
   
   ## Brief change log
   
   - 3bd3f73 to 843088c: Cherry-picking / refactoring of the `OptionalMap` utility introduced by @igalshilman in #7496. This is refactored because it will be useful for the `PojoSerializerSnapshot` as well.
   
   - 4ad190f: Refactor the logic that resolves the overall compatibility result across the nested serializers of a composite serializer out of the `CompositeTypeSerializerSnapshot` class into a common utility. The `PojoSerializer` also has nested serializers (e.g. field serializers and registered subclass serializers), and will make use of this common utility.
   
   - 931c70f: Introduces a new `PojoSerializerSnapshotData` class. This is a container class for the actual snapshotted content of a `PojoSerializer`. The important bits here are the reading / writing of the snapshot content, as well as how fields / classes that are missing from the snapshot content at restore time are handled. Please see the class for the serialization format and content of the snapshotted data.
   
   - b4bad71: Introduces a new `PojoSerializerSnapshot` class, to be the new snapshot class for the `PojoSerializer`. Reading / writing of the snapshot is delegated to the `PojoSerializerSnapshotData`, so the important parts here are the compatibility resolution logic in `resolveSchemaCompatibility` and the creation of the restore serializer in the `restoreSerializer` method. The restored serializer should be able to handle fields that are missing in the new POJO type; values of missing fields will be read and simply dropped.
   
   - b9453c6: This is the main change that lets the `PojoSerializer` use the new snapshot class and no longer use the now deprecated `PojoSerializerConfigSnapshot`. It also deals with how the `PojoSerializerConfigSnapshot` delegates compatibility checks to the new snapshot class when restoring from savepoints taken with versions earlier than 1.8.0. Since the new snapshot class properly handles POJO schema changes, this commit essentially also enables POJO state schema evolution.
   
   - d03433c: Touches `PojoSerializerTest` and `PojoSerializerUpgradeTest`. The changes in `PojoSerializerTest` confirm that the new `PojoSerializer` actually handles reconfiguration cases properly. The changes in `PojoSerializerUpgradeTest` mostly remove the expectation of errors when the schema of a POJO type is changed. Removing those expectations confirms that the POJO schema can indeed be evolved now.
   
   - 93c2446: Documents POJO schema evolution in the "Working with State" docs.
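
As a rough illustration of what resolving an overall result across nested serializers (the 4ad190f entry above) could look like, the sketch below folds per-serializer results into one by taking the most restrictive. The precedence order, the class `NestedCompatibilitySketch`, and the `combine` method are assumptions made for this example; they are not taken from the refactored Flink utility.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; these are not Flink classes.
final class NestedCompatibilitySketch {

    // Declared from least to most restrictive so that enum ordering can rank
    // the results. This ordering is an assumption of the sketch.
    enum Result {
        COMPATIBLE_AS_IS,
        COMPATIBLE_WITH_RECONFIGURED_SERIALIZER,
        COMPATIBLE_AFTER_MIGRATION,
        INCOMPATIBLE
    }

    // The overall result is the most restrictive nested result. An empty list
    // of nested serializers is treated as compatible as is.
    static Result combine(List<Result> nestedResults) {
        Result overall = Result.COMPATIBLE_AS_IS;
        for (Result nested : nestedResults) {
            if (nested.compareTo(overall) > 0) {
                overall = nested;
            }
        }
        return overall;
    }

    public static void main(String[] args) {
        List<Result> fieldSerializerResults = Arrays.asList(
            Result.COMPATIBLE_AS_IS,
            Result.COMPATIBLE_WITH_RECONFIGURED_SERIALIZER,
            Result.COMPATIBLE_AFTER_MIGRATION);
        System.out.println(combine(fieldSerializerResults));
    }
}
```

Under these assumptions a single incompatible nested serializer makes the whole composite incompatible, and a single nested serializer that needs migration forces migration of the composite.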
   
   ## Verifying this change
   
   - Changes to `PojoSerializerTest` confirm that the new `PojoSerializer` handles compatibility resolution properly.
   - Changes to `PojoSerializerUpgradeTest` confirm that POJO fields can be removed / added when restoring from savepoints.
   - A new `PojoSerializerSnapshotMigrationTest` confirms that old savepoints with the legacy `PojoSerializerConfigSnapshot` can be smoothly migrated to the new `PojoSerializerSnapshot` in 1.8.0.
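
The field removal case exercised by these tests depends on the restored serializer reading old records in their original layout and dropping values of fields that no longer exist. The sketch below shows that idea under simplifying assumptions: every field value is written as a UTF string, and the class `DropRemovedFieldsSketch` with its two methods is hypothetical. It does not reflect the actual binary format of the `PojoSerializer`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; these are not Flink classes.
final class DropRemovedFieldsSketch {

    // Writes one record in the old layout: one UTF string per field, in order.
    static byte[] writeOldRecord(List<String> valuesInOldOrder) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String value : valuesInOldOrder) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads a record using the OLD field order. Every value is consumed so the
    // stream stays aligned, but only fields that still exist are kept.
    static Map<String, String> readWithOldLayout(
            byte[] record, List<String> oldFieldOrder, Set<String> currentFields) {
        Map<String, String> kept = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            for (String field : oldFieldOrder) {
                String value = in.readUTF();
                if (currentFields.contains(field)) {
                    kept.put(field, value);
                }
                // Otherwise the value was read and is simply dropped.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept;
    }

    public static void main(String[] args) {
        byte[] record = writeOldRecord(Arrays.asList("42", "legacy", "alice"));
        Set<String> current = new HashSet<>(Arrays.asList("id", "name", "email"));
        System.out.println(readWithOldLayout(record, Arrays.asList("id", "obsolete", "name"), current));
    }
}
```

Here `obsolete` sits between two surviving fields, which is why its value must still be read before being discarded. The newly added `email` field has no old value and is simply absent from the result.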
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency):  **no**
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
     - The serializers: **yes**
     - The runtime per-record code paths (performance sensitive): **no**
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **yes**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **yes - POJO state schema evolution**
     - If yes, how is the feature documented? **docs**
   
