You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2018/12/09 08:36:56 UTC

[GitHub] tzulitai opened a new pull request #7264: [FLINK-11094] [rocksdb] Make non-accessed restored state in RocksDBStateBackend snapshottable

tzulitai opened a new pull request #7264: [FLINK-11094] [rocksdb] Make non-accessed restored state in RocksDBStateBackend snapshottable
URL: https://github.com/apache/flink/pull/7264
 
 
   ## What is the purpose of the change
   
   The ultimate goal of this PR is to fix a bug where NPE is thrown if a non-accessed restored state in `RocksDBKeyedStatebackend` was snapshotted.
   This was an overlooked issue caused by changes in [FLINK-10679](https://issues.apache.org/jira/browse/FLINK-10679).
   
   The problem is that in that change, in the `RocksDBKeyedBackend`, `RegisteredStateMetaInfoBase`s were no longer created eagerly for all restored state, but instead only lazily created when the state was accessed again by the user. This causes non-accessed restored state to have empty meta info, and throws NPE when trying to take a snapshot of them.
   
   The rationale behind FLINK-10679 was that, since `RegisteredStateMetaInfoBase` holds already serializer instances for state access, creating them eagerly at restore time with restored serializer snapshots did not make sense (because at that point-in-time, we do not have the new serializers yet for state access; the snapshot is only capable of creating the previous state serializer).
   
   This PR fixes this by letting state meta infos hold a `StateSerializerProvider`, instead of serializer instances. This allows meta infos to be instantiated from snapshots without eagerly accessing the restore serializer.
   
   The API for the `StateSerializerProvider` is as follows:
   ```
   public abstract class StateSerializerProvider<T> {
       TypeSerializer<T> currentSchemaSerializer();
       TypeSerializerSchemaCompatibility<T> registerSerializerForRestoredState(TypeSerializer<T> newRegisteredSerializer);
       TypeSerializer<T> previousSchemaSerializer();
   
       // factory methods for provider:
       public static <T> StateSerializerProvider<T> fromNewState(TypeSerializer<T> registeredSerializer);
       public static <T> StateSerializerProvider<T> fromRestoredState(TypeSerializerSnapshot<T> snapshot);
   }
   ```
   
   A StateSerializerProvider can be created either from:
   1. A restored serializer snapshot when restoring the state.
   2. A fresh, new state's serializer, when registering the state for the first time.
   
   For 1), state that has not been accessed yet after the restore will return the same serializer (i.e. the serializer for previous schema) for both `currentSchemaSerializer` and `previousSchemaSerializer`, meaning that since no new serializer has been registered, the schema remains the same anyways.
   Once a restored state is re-accessed, then `registerSerializerForRestoredState(TypeSerializer<T> newSerializer)` should be used to update what serializer the provider returns in `currentSchemaSerializer`.
   
   ## Brief change log
   
   Main changes are:
   
   - First 3 hotfixes (3555f30 to 9e0a400) is some preliminary code cleanup / test hardening for the `StateBackendMigrationTestBase`.
   - c20c2cc: Introduce the new `StateSerializerProvider` abstraction and added tests for it
   - ee192df: Eagerly create meta infos from restored snapshots, so that they are created even if the state isn't accessed.
   - 02d5a9d: Let meta infos use `StateSerializerProvider` instead of serializer instance
   - e9542d0: Add test coverage for snapshotting non-accessed restored state
   
   ## Verifying this change
   
   This PR adds a new test `StateBackendTestBase.testSnapshotNonAccessedState()` to cover the issue.
   The test fails for RocksDB state backend case without the fixes.
   
   Also, unit tests are added for `StateSerializerProvider`, in `StateSerializerProviderTest`.
   
   All existing migration related tests in `StateBackendMigrationTestBase` are also related and should not be failing.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services