You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by kl0u <gi...@git.apache.org> on 2018/04/25 11:57:42 UTC

[GitHub] flink pull request #5910: [FLINK-8841] [state] Remove HashMapSerializer and ...

GitHub user kl0u opened a pull request:

    https://github.com/apache/flink/pull/5910

    [FLINK-8841] [state] Remove HashMapSerializer and use MapSerializer instead.

    ## What is the purpose of the change
    
    So far we had the `MapSerializer` and the `HashMapSerializer`. The two had almost identical code and the second was only used on the `HeapStateBackend`/`FSStateBackend` when creating a `MapState`. This PR removes the `HashMapSerializer` and replaces its uses with the `MapSerializer`. It also guarantees backwards compatibility.
    
    ## Brief change log
    
    It introduces the `MigrationUtil` as an inner class of the `InstantiationUtil`. This class contains mapping between deprecated/deleted serializers and their replacements.
    
    Also the removal of the `HashMapSerializer` uniformizes a bit the `HeapMapState` and the `RocksDBMapState`.
    
    ## Verifying this change
    
    Added the `HeapKeyedStateBackendSnapshotMigrationTest#testMapStateMigrationAfterHashMapSerRemoval()`.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: yes
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? no
      - If yes, how is the feature documented? not applicable
    
    R @StefanRRichter or @aljoscha 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kl0u/flink map-serializer-inv

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5910
    
----
commit 7de83de4765384080cea6d94b64a81e1584ce82e
Author: kkloudas <kk...@...>
Date:   2018-04-24T12:48:34Z

    [FLINK-8841] Remove HashMapSerializer and use MapSerializer instead.

----


---

[GitHub] flink issue #5910: [FLINK-8841] [state] Remove HashMapSerializer and use Map...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/5910
  
    +1


---

[GitHub] flink pull request #5910: [FLINK-8841] [state] Remove HashMapSerializer and ...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5910#discussion_r184304412
  
    --- Diff: flink-core/src/main/java/org/apache/flink/util/InstantiationUtil.java ---
    @@ -221,6 +226,52 @@ protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFo
     		}
     	}
     
    +	/**
    +	 * A mapping between the full path of a deprecated serializer and its equivalent.
    +	 * These mappings are hardcoded and fixed.
    +	 *
    +	 * <p>IMPORTANT: mappings can be removed after 1 release as there will be a "migration path".
    +	 * As an example, a serializer is removed in 1.5-SNAPSHOT, then the mapping should be added for 1.5,
    +	 * and it can be removed in 1.6, as the path would be Flink-{< 1.5} -> Flink-1.5 -> Flink-{>= 1.6}.
    +	 */
    +	private enum MigrationUtil {
    +
    +		// To add a new mapping just pick a name and add an entry as the following:
    +
    +		GENERIC_DATA_ARRAY_SERIALIZER(
    +				"org.apache.avro.generic.GenericData$Array",
    +				ObjectStreamClass.lookup(KryoRegistrationSerializerConfigSnapshot.DummyRegisteredClass.class)),
    +		HASH_MAP_SERIALIZER(
    +				"org.apache.flink.runtime.state.HashMapSerializer",
    +				ObjectStreamClass.lookup(MapSerializer.class)); // added in 1.5
    +
    +		/** An internal unmoddifiable map containing the mappings between deprecated and new serialzers. */
    +		private static final Map<String, ObjectStreamClass> equivalenceMap = Collections.unmodifiableMap(initMap());
    --- End diff --
    
    If it is static final, I suggest to use capital letter and underscores for the variable name.


---

[GitHub] flink issue #5910: [FLINK-8841] [state] Remove HashMapSerializer and use Map...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/5910
  
    LGTM 👍 


---

[GitHub] flink pull request #5910: [FLINK-8841] [state] Remove HashMapSerializer and ...

Posted by yuqi1129 <gi...@git.apache.org>.
Github user yuqi1129 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5910#discussion_r184044453
  
    --- Diff: flink-core/src/main/java/org/apache/flink/util/InstantiationUtil.java ---
    @@ -221,6 +226,52 @@ protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFo
     		}
     	}
     
    +	/**
    +	 * A mapping between the full path of a deprecated serializer and its equivalent.
    +	 * These mappings are hardcoded and fixed.
    +	 *
    +	 * <p>IMPORTANT: mappings can be removed after 1 release as there will be a "migration path".
    +	 * As an example, a serializer is removed in 1.5-SNAPSHOT, then the mapping should be added for 1.5,
    +	 * and it can be removed in 1.6, as the path would be Flink-{< 1.5} -> Flink-1.5 -> Flink-{>= 1.6}.
    +	 */
    +	private enum MigrationUtil {
    +
    +		// To add a new mapping just pick a name and add an entry as the following:
    +
    +		GENERIC_DATA_ARRAY_SERIALIZER(
    +				"org.apache.avro.generic.GenericData$Array",
    +				ObjectStreamClass.lookup(KryoRegistrationSerializerConfigSnapshot.DummyRegisteredClass.class)),
    +		HASH_MAP_SERIALIZER(
    +				"org.apache.flink.runtime.state.HashMapSerializer",
    +				ObjectStreamClass.lookup(MapSerializer.class)); // added in 1.5
    +
    +		/** An internal unmoddifiable map containing the mappings between deprecated and new serialzers. */
    --- End diff --
    
    `unmoddifiable`, `serialzers ` 
    spelling mistake



---

[GitHub] flink pull request #5910: [FLINK-8841] [state] Remove HashMapSerializer and ...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5910#discussion_r184304664
  
    --- Diff: flink-core/src/main/java/org/apache/flink/util/InstantiationUtil.java ---
    @@ -221,6 +226,52 @@ protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFo
     		}
     	}
     
    +	/**
    +	 * A mapping between the full path of a deprecated serializer and its equivalent.
    +	 * These mappings are hardcoded and fixed.
    +	 *
    +	 * <p>IMPORTANT: mappings can be removed after 1 release as there will be a "migration path".
    +	 * As an example, a serializer is removed in 1.5-SNAPSHOT, then the mapping should be added for 1.5,
    +	 * and it can be removed in 1.6, as the path would be Flink-{< 1.5} -> Flink-1.5 -> Flink-{>= 1.6}.
    +	 */
    +	private enum MigrationUtil {
    +
    +		// To add a new mapping just pick a name and add an entry as the following:
    +
    +		GENERIC_DATA_ARRAY_SERIALIZER(
    +				"org.apache.avro.generic.GenericData$Array",
    +				ObjectStreamClass.lookup(KryoRegistrationSerializerConfigSnapshot.DummyRegisteredClass.class)),
    +		HASH_MAP_SERIALIZER(
    +				"org.apache.flink.runtime.state.HashMapSerializer",
    +				ObjectStreamClass.lookup(MapSerializer.class)); // added in 1.5
    +
    +		/** An internal unmoddifiable map containing the mappings between deprecated and new serialzers. */
    +		private static final Map<String, ObjectStreamClass> equivalenceMap = Collections.unmodifiableMap(initMap());
    +
    +		/** The full name of the class of the old serializer. */
    +		private final String oldSerializer;
    +
    +		/** The serialization descriptor of the class of the new serializer. */
    +		private final ObjectStreamClass newSerializer;
    --- End diff --
    
    Here I suggest to add `StreamClass` to the name.


---

[GitHub] flink pull request #5910: [FLINK-8841] [state] Remove HashMapSerializer and ...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5910#discussion_r184304800
  
    --- Diff: flink-core/src/main/java/org/apache/flink/util/InstantiationUtil.java ---
    @@ -221,6 +226,52 @@ protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFo
     		}
     	}
     
    +	/**
    +	 * A mapping between the full path of a deprecated serializer and its equivalent.
    +	 * These mappings are hardcoded and fixed.
    +	 *
    +	 * <p>IMPORTANT: mappings can be removed after 1 release as there will be a "migration path".
    +	 * As an example, a serializer is removed in 1.5-SNAPSHOT, then the mapping should be added for 1.5,
    +	 * and it can be removed in 1.6, as the path would be Flink-{< 1.5} -> Flink-1.5 -> Flink-{>= 1.6}.
    +	 */
    +	private enum MigrationUtil {
    +
    +		// To add a new mapping just pick a name and add an entry as the following:
    +
    +		GENERIC_DATA_ARRAY_SERIALIZER(
    +				"org.apache.avro.generic.GenericData$Array",
    +				ObjectStreamClass.lookup(KryoRegistrationSerializerConfigSnapshot.DummyRegisteredClass.class)),
    +		HASH_MAP_SERIALIZER(
    +				"org.apache.flink.runtime.state.HashMapSerializer",
    +				ObjectStreamClass.lookup(MapSerializer.class)); // added in 1.5
    +
    +		/** An internal unmoddifiable map containing the mappings between deprecated and new serialzers. */
    +		private static final Map<String, ObjectStreamClass> equivalenceMap = Collections.unmodifiableMap(initMap());
    +
    +		/** The full name of the class of the old serializer. */
    +		private final String oldSerializer;
    +
    +		/** The serialization descriptor of the class of the new serializer. */
    +		private final ObjectStreamClass newSerializer;
    +
    +		MigrationUtil(final String oldSerializer, final ObjectStreamClass newSerializer) {
    --- End diff --
    
    the `final` in on these parameters is just noise


---

[GitHub] flink pull request #5910: [FLINK-8841] [state] Remove HashMapSerializer and ...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5910#discussion_r184304555
  
    --- Diff: flink-core/src/main/java/org/apache/flink/util/InstantiationUtil.java ---
    @@ -221,6 +226,52 @@ protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFo
     		}
     	}
     
    +	/**
    +	 * A mapping between the full path of a deprecated serializer and its equivalent.
    +	 * These mappings are hardcoded and fixed.
    +	 *
    +	 * <p>IMPORTANT: mappings can be removed after 1 release as there will be a "migration path".
    +	 * As an example, a serializer is removed in 1.5-SNAPSHOT, then the mapping should be added for 1.5,
    +	 * and it can be removed in 1.6, as the path would be Flink-{< 1.5} -> Flink-1.5 -> Flink-{>= 1.6}.
    +	 */
    +	private enum MigrationUtil {
    +
    +		// To add a new mapping just pick a name and add an entry as the following:
    +
    +		GENERIC_DATA_ARRAY_SERIALIZER(
    +				"org.apache.avro.generic.GenericData$Array",
    +				ObjectStreamClass.lookup(KryoRegistrationSerializerConfigSnapshot.DummyRegisteredClass.class)),
    +		HASH_MAP_SERIALIZER(
    +				"org.apache.flink.runtime.state.HashMapSerializer",
    +				ObjectStreamClass.lookup(MapSerializer.class)); // added in 1.5
    +
    +		/** An internal unmoddifiable map containing the mappings between deprecated and new serialzers. */
    +		private static final Map<String, ObjectStreamClass> equivalenceMap = Collections.unmodifiableMap(initMap());
    +
    +		/** The full name of the class of the old serializer. */
    +		private final String oldSerializer;
    --- End diff --
    
    mabye add `name` to the variable name to show it is a string, not a serializer.


---

[GitHub] flink pull request #5910: [FLINK-8841] [state] Remove HashMapSerializer and ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/5910


---

[GitHub] flink issue #5910: [FLINK-8841] [state] Remove HashMapSerializer and use Map...

Posted by kl0u <gi...@git.apache.org>.
Github user kl0u commented on the issue:

    https://github.com/apache/flink/pull/5910
  
    Thanks @yuqi1129, @bowenli86 and @StefanRRichter for the reviews. I integrated your comments. If you are done with reviewing, I will merge it as soon as Travis gives the green light.


---