Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/31 12:10:04 UTC

[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

    [ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031060#comment-16031060 ] 

ASF GitHub Bot commented on FLINK-6764:
---------------------------------------

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/4026

    [FLINK-6764] Deduplicate stateless serializers in checkpoints

    This PR is based on #4014, so only the last commit 39ffe7e is relevant.
    
    Prior to this PR, we would write multiple instances of the same serializer even if it was stateless. This commit changes that by first writing a serializer index at the head of the stream, and then writing only the index of a serializer wherever one needs to be written. The index map is built using `IdentityHashMap`s, which key entries by instance identity, so that stateful serializers are considered as separate entries in the index.
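
A minimal sketch of the indexing idea described above, not the code from this PR. `SerializerIndexSketch`, `NamedSerializer`, and `write` are hypothetical names, and writing only a name string per serializer is an assumption made to keep the example self-contained. It shows an identity-keyed index written once at the head of the stream, followed by one integer per usage:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SerializerIndexSketch {

    /** Hypothetical stand-in for a serializer; only its name is written out. */
    static final class NamedSerializer {
        final String name;

        NamedSerializer(String name) {
            this.name = name;
        }
    }

    /** Writes each distinct instance once as an index, then one int per usage. */
    static byte[] write(List<NamedSerializer> usages) {
        // Keys are compared by reference (==), so two distinct instances stay
        // separate entries even if they would be equal by value.
        Map<NamedSerializer, Integer> positions = new IdentityHashMap<>();
        List<NamedSerializer> distinct = new ArrayList<>();
        for (NamedSerializer serializer : usages) {
            if (!positions.containsKey(serializer)) {
                positions.put(serializer, distinct.size());
                distinct.add(serializer);
            }
        }

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // Head of the stream: the index of distinct instances.
            out.writeInt(distinct.size());
            for (NamedSerializer serializer : distinct) {
                out.writeUTF(serializer.name);
            }
            // Body: each usage is only a position in the index.
            out.writeInt(usages.size());
            for (NamedSerializer serializer : usages) {
                out.writeInt(positions.get(serializer));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        NamedSerializer shared = new NamedSerializer("int");   // reused instance
        NamedSerializer separate = new NamedSerializer("int"); // distinct instance
        byte[] data = write(Arrays.asList(shared, shared, shared, separate));
        System.out.println("bytes written: " + data.length);
    }
}
```

In this sketch the three usages of `shared` collapse into one index entry, while `separate` keeps its own entry because the map compares references rather than values.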
    
    ## Test
    
    New tests are added to `PojoSerializerTest` and `SerializationProxiesTest` to test that stateless serializers are not duplicated on restore.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-6764

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4026
    
----
commit 852d7b569f02b1783c8410e111455d02127d0b5d
Author: Tzu-Li (Gordon) Tai <tz...@apache.org>
Date:   2017-05-29T18:07:04Z

    [FLINK-6763] [core] Make serialization of composite serializer configs more efficient
    
    This commit affects the serialization formats of configuration snapshots
    of composite serializers, most notably the PojoSerializer, as well as
    others such as MapSerializer, GenericArraySerializer, TupleSerializer,
    etc. It also affects the serialization formats of the
    OperatorBackendSerializationProxy and KeyedBackendSerializationProxy.
    
    Prior to this commit, whenever we write a serializer and its config
    snapshot into a checkpoint, we always write the start offset and end
    offset of the serializer bytes, effectively indexing every serializer
    and its config. This required buffering the whole list of serializer and
    config snapshot pairs when writing the checkpoint.
    
    This commit makes this more efficient by writing only the length of
    the serializer bytes before writing the serializer itself, which
    reduces the buffering needed for the writes.
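
A minimal sketch of length-prefixed writing as described in the commit message above, not the actual Flink format. `LengthPrefixedSketch`, `writeAll`, and `readAll` are hypothetical names, and the leading entry count is an assumption added so the reader knows when to stop. Each entry is written as its byte length followed by its bytes, so no table of start and end offsets has to be assembled first:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LengthPrefixedSketch {

    /** Writes an entry count, then each entry as [int length][bytes]. */
    static byte[] writeAll(List<byte[]> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(entries.size());
            for (byte[] entry : entries) {
                // Only the current entry's length is needed before writing it.
                out.writeInt(entry.length);
                out.write(entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the entries back by consuming one length prefix at a time. */
    static List<byte[]> readAll(byte[] data) {
        List<byte[]> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] entry = new byte[in.readInt()];
                in.readFully(entry);
                entries.add(entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        byte[] data = writeAll(Arrays.asList(new byte[] {1, 2, 3}, new byte[0]));
        System.out.println("entries read back: " + readAll(data).size());
    }
}
```

Because every entry carries its own length, a reader can also skip an entry it cannot interpret by advancing past that many bytes.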

commit 19b2f6abfd780d2456a0a7f7bb5dc0de3001ee78
Author: Tzu-Li (Gordon) Tai <tz...@apache.org>
Date:   2017-05-30T14:33:36Z

    [FLINK-6763] Include excludeSerializer flag in PojoSerializerConfigSnapshot

commit f002db90e80bbec4641c3baa4501d69b546b71b9
Author: Tzu-Li (Gordon) Tai <tz...@apache.org>
Date:   2017-05-30T17:07:54Z

    [FLINK-6763] Include excludeSerializers flag in CompositeTypeSerializerConfigSnapshot and state backend serialization proxies

commit 39ffe7ea1fbe289090ee72d97f2ccef3cdec049f
Author: Tzu-Li (Gordon) Tai <tz...@apache.org>
Date:   2017-05-31T12:00:45Z

    [FLINK-6764] Deduplicate stateless serializers from checkpoints
    
    Prior to this commit, we would write multiple instances of the same
    serializer even if it was stateless. This commit changes that by first
    writing a serializer index at the head of the stream, and then writing
    only the index of a serializer wherever one needs to be written. The
    index map is built using IdentityHashMaps, so that stateful serializers
    are considered as separate entries in the index.

----


> Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-6764
>                 URL: https://issues.apache.org/jira/browse/FLINK-6764
>             Project: Flink
>          Issue Type: Improvement
>          Components: Type Serialization System
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Till Rohrmann
>            Assignee: Tzu-Li (Gordon) Tai
>
> Composite type serializers, such as the {{PojoSerializer}}, could be improved by deduplicating stateless {{TypeSerializer}}s when they are serialized. This would decrease their serialized size.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)