Posted to user@flink.apache.org by Alexey Trenikhun <ye...@msn.com> on 2019/02/08 02:38:51 UTC

Re: MapState - TypeSerializer

What if I'm using RocksDB and a MapState has a single entry written with TypeSerializer1, then we take a savepoint, upgrade the job (TypeSerializer2), and put a new entry? At that point we have two entries written by different serializers, so should both TypeSerializers be stored in the meta information?
Thanks,
Alexey


________________________________
From: Andrey Zagrebin <an...@data-artisans.com>
Sent: Wednesday, November 28, 2018 2:23 AM
To: Alexey Trenikhun
Cc: user@flink.apache.org
Subject: Re: MapState - TypeSerializer

Hi Alexey,

It is written once per state name, in that state's meta information, separately from the user data entries.
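
A minimal sketch of that layout, written as a toy model in plain Java. The class and field names (StateLayoutSketch, StateMeta, SnapshottedMapState) are hypothetical and do not correspond to Flink's actual classes or on-disk format; the only point illustrated is that the serializer information sits in one meta record per state name while the entries carry only data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical names, not Flink's real snapshot format.
final class StateLayoutSketch {

    /** Written once per state name: identifies the serializer used for all entries. */
    static final class StateMeta {
        final String stateName;
        final String serializerSnapshot; // stand-in for a real serializer snapshot

        StateMeta(String stateName, String serializerSnapshot) {
            this.stateName = stateName;
            this.serializerSnapshot = serializerSnapshot;
        }
    }

    /** One MapState in a snapshot: a single meta record plus raw user data entries. */
    static final class SnapshottedMapState {
        final StateMeta meta;
        final Map<String, byte[]> entries = new LinkedHashMap<>();

        SnapshottedMapState(StateMeta meta) {
            this.meta = meta;
        }
    }

    /** Builds a state with three entries that all share the same meta record. */
    static SnapshottedMapState example() {
        SnapshottedMapState state =
                new SnapshottedMapState(new StateMeta("counts", "serializer-v1"));
        state.entries.put("a", new byte[] {1});
        state.entries.put("b", new byte[] {2});
        state.entries.put("c", new byte[] {3});
        return state;
    }

    public static void main(String[] args) {
        SnapshottedMapState state = example();
        System.out.println(state.meta.stateName + " uses " + state.meta.serializerSnapshot
                + " for " + state.entries.size() + " entries");
    }
}
```

Adding more entries grows only the entries map; the meta record stays a single item per state name.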

Best,
Andrey

On 28 Nov 2018, at 04:56, Alexey Trenikhun <ye...@msn.com> wrote:

Hello,
The Flink documentation states that “TypeSerializers and TypeSerializerConfigSnapshots are written as part of checkpoints along with the state values”. In the context of MapState, does that mean a TypeSerializer is written for each MapState entry, or only once per state?
Alexey


Re: MapState - TypeSerializer

Posted by Alexey Trenikhun <ye...@msn.com>.
It seems this has changed since "Flink Forward Berlin 2018" (https://www.slideshare.net/FlinkForward/flink-forward-berlin-2018-tzuli-gordon-tai-upgrading-apache-flink-applications-state-of-the-union), slide 25, where I see some entries in V1 and some in V2. Thank you for the up-to-date links.

Thanks,
Alexey


Re: MapState - TypeSerializer

Posted by Yun Tang <my...@live.com>.
Hi Alexey

First of all, 'TypeSerializerConfigSnapshot' has been deprecated since Flink 1.7 [1]; the current serializer snapshot class is 'TypeSerializerSnapshot'.

To answer your question: only TypeSerializer2's snapshot would be stored during a checkpoint. For off-heap state backends (e.g. RocksDBStateBackend), state migration happens before any actual state read/write [2], so after loading from the savepoint all data is stored in RocksDB using the latest serializer.
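
The eager migration described above can be sketched with a toy model in plain Java. Everything here is an assumption made for illustration: the class MigrationSketch, the two byte formats (V1 as decimal text, V2 as a 4-byte integer), and the in-memory map standing in for RocksDB. It is not how Flink implements migration; it only shows why, once every old entry has been rewritten on restore, a single serializer is enough to read both old and new entries.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical formats and names, not Flink's implementation.
final class MigrationSketch {

    // Hypothetical V1 format: the value stored as decimal text.
    static byte[] writeV1(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    static int readV1(byte[] bytes) {
        return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
    }

    // Hypothetical V2 format: the value stored as a 4-byte big-endian integer.
    static byte[] writeV2(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    static int readV2(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Rewrites every stored entry: read with the old format, write with the new one. */
    static Map<String, byte[]> migrate(Map<String, byte[]> oldEntries) {
        Map<String, byte[]> migrated = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : oldEntries.entrySet()) {
            migrated.put(entry.getKey(), writeV2(readV1(entry.getValue())));
        }
        return migrated;
    }

    /** Old entry, migration on restore, new entry; then decode everything with V2 only. */
    static Map<String, Integer> demo() {
        Map<String, byte[]> store = new LinkedHashMap<>();
        store.put("a", writeV1(7));   // written before the upgrade
        store = migrate(store);       // happens before any read/write of the new job
        store.put("b", writeV2(42));  // written after the upgrade

        Map<String, Integer> decoded = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : store.entrySet()) {
            decoded.put(entry.getKey(), readV2(entry.getValue()));
        }
        return decoded;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the migration step runs before the new job touches the state, the store never holds a mix of V1 and V2 bytes in this model.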

The three pictures below, borrowed from Gordon's talk at Flink Forward China 2018 [3], illustrate this.

[Three inline images from the slides; not preserved in this plain-text archive.]

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/custom_serialization.html#migrating-from-deprecated-serializer-snapshot-apis-before-flink-17
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/custom_serialization.html#off-heap-state-backends-eg-rocksdbstatebackend
[3] https://files.alicdn.com/tpsservice/d9fde10f25b061916eab468ac2c1fc47.pdf

Best
Yun Tang


Re: MapState - TypeSerializer

Posted by Alexey Trenikhun <ye...@msn.com>.
But there will be two TypeSerializerConfigSnapshots; otherwise it is unclear how TypeSerializer2 will be able to check compatibility?

Thanks,
Alexey




Re: MapState - TypeSerializer

Posted by Congxian Qiu <qc...@gmail.com>.
Hi, Alexey
    In your case, only TypeSerializer2 will be stored in the meta information,
and TypeSerializer2 and TypeSerializer1 have to be compatible.
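
On how compatibility can be checked with only one snapshot stored at a time: the savepoint being restored still carries the snapshot written by the old serializer, and that restored snapshot is compared against the new serializer (in the Flink 1.7 API this is the role of TypeSerializerSnapshot#resolveSchemaCompatibility). The sketch below is a toy model in plain Java; the class CompatibilitySketch, the integer versions, and the rule "a newer version means migration is needed" are assumptions made for illustration, not Flink's logic.

```java
// Toy model only: hypothetical names and rules, not Flink's API.
final class CompatibilitySketch {

    enum Compatibility { COMPATIBLE_AS_IS, COMPATIBLE_AFTER_MIGRATION, INCOMPATIBLE }

    /** Stand-in for the snapshot that the old serializer wrote into the savepoint. */
    static final class RestoredSnapshot {
        final int writtenVersion;

        RestoredSnapshot(int writtenVersion) {
            this.writtenVersion = writtenVersion;
        }

        /** Compares the restored (old) snapshot with the version of the new serializer. */
        Compatibility resolve(int newSerializerVersion) {
            if (newSerializerVersion == writtenVersion) {
                return Compatibility.COMPATIBLE_AS_IS;
            }
            if (newSerializerVersion > writtenVersion) {
                // Assumed rule: a newer serializer can read old data after rewriting it.
                return Compatibility.COMPATIBLE_AFTER_MIGRATION;
            }
            return Compatibility.INCOMPATIBLE;
        }
    }

    public static void main(String[] args) {
        RestoredSnapshot fromSavepoint = new RestoredSnapshot(1);
        System.out.println(fromSavepoint.resolve(2));
    }
}
```

In this model the next checkpoint after a successful restore would store only the new serializer's snapshot, which matches the answer given in this thread.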

Best,
Congxian

