You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Muhammad Haseeb Asif <mh...@kth.se> on 2021/08/26 18:41:11 UTC

How heap state backend serialization happens during snapshotting

Hello,

I have been trying to understand how serialization works for the Flink Heap Keyed state backend when a checkpoint happens. I want to know how Flink knows which type of serializer to use to serialize the values for different states during the snapshot.

I have debugged both RocksDB and heap keyed state backend. I can see that state is stored in the COW(Copy on Write) data structure which is copied. It does create a proxy (serializationProxy) for serialization that is used to serialized all the values for different state types (value, list). I cannot understand how the proxy is doing serialization for an individual value and the whole list.



Thanks



Re: How heap state backend serialization happens during snapshotting

Posted by JING ZHANG <be...@gmail.com>.
Hello,
Let me share my views on this issue, correct me if there is anything wrong.

To get a state handle, you have to create a StateDescriptor. This holds the
name of the state, the type of the values that the state holds and other
information. You could pass the type serializer or the type information
when specify the type for the values in the state. Type Information could
generates serializer later. Please see more information in [1].

Snapshot for all meta information about one state in a state backend would
be stored in `StateMetaInfoSnapshot` which would be used in
serializationProxy during snapshotting. You could find more information in
class `OperatorBackendSerializationProxy` and class
`KeyedBackendSerializationProxy`.

Welcome to further discussion.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/fault-tolerance/state/

Best regards,
JING ZHANG

Muhammad Haseeb Asif <mh...@kth.se> 于2021年8月27日周五 上午2:41写道:

> Hello,
>
> I have been trying to understand how serialization works for the Flink
> Heap Keyed state backend when a checkpoint happens. I want to know how
> Flink knows which type of serializer to use to serialize the values for
> different states during the snapshot.
>
> I have debugged both RocksDB and heap keyed state backend. I can see that
> state is stored in the COW(Copy on Write) data structure which is copied.
> It does create a proxy (serializationProxy) for serialization that is used
> to serialized all the values for different state types (value, list). I
> cannot understand how the proxy is doing serialization for an individual
> value and the whole list.
>
>
>
> Thanks
>
>
>