Posted to dev@flink.apache.org by "Stefan Richter (JIRA)" <ji...@apache.org> on 2017/05/08 09:40:04 UTC

[jira] [Created] (FLINK-6484) Send only the registry keys for already registered files in incremental checkpointing

Stefan Richter created FLINK-6484:
-------------------------------------

             Summary: Send only the registry keys for already registered files in incremental checkpointing
                 Key: FLINK-6484
                 URL: https://issues.apache.org/jira/browse/FLINK-6484
             Project: Flink
          Issue Type: Improvement
          Components: State Backends, Checkpointing
            Reporter: Stefan Richter


State handles for files that are (potentially) shared across multiple incremental snapshots are registered under an explicit key in the {{SharedStateRegistry}}. Furthermore, the backend knows which of those files are new (unregistered) and which are just references to files from a previous snapshot (already registered).

The current implementation of incremental checkpoints in RocksDB always includes _all_ state handles referenced by the snapshot (unregistered and previously registered) in the ack message to the checkpoint coordinator, on the assumption that state handles are lightweight pointers.

While this assumption is true in general, there are notable exceptions such as the {{ByteStreamStateHandle}}, which packs the actual payload data into the state handle. While the maximum capacity of each {{ByteStreamStateHandle}} is limited (around 4 MB), multiple handles can be part of a snapshot. This makes incremental snapshots over {{ByteStreamStateHandle}} essentially not incremental, because every ack message resends the payload of previously registered handles, and can lead to huge ack messages.

To avoid this issue, I propose that we send only the registration key for each previously registered state handle, and that the checkpoint coordinator fetch the corresponding state handles from the {{SharedStateRegistry}} and insert them into the checkpoint data.
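
The proposal could be sketched roughly as follows. This is a minimal, self-contained illustration and not Flink code: {{InlineHandle}}, {{AckEntry}}, {{Registry}} and {{buildAck}} are hypothetical names, with {{InlineHandle}} standing in for a payload-carrying handle such as {{ByteStreamStateHandle}} and {{Registry}} standing in for the coordinator-side {{SharedStateRegistry}}. It assumes the backend tracks the set of keys that were already registered by a previous snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; none of these types are Flink's actual classes. */
class IncrementalAckSketch {

    /** Hypothetical stand-in for a state handle that carries its payload inline. */
    static final class InlineHandle {
        final byte[] payload;
        InlineHandle(byte[] payload) { this.payload = payload; }
    }

    /** One ack entry: always the registry key, plus the handle only if the file is new. */
    static final class AckEntry {
        final String registryKey;
        final InlineHandle handle; // null means "already registered, resolve by key"
        AckEntry(String registryKey, InlineHandle handle) {
            this.registryKey = registryKey;
            this.handle = handle;
        }
        long payloadBytes() { return handle == null ? 0L : handle.payload.length; }
    }

    /** Hypothetical coordinator-side registry that maps keys to registered handles. */
    static final class Registry {
        private final Map<String, InlineHandle> registered = new HashMap<>();

        /** Registers new handles and replaces key-only entries with the registered handle. */
        Map<String, InlineHandle> resolve(List<AckEntry> ack) {
            Map<String, InlineHandle> checkpoint = new HashMap<>();
            for (AckEntry entry : ack) {
                if (entry.handle != null) {
                    registered.putIfAbsent(entry.registryKey, entry.handle);
                }
                InlineHandle resolved = registered.get(entry.registryKey);
                if (resolved == null) {
                    throw new IllegalStateException(
                            "No handle registered under key " + entry.registryKey);
                }
                checkpoint.put(entry.registryKey, resolved);
            }
            return checkpoint;
        }
    }

    /** Backend side: attach the full handle only for keys that are not yet registered. */
    static List<AckEntry> buildAck(Map<String, InlineHandle> snapshotFiles,
                                   Set<String> alreadyRegistered) {
        List<AckEntry> ack = new ArrayList<>();
        for (Map.Entry<String, InlineHandle> file : snapshotFiles.entrySet()) {
            boolean known = alreadyRegistered.contains(file.getKey());
            ack.add(new AckEntry(file.getKey(), known ? null : file.getValue()));
        }
        return ack;
    }

    static long totalPayloadBytes(List<AckEntry> ack) {
        long total = 0;
        for (AckEntry entry : ack) {
            total += entry.payloadBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        Registry registry = new Registry();

        // First snapshot: both files are new, so both handles travel in the ack.
        Map<String, InlineHandle> first = new HashMap<>();
        first.put("file-a", new InlineHandle(new byte[1024]));
        first.put("file-b", new InlineHandle(new byte[1024]));
        List<AckEntry> ack1 = buildAck(first, new HashSet<>());
        registry.resolve(ack1);

        // Second snapshot: file-a and file-b are sent as keys only, file-c in full.
        Map<String, InlineHandle> second = new HashMap<>(first);
        second.put("file-c", new InlineHandle(new byte[1024]));
        List<AckEntry> ack2 = buildAck(second, first.keySet());
        Map<String, InlineHandle> checkpoint2 = registry.resolve(ack2);

        System.out.println("ack 1 payload bytes: " + totalPayloadBytes(ack1));
        System.out.println("ack 2 payload bytes: " + totalPayloadBytes(ack2));
        System.out.println("checkpoint 2 handles: " + checkpoint2.size());
    }
}
```

In this sketch the second ack carries payload only for the one new file, while the resolved checkpoint still references all three handles. A real implementation would also need to handle the case where a key is no longer present in the registry; the sketch simply throws.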
 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)