Posted to issues@flink.apache.org by "Stefan Richter (JIRA)" <ji...@apache.org> on 2017/05/08 10:30:04 UTC

[jira] [Created] (FLINK-6485) Use buffering to avoid frequent memtable flushes for short intervals in RocksDB incremental checkpoints

Stefan Richter created FLINK-6485:
-------------------------------------

             Summary: Use buffering to avoid frequent memtable flushes for short intervals in RocksDB incremental checkpoints
                 Key: FLINK-6485
                 URL: https://issues.apache.org/jira/browse/FLINK-6485
             Project: Flink
          Issue Type: Improvement
          Components: State Backends, Checkpointing
            Reporter: Stefan Richter


The current implementation of incremental checkpoints in RocksDB needs to flush the memtable to disk prior to every checkpoint, and each flush generates a new SST file.

For short checkpoint intervals we need an alternative mechanism that quickly determines the delta from the previous incremental checkpoint without this frequent flushing. This could be implemented through custom buffering inside the backend, e.g. a changelog buffer that is maintained up to a certain size.

The buffer's content becomes part of the private state in the incremental snapshot, and the buffer is dropped i) after each checkpoint or ii) once it exceeds a size that justifies flushing the memtable and writing a new SST file.
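
As a rough illustration, here is a minimal sketch in Java of what such a changelog buffer could look like. All names (ChangelogBuffer, Change, append, etc.) are made up for this example and do not exist in the Flink code base; the sketch only captures the idea of recording writes up to a size limit and dropping them after a checkpoint or once a real flush is justified.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical changelog buffer: every write to the RocksDB backend is also
 * appended here. The buffer is cleared after a checkpoint, or once it grows
 * past a configured limit (at which point a regular memtable flush and a new
 * SST file are justified anyway).
 */
public class ChangelogBuffer {

    /** A single recorded change: serialized key and value (null value marks a deletion). */
    public static final class Change {
        public final byte[] key;
        public final byte[] value;

        Change(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    private final long maxSizeBytes;
    private final List<Change> changes = new ArrayList<>();
    private long currentSizeBytes;

    public ChangelogBuffer(long maxSizeBytes) {
        this.maxSizeBytes = maxSizeBytes;
    }

    /** Records a change; returns false once the configured size limit is exceeded. */
    public boolean append(byte[] key, byte[] value) {
        changes.add(new Change(key, value));
        currentSizeBytes += key.length + (value != null ? value.length : 0);
        return currentSizeBytes <= maxSizeBytes;
    }

    /** True once enough data has accumulated to justify a real flush and a new SST file. */
    public boolean exceedsLimit() {
        return currentSizeBytes > maxSizeBytes;
    }

    public List<Change> getChanges() {
        return changes;
    }

    /** Called after a checkpoint or after an actual memtable flush. */
    public void clear() {
        changes.clear();
        currentSizeBytes = 0;
    }
}
{code}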

This mechanism should not be blocking, which we can achieve in the following way:

1) We have a clear upper limit on the buffer size (e.g. 64 MB); once that limit of accumulated diffs is reached, we can drop the buffer because we can assume enough work was done to justify flushing and writing a new SST file.

2) We write the buffer to the local FS, so we can expect this to be reasonably fast and not to suffer from the kind of blocking we see with a DFS (technically, flushing an SST file can also block). In the asynchronous part of the checkpoint, we then transfer the locally written buffer file to the DFS, as sketched below.
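
To illustrate the non-blocking path from 1) and 2), here is another minimal sketch with made-up names: the synchronous part spills the changelog buffer to a local file and clears it, and the returned callable performs the upload to the DFS in the asynchronous part. The DfsUploader interface is an assumption standing in for whatever checkpoint stream or file transfer facility the backend would actually use; it is not an existing Flink API.

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;

/**
 * Hypothetical snapshot path for the changelog buffer: synchronous spill to
 * the local file system, asynchronous upload to the checkpoint DFS.
 */
public class ChangelogSnapshotter {

    /** Placeholder for the transfer of a local file to the checkpoint storage. */
    public interface DfsUploader {
        String upload(Path localFile) throws IOException;
    }

    private final Path localDir;
    private final DfsUploader uploader;

    public ChangelogSnapshotter(Path localDir, DfsUploader uploader) {
        this.localDir = localDir;
        this.uploader = uploader;
    }

    /** Synchronous part: spill the buffer to local disk and clear it. */
    public Callable<String> snapshot(ChangelogBuffer buffer, long checkpointId) throws IOException {
        Path localFile = localDir.resolve("changelog-" + checkpointId);
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(localFile))) {
            for (ChangelogBuffer.Change change : buffer.getChanges()) {
                out.writeInt(change.key.length);
                out.write(change.key);
                if (change.value != null) {
                    out.writeInt(change.value.length);
                    out.write(change.value);
                } else {
                    out.writeInt(-1); // deletion marker
                }
            }
        }
        buffer.clear();

        // Asynchronous part: transfer the locally written buffer file to the DFS.
        return () -> uploader.upload(localFile);
    }
}
{code}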

There might be other mechanisms in RocksDB that we could exploit for this, such as the write-ahead log, but the approach above could already be a good solution.


