You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by "fanrui (Jira)" <ji...@apache.org> on 2021/03/11 04:04:00 UTC
[jira] [Created] (FLINK-21726) Fix checkpoint stuck
fanrui created FLINK-21726:
------------------------------
Summary: Fix checkpoint stuck
Key: FLINK-21726
URL: https://issues.apache.org/jira/browse/FLINK-21726
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Reporter: fanrui
Fix For: 1.13.0
h1. 1. Bug description:
When RocksDB Checkpoint, it may be stuck in `WaitUntilFlushWouldNotStallWrites` method.
h1.
2. Simple analysis of the reasons:
h2.
2.1 Configuration parameters:
{code:java}
# Flink yaml:
state.backend.rocksdb.predefined-options: SPINNING_DISK_OPTIMIZED_HIGH_MEM
state.backend.rocksdb.compaction.style: UNIVERSAL
# corresponding RocksDB config
Compaction Style : Universal
max_write_buffer_number : 4
min_write_buffer_number_to_merge : 3{code}
Checkpoint is usually very fast. When the Checkpoint is executed, `WaitUntilFlushWouldNotStallWrites` is called. If there are 2 Immutable MemTables, which are less than `min_write_buffer_number_to_merge`, they will not be flushed. But will enter this code.
{code:java}
// method: GetWriteStallConditionAndCause
if (mutable_cf_options.max_write_buffer_number> 3 &&
num_unflushed_memtables >=
mutable_cf_options.max_write_buffer_number-1) {
return {WriteStallCondition::kDelayed, WriteStallCause::kMemtableLimit};
}
{code}
code link: https://github.com/facebook/rocksdb/blob/fbed72f03c3d9e4fdca3e5993587ef2559ba6ab9/db/column_family.cc#L847
Checkpoint thought there was a FlushJob, but it didn't. So will always wait.
h2.
2.2 solution:
Increase the restriction: the `number of Immutable MemTable` >= `min_write_buffer_number_to_merge will wait`.
The rocksdb community has fixed this bug, link: https://github.com/facebook/rocksdb/pull/7921
h2.
2.3 Code that can reproduce the problem:
https://github.com/1996fanrui/fanrui-learning/blob/flink-1.12/module-java/src/main/java/com/dream/rocksdb/RocksDBCheckpointStuck.java
h1. 3. Interesting point
This bug will be triggered only when `the number of sorted runs >= level0_file_num_compaction_trigger`.
Because there is a break in WaitUntilFlushWouldNotStallWrites.
{code:java}
if (cfd->imm()->NumNotFlushed() <
cfd->ioptions()->min_write_buffer_number_to_merge &&
vstorage->l0_delay_trigger_count() <
mutable_cf_options.level0_file_num_compaction_trigger) {
break;
}
{code}
code link: https://github.com/facebook/rocksdb/blob/fbed72f03c3d9e4fdca3e5993587ef2559ba6ab9/db/db_impl/db_impl_compaction_flush.cc#L1974
Universal may have `l0_delay_trigger_count() >= level0_file_num_compaction_trigger`, so this bug is triggered.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)