
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/30 09:03:33 UTC

[GitHub] [flink] carp84 commented on a change in pull request #10498: [FLINK-14495][docs] Add documentation for memory control of RocksDB state backend

carp84 commented on a change in pull request #10498: [FLINK-14495][docs] Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10498#discussion_r372819133

##########
File path: docs/ops/state/large_state_tuning.md
##########
@@ -210,6 +211,71 @@ and not from the JVM. Any memory you assign to RocksDB will have to be accounted
of the TaskManagers by the same amount. Not doing that may result in YARN/Mesos/etc terminating the JVM processes for
allocating more memory than configured.

+### Bounding RocksDB Memory Usage
+
+RocksDB allocates native memory outside of the JVM, which could lead the process to exceed the total memory budget.
+This can be especially problematic in containerized environments such as Kubernetes, which kill processes that exceed their memory budgets.
+By default, Flink limits the total memory usage of the RocksDB instance(s) in a slot by sharing a [cache](https://github.com/facebook/rocksdb/wiki/Block-Cache)
+and a [write buffer manager](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager) among all instances in that slot.
+The shared cache will place an upper limit on the [three components](https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB) that use the majority of memory
+when RocksDB is deployed as a state backend: block cache, index and bloom filters, and MemTables.
+This feature is enabled by default and can be controlled in two ways:
+ - Integrate with the managed memory of the task manager: set `state.backend.rocksdb.memory.managed` to `true`. The RocksDB state backend will then use the managed memory budget of the task slot as the capacity of the shared cache object.
+ This option is enabled by default, which means Flink always tries to integrate RocksDB memory usage with managed memory first.
+ - Do not integrate with managed memory: set `state.backend.rocksdb.memory.fixed-per-slot` to a fixed total amount of memory per slot.
+ When configured, this option overrides `state.backend.rocksdb.memory.managed` and ignores the managed memory per slot calculated by the task manager.
+ Users can also configure `taskmanager.memory.task.off-heap.size` to reserve additional off-heap memory, which should equal `taskmanager.numberOfTaskSlots` * `state.backend.rocksdb.memory.fixed-per-slot`, to fit into Flink's memory model.
+
+Flink also provides two parameters to tune the memory fractions of the MemTables and of the index and filters:
+ - `state.backend.rocksdb.memory.write-buffer-ratio`, by default `0.5`. If the RocksDB memory bounding feature is turned on, 50% of the memory is used by the write buffer manager by default.
+ - `state.backend.rocksdb.memory.high-prio-pool-ratio`, by default `0.1`.
+ If the RocksDB memory bounding feature is turned on, 10% of the memory is set as high priority for index and filters in the shared block cache by default.
+ With this enabled, index and filters do not need to compete against data blocks to stay in the cache, which minimizes the performance problems caused by data blocks frequently evicting them.

Review comment:
Flink also provides two parameters to tune the memory fractions of the MemTables and of the index and filters when the bounded RocksDB memory usage feature is in use:
- `state.backend.rocksdb.memory.write-buffer-ratio`, by default `0.5`, which means 50% of the given memory is used by the write buffer manager.
- `state.backend.rocksdb.memory.high-prio-pool-ratio`, by default `0.1`, which means 10% of the given memory is set as high priority for index and filters in the shared block cache. We strongly suggest not setting this to zero, so that index and filters do not have to compete against data blocks for staying in the cache, which would cause performance issues.
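
To make the two defaults concrete, here is a minimal arithmetic sketch of the nominal shares they describe. It assumes a 512 MiB per-slot budget (an illustrative figure, not a Flink default), and the class and method names are hypothetical. It only multiplies the budget by each ratio as the wording above suggests; it is not meant to reproduce how Flink computes the actual cache and write buffer manager capacities internally.

```java
// Hypothetical helper, for illustration only; not part of Flink.
final class RocksDbMemorySplit {

    /** Nominal share of the budget, in bytes, for a ratio in [0, 1] (fraction truncated). */
    static long share(long totalBytes, double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be within [0, 1]: " + ratio);
        }
        return (long) (totalBytes * ratio);
    }

    public static void main(String[] args) {
        long totalBytes = 512L * 1024 * 1024;   // assumed per-slot budget (512 MiB)
        double writeBufferRatio = 0.5;          // state.backend.rocksdb.memory.write-buffer-ratio
        double highPrioPoolRatio = 0.1;         // state.backend.rocksdb.memory.high-prio-pool-ratio

        long writeBufferBytes = share(totalBytes, writeBufferRatio);
        long highPrioBytes = share(totalBytes, highPrioPoolRatio);
        long remainingBytes = totalBytes - writeBufferBytes - highPrioBytes;

        System.out.println("write buffer manager: " + writeBufferBytes + " bytes");
        System.out.println("high-priority pool:   " + highPrioBytes + " bytes");
        System.out.println("remaining for data:   " + remainingBytes + " bytes");
    }
}
```

With these assumed inputs, half of the budget goes to the write buffer manager, about a tenth is reserved for index and filters, and the remainder is left for data blocks.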
