Posted to commits@flink.apache.org by ta...@apache.org on 2022/03/21 06:25:04 UTC

[flink] branch release-1.15 updated: [FLINK-5151][docs] Add descriptions about object mutations of state backends

This is an automated email from the ASF dual-hosted git repository.

tangyun pushed a commit to branch release-1.15
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.15 by this push:
     new 3cdcbaf1 [FLINK-5151][docs] Add descriptions about object mutations of state backends
3cdcbaf1 is described below

commit 3cdcbaf1fb874d751f3ae625cfebf67384a8abcb
Author: Hangxiang Yu <ma...@gmail.com>
AuthorDate: Fri Mar 18 12:59:37 2022 +0800

    [FLINK-5151][docs] Add descriptions about object mutations of state backends
    
    This closes #19143.
---
 docs/content.zh/docs/ops/state/state_backends.md | 3 +++
 docs/content/docs/ops/state/state_backends.md    | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/content.zh/docs/ops/state/state_backends.md b/docs/content.zh/docs/ops/state/state_backends.md
index e6d6764..f5776db 100644
--- a/docs/content.zh/docs/ops/state/state_backends.md
+++ b/docs/content.zh/docs/ops/state/state_backends.md
@@ -59,6 +59,8 @@ HashMapStateBackend 的适用场景:
 
 建议同时将 [managed memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#managed-memory) 设为0,以保证将最大限度的内存分配给 JVM 上的用户代码。
 
+与 EmbeddedRocksDBStateBackend 不同的是,由于 HashMapStateBackend 将数据以对象形式存储在堆中,因此重用这些对象数据是不安全的。
+
 ### EmbeddedRocksDBStateBackend
 
 EmbeddedRocksDBStateBackend 将正在运行中的状态数据保存在 [RocksDB](http://rocksdb.org) 数据库中,RocksDB 数据库默认将数据存储在 TaskManager 的数据目录。
@@ -79,6 +81,7 @@ EmbeddedRocksDBStateBackend 的适用场景:
 注意,你可以保留的状态大小仅受磁盘空间的限制。与状态存储在内存中的 HashMapStateBackend 相比,EmbeddedRocksDBStateBackend 允许存储非常大的状态。
 然而,这也意味着使用 EmbeddedRocksDBStateBackend 将会使应用程序的最大吞吐量降低。
 所有的读写都必须序列化、反序列化操作,这个比基于堆内存的 state backend 的效率要低很多。
+同时因为存在这些序列化、反序列化操作,重用放入 EmbeddedRocksDBStateBackend 的对象是安全的。
 
 请同时参考 [Task Executor 内存配置]({{< ref "docs/deployment/memory/mem_tuning" >}}#rocksdb-state-backend) 中关于 EmbeddedRocksDBStateBackend 的建议。
 
diff --git a/docs/content/docs/ops/state/state_backends.md b/docs/content/docs/ops/state/state_backends.md
index aa06c909..065c7a4 100644
--- a/docs/content/docs/ops/state/state_backends.md
+++ b/docs/content/docs/ops/state/state_backends.md
@@ -61,6 +61,8 @@ The HashMapStateBackend is encouraged for:
 It is also recommended to set [managed memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#managed-memory) to zero.
 This will ensure that the maximum amount of memory is allocated for user code on the JVM.
 
+Unlike EmbeddedRocksDBStateBackend, the HashMapStateBackend stores data as objects on the heap so that it is unsafe to reuse objects.
+
 ### The EmbeddedRocksDBStateBackend
 
 The EmbeddedRocksDBStateBackend holds in-flight data in a [RocksDB](http://rocksdb.org) database
@@ -83,7 +85,7 @@ Note that the amount of state that you can keep is only limited by the amount of
 This allows keeping very large state, compared to the HashMapStateBackend that keeps state in memory.
 This also means, however, that the maximum throughput that can be achieved will be lower with
 this state backend. All reads/writes from/to this backend have to go through de-/serialization to retrieve/store the state objects, which is also more expensive than always working with the
-on-heap representation as the heap-based backends are doing.
+on-heap representation as the heap-based backends are doing. It's safe for EmbeddedRocksDBStateBackend to reuse objects due to the de-/serialization.
 
 Check also recommendations about the [task executor memory configuration]({{< ref "docs/deployment/memory/mem_tuning" >}}#rocksdb-state-backend) for the EmbeddedRocksDBStateBackend.
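
The object-reuse note added by this commit can be illustrated with a minimal sketch. It does not use Flink's API: `HeapStore` and `SerializingStore` are hypothetical stand-ins for a heap-based backend (which keeps the reference it is handed) and a serializing backend such as RocksDB (which keeps a byte copy). The sketch only assumes that the caller mutates the same object after storing it.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ObjectReuseSketch {

    // Hypothetical stand-in for a heap-based backend: stores the reference as-is.
    static final class HeapStore {
        private final Map<String, StringBuilder> state = new HashMap<>();

        void put(String key, StringBuilder value) {
            state.put(key, value);
        }

        StringBuilder get(String key) {
            return state.get(key);
        }
    }

    // Hypothetical stand-in for a serializing backend: stores a byte copy on write
    // and rebuilds a fresh object on read.
    static final class SerializingStore {
        private final Map<String, byte[]> state = new HashMap<>();

        void put(String key, StringBuilder value) {
            state.put(key, value.toString().getBytes(StandardCharsets.UTF_8));
        }

        StringBuilder get(String key) {
            return new StringBuilder(new String(state.get(key), StandardCharsets.UTF_8));
        }
    }

    // Stores a value, then reuses (mutates) the same object, then reads the state back.
    static String heapReadAfterReuse() {
        HeapStore store = new HeapStore();
        StringBuilder reused = new StringBuilder("first");
        store.put("k", reused);
        reused.setLength(0);
        reused.append("second"); // the stored state changes too, since it is the same object
        return store.get("k").toString();
    }

    static String serializedReadAfterReuse() {
        SerializingStore store = new SerializingStore();
        StringBuilder reused = new StringBuilder("first");
        store.put("k", reused);
        reused.setLength(0);
        reused.append("second"); // the stored bytes are unaffected by this mutation
        return store.get("k").toString();
    }

    public static void main(String[] args) {
        System.out.println("heap-based:  " + heapReadAfterReuse());
        System.out.println("serializing: " + serializedReadAfterReuse());
    }
}
```

In the heap-based case the later mutation leaks into the stored state, which is why the added documentation calls object reuse unsafe there. In the serializing case the state was copied at write time, so the mutation has no effect on it.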