You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/31 22:51:22 UTC

[GitHub] [flink] StephanEwen opened a new pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

StephanEwen opened a new pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987
 
 
   ## What is the purpose of the change
   
   This is an extension of #10498 with some more edits and some restructuring.
   
   The applied changes are
     - Move most of the in-depth RocksDB contents to the State Backends docs (RocksDB section).
     - Let the "tuning large state" docs only talk about actual tuning and trouble-shooting.
     - Add some more introductory notes on the motivation and what users should expect and do.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373759442
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.zh.md
 ##########
 @@ -210,6 +211,67 @@ and not from the JVM. Any memory you assign to RocksDB will have to be accounted
 of the TaskManagers by the same amount. Not doing that may result in YARN/Mesos/etc terminating the JVM processes for
 allocating more memory than configured.
 
+### Bounding RocksDB Memory Usage
 
 Review comment:
   The adjustment/reorganization of sections seems not reflected in this Chinese doc accordingly...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373783320
 
 

 ##########
 File path: docs/ops/state/state_backends.zh.md
 ##########
 @@ -119,6 +119,8 @@ RocksDBStateBackend 是目前唯一支持增量 CheckPoint 的 State Backend (
 
 可以使用一些 RocksDB 的本地指标(metrics),但默认是关闭的。你能在 [这里]({{ site.baseurl }}/zh/ops/config.html#rocksdb-native-metrics) 找到关于 RocksDB 本地指标的文档。
 
+The total memory amount of RocksDB instance(s) per slot can also be bounded, please refer to documentation [here](large_state_tuning.html#bounding-rocksdb-memory-usage) for details.
 
 Review comment:
   Done through #10990 .

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] StephanEwen commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373879710
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.zh.md
 ##########
 @@ -210,6 +211,67 @@ and not from the JVM. Any memory you assign to RocksDB will have to be accounted
 of the TaskManagers by the same amount. Not doing that may result in YARN/Mesos/etc terminating the JVM processes for
 allocating more memory than configured.
 
+### Bounding RocksDB Memory Usage
+
+RocksDB allocates native memory outside of the JVM, which could lead the process to exceed the total memory budget.
+This can be especially problematic in containerized environments such as Kubernetes that kill processes who exceed their memory budgets.
+
+Flink limit total memory usage of RocksDB instance(s) per slot by leveraging shareable [cache](https://github.com/facebook/rocksdb/wiki/Block-Cache)
+and [write buffer manager](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager) among all instances in a single slot by default.
+The shared cache will place an upper limit on the [three components](https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB) that use the majority of memory
+when RocksDB is deployed as a state backend: block cache, index and bloom filters, and MemTables. 
+
+This feature is enabled by default along with managed memory. Flink will use the managed memory budget as the per-slot memory limit for RocksDB state backend(s).
+
+Flink also provides two parameters to tune the memory fraction of MemTable and index & filters along with the bounding RocksDB memory usage feature:
+  - `state.backend.rocksdb.memory.write-buffer-ratio`, by default `0.5`, which means 50% of the given memory would be used by write buffer manager.
+  - `state.backend.rocksdb.memory.high-prio-pool-ratio`, by default `0.1`, which means 10% of the given memory would be set as high priority for index and filters in shared block cache.
+  We strongly suggest not to set this to zero, to prevent index and filters from competing against data blocks for staying in cache and causing performance issues.
+  Moreover, the L0 level filter and index are pinned into the cache by default to mitigate performance problems,
+  more details please refer to the [RocksDB-documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-filter-and-compression-dictionary-blocks).
+
+<span class="label label-info">Note</span> When bounded RocksDB memory usage is enabled by default,
+the shared `cache` and `write buffer manager` will override customized settings of block cache and write buffer via `PredefinedOptions` and `OptionsFactory`.
+
+*Experts only*: To control memory manually instead of using managed memory, user can set `state.backend.rocksdb.memory.managed` as `false` and control via `ColumnFamilyOptions`.
+Or to save some manual calculation, through the `state.backend.rocksdb.memory.fixed-per-slot` option which will override `state.backend.rocksdb.memory.managed` when configured.
+With the later method, please tune down `taskmanager.memory.managed.size` or `taskmanager.memory.managed.fraction` to `0` 
+and increase `taskmanager.memory.task.off-heap.size` by "`taskmanager.numberOfTaskSlots` * `state.backend.rocksdb.memory.fixed-per-slot`" accordingly.
+
+#### Tune performance when bounding RocksDB memory usage.
+
+There might existed performance regression compared with previous no-memory-limit case if you have too many states per slot.
+- If you observed this behavior and not running jobs in containerized environment or don't care about the over-limit memory usage.
+The easiest way to wipe out the performance regression is to disable memory bound for RocksDB, e.g. turn `state.backend.rocksdb.memory.managed` as `false`.
+Moreover, please refer to [memory configuration migration guide](WIP) to know how to keep backward compatibility to previous memory configuration.
 
 Review comment:
   This is an artifact of not copying the English text to the Chinese page as a placeholder.
   Will fix that...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#issuecomment-580956525
 
 
   <!--
   Meta data
   Hash:0b6edc798a49c9f21887e7dfe2cd81f879153c07 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4744 TriggerType:PUSH TriggerID:0b6edc798a49c9f21887e7dfe2cd81f879153c07
   Hash:0b6edc798a49c9f21887e7dfe2cd81f879153c07 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/147004602 TriggerType:PUSH TriggerID:0b6edc798a49c9f21887e7dfe2cd81f879153c07
   -->
   ## CI report:
   
   * 0b6edc798a49c9f21887e7dfe2cd81f879153c07 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/147004602) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4744) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot commented on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#issuecomment-580946351
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 0b6edc798a49c9f21887e7dfe2cd81f879153c07 (Fri Jan 31 22:53:37 UTC 2020)
   
    ✅no warnings
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 closed pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 closed pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373759642
 
 

 ##########
 File path: docs/ops/state/state_backends.md
 ##########
 @@ -186,8 +188,145 @@ state.backend: filesystem
 state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints
 {% endhighlight %}
 
-#### RocksDB State Backend Config Options
 
-{% include generated/rocks_db_configuration.html %}
+# RocksDB State Backend Details
+
+*This section describes the RocksDB state backend in more detail.*
+
+### Incremental Checkpoints
+
+RocksDB supports *Incremental Checkpoints*, which can dramatically reduce the checkpointing time in comparison to full checkpoints.
+Instead of producing a full, self-contained backup of the state backend, incremental checkpoints only record the changes that happened since the latest completed checkpoint.
+
+An incremental checkpoint builds upon (typically multiple) previous checkpoints. Flink leverages RocksDB's internal compaction mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+
+Recovery time of incremental checkpoints may be longer or shorter comapared to full checkpoints. If your network bandwidth is the bottleneck, it may take a bit longer to restore from an incremental checkpoint, because it implies fetching more data (more deltas). Restoring from an incremental checkpoint is faster, if the bottleneck is your CPU or IOPs, because restoring from an incremental checkpoint means not re-building the local RocksDB tables from Flink's canonical key/value snapshot format(used in savepoints and full checkpoints).
 
 Review comment:
   missing a blank in the last line:
   `format(used in savepoints and full checkpoints)`
   -> `format (used in savepoints and full checkpoints)`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373755951
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.md
 ##########
 @@ -118,98 +118,70 @@ Other state like keyed state is still snapshotted asynchronously. Please note th
 The state storage workhorse of many large scale Flink streaming applications is the *RocksDB State Backend*.
 The backend scales well beyond main memory and reliably stores large [keyed state](../../dev/stream/state/state.html).
 
-Unfortunately, RocksDB's performance can vary with configuration, and there is little documentation on how to tune
-RocksDB properly. For example, the default configuration is tailored towards SSDs and performs suboptimal
-on spinning disks.
+RocksDB's performance can vary with configuration, this section outlines some best-practices for tuning jobs that use the RocksDB State Backend.
 
-**Incremental Checkpoints**
+### Incremental Checkpoints
 
-Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, at the cost of a (potentially) longer
-recovery time. The core idea is that incremental checkpoints only record all changes to the previous completed checkpoint, instead of
-producing a full, self-contained backup of the state backend. Like this, incremental checkpoints build upon previous checkpoints. Flink leverages
-RocksDB's internal backup mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink
-does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+When it comes to reducing the time that checkpoints take, activating incremental checkpoints should be one of the first considerations.
+Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, because incremental checkpoints only record the changes compared to the previous completed checkpoint, instead of producing a full, self-contained backup of the state backend.
 
-While we strongly encourage the use of incremental checkpoints for large state, please note that this is a new feature and currently not enabled 
-by default. To enable this feature, users can instantiate a `RocksDBStateBackend` with the corresponding boolean flag in the constructor set to `true`, e.g.:
+See [Incremental Checkpoints in RocksDB]({{ site.baseurl }}/ops/state/state_backends.html#incremental-checkpoints) for more background information.
 
-{% highlight java %}
-    RocksDBStateBackend backend =
-        new RocksDBStateBackend(filebackend, true);
-{% endhighlight %}
-
-**RocksDB Timers**
+### Timers in RocksDB or on JVM Heap
 
-For RocksDB, a user can chose whether timers are stored on the heap or inside RocksDB (default). Heap-based timers can have a better performance for smaller numbers of
-timers, while storing timers inside RocksDB offers higher scalability as the number of timers in RocksDB can exceed the available main memory (spilling to disk).
+Timers are stored in RocksDB by default, which is the more robust and scalable choice.
 
-When using RockDB as state backend, the type of timer storage can be selected through Flink's configuration via option key `state.backend.rocksdb.timer-service.factory`.
-Possible choices are `heap` (to store timers on the heap, default) and `rocksdb` (to store timers in RocksDB).
+When performance-tuning jobs that have few timers only (no windows, not using timers in ProcessFunction), putting those timers on the heap can increase performance.
+Use this feature carefully, as heap-based timers may increase checkpointing times and naturally cannot scale beyond memory.
 
-<span class="label label-info">Note</span> *The combination RocksDB state backend with heap-based timers currently does NOT support asynchronous snapshots for the timers state.
-Other state like keyed state is still snapshotted asynchronously. Please note that this is not a regression from previous versions and will be resolved with `FLINK-10026`.*
+See [this section]({{ site.baseurl }}/ops/state/state_backends.html#timers-heap-vs-rocksdb) for details how to configure heap-based timers.
 
-**Predefined Options**
+### Tuning RocksDB Memory
 
-Flink provides some predefined collections of option for RocksDB for different settings, and there existed two ways
-to pass these predefined options to RocksDB:
-  - Configure the predefined options through `flink-conf.yaml` via option key `state.backend.rocksdb.predefined-options`.
-    The default value of this option is `DEFAULT` which means `PredefinedOptions.DEFAULT`.
-  - Set the predefined options programmatically, e.g. `RocksDBStateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM)`.
+The performance of the RocksDB State Backend much depends on the amount of memory that it has available. To increase performance, adding memory can help a lot, or adjusting to which functions memory goes.
 
-We expect to accumulate more such profiles over time. Feel free to contribute such predefined option profiles when you
-found a set of options that work well and seem representative for certain workloads.
+By default, the RocksDB State Backend uses Flink's managed memory budget for RocksDBs buffers and caches (`state.backend.rocksdb.memory.managed: true`). Please refer to the [RocksDB Memory Management]({{ site.baseurl }}/ops/state/state_backends.html#memory-management) for background on how that mechanism works.
 
-<span class="label label-info">Note</span> Predefined options which set programmatically would override the one configured via `flink-conf.yaml`.
+To tune memory-related performance issues, the following steps may be helpful:
 
-**Passing Options Factory to RocksDB**
-
-There existed two ways to pass options factory to RocksDB in Flink:
+  - The first step to try and increase performance should be to increase the amout of managed memory. This usually improves the situation a lot, without opening up the complexity of tuning low-level RocksDB options.
 
 Review comment:
   minor typo: amout -> amount

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
sjwiesman commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373731353
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.zh.md
 ##########
 @@ -210,6 +211,67 @@ and not from the JVM. Any memory you assign to RocksDB will have to be accounted
 of the TaskManagers by the same amount. Not doing that may result in YARN/Mesos/etc terminating the JVM processes for
 allocating more memory than configured.
 
+### Bounding RocksDB Memory Usage
+
+RocksDB allocates native memory outside of the JVM, which could lead the process to exceed the total memory budget.
+This can be especially problematic in containerized environments such as Kubernetes that kill processes who exceed their memory budgets.
+
+Flink limit total memory usage of RocksDB instance(s) per slot by leveraging shareable [cache](https://github.com/facebook/rocksdb/wiki/Block-Cache)
+and [write buffer manager](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager) among all instances in a single slot by default.
+The shared cache will place an upper limit on the [three components](https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB) that use the majority of memory
+when RocksDB is deployed as a state backend: block cache, index and bloom filters, and MemTables. 
+
+This feature is enabled by default along with managed memory. Flink will use the managed memory budget as the per-slot memory limit for RocksDB state backend(s).
+
+Flink also provides two parameters to tune the memory fraction of MemTable and index & filters along with the bounding RocksDB memory usage feature:
+  - `state.backend.rocksdb.memory.write-buffer-ratio`, by default `0.5`, which means 50% of the given memory would be used by write buffer manager.
+  - `state.backend.rocksdb.memory.high-prio-pool-ratio`, by default `0.1`, which means 10% of the given memory would be set as high priority for index and filters in shared block cache.
+  We strongly suggest not to set this to zero, to prevent index and filters from competing against data blocks for staying in cache and causing performance issues.
+  Moreover, the L0 level filter and index are pinned into the cache by default to mitigate performance problems,
+  more details please refer to the [RocksDB-documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-filter-and-compression-dictionary-blocks).
+
+<span class="label label-info">Note</span> When bounded RocksDB memory usage is enabled by default,
+the shared `cache` and `write buffer manager` will override customized settings of block cache and write buffer via `PredefinedOptions` and `OptionsFactory`.
+
+*Experts only*: To control memory manually instead of using managed memory, user can set `state.backend.rocksdb.memory.managed` as `false` and control via `ColumnFamilyOptions`.
+Or to save some manual calculation, through the `state.backend.rocksdb.memory.fixed-per-slot` option which will override `state.backend.rocksdb.memory.managed` when configured.
+With the later method, please tune down `taskmanager.memory.managed.size` or `taskmanager.memory.managed.fraction` to `0` 
+and increase `taskmanager.memory.task.off-heap.size` by "`taskmanager.numberOfTaskSlots` * `state.backend.rocksdb.memory.fixed-per-slot`" accordingly.
+
+#### Tune performance when bounding RocksDB memory usage.
+
+There might existed performance regression compared with previous no-memory-limit case if you have too many states per slot.
+- If you observed this behavior and not running jobs in containerized environment or don't care about the over-limit memory usage.
+The easiest way to wipe out the performance regression is to disable memory bound for RocksDB, e.g. turn `state.backend.rocksdb.memory.managed` as `false`.
+Moreover, please refer to [memory configuration migration guide](WIP) to know how to keep backward compatibility to previous memory configuration.
 
 Review comment:
   This will break the nightly tests because it’s an invalid link

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373760008
 
 

 ##########
 File path: docs/ops/state/state_backends.zh.md
 ##########
 @@ -119,6 +119,8 @@ RocksDBStateBackend 是目前唯一支持增量 CheckPoint 的 State Backend (
 
 可以使用一些 RocksDB 的本地指标(metrics),但默认是关闭的。你能在 [这里]({{ site.baseurl }}/zh/ops/config.html#rocksdb-native-metrics) 找到关于 RocksDB 本地指标的文档。
 
+The total memory amount of RocksDB instance(s) per slot can also be bounded, please refer to documentation [here](large_state_tuning.html#bounding-rocksdb-memory-usage) for details.
 
 Review comment:
   Let me prepare a supplement commit for the Chinese translation adjustment accordingly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#issuecomment-580956525
 
 
   <!--
   Meta data
   Hash:0b6edc798a49c9f21887e7dfe2cd81f879153c07 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4744 TriggerType:PUSH TriggerID:0b6edc798a49c9f21887e7dfe2cd81f879153c07
   Hash:0b6edc798a49c9f21887e7dfe2cd81f879153c07 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/147004602 TriggerType:PUSH TriggerID:0b6edc798a49c9f21887e7dfe2cd81f879153c07
   -->
   ## CI report:
   
   * 0b6edc798a49c9f21887e7dfe2cd81f879153c07 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/147004602) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4744) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] StephanEwen commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373879861
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.zh.md
 ##########
 @@ -210,6 +211,67 @@ and not from the JVM. Any memory you assign to RocksDB will have to be accounted
 of the TaskManagers by the same amount. Not doing that may result in YARN/Mesos/etc terminating the JVM processes for
 allocating more memory than configured.
 
+### Bounding RocksDB Memory Usage
 
 Review comment:
   Will do that during merging.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373755840
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.md
 ##########
 @@ -118,98 +118,70 @@ Other state like keyed state is still snapshotted asynchronously. Please note th
 The state storage workhorse of many large scale Flink streaming applications is the *RocksDB State Backend*.
 The backend scales well beyond main memory and reliably stores large [keyed state](../../dev/stream/state/state.html).
 
-Unfortunately, RocksDB's performance can vary with configuration, and there is little documentation on how to tune
-RocksDB properly. For example, the default configuration is tailored towards SSDs and performs suboptimal
-on spinning disks.
+RocksDB's performance can vary with configuration, this section outlines some best-practices for tuning jobs that use the RocksDB State Backend.
 
-**Incremental Checkpoints**
+### Incremental Checkpoints
 
-Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, at the cost of a (potentially) longer
-recovery time. The core idea is that incremental checkpoints only record all changes to the previous completed checkpoint, instead of
-producing a full, self-contained backup of the state backend. Like this, incremental checkpoints build upon previous checkpoints. Flink leverages
-RocksDB's internal backup mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink
-does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+When it comes to reducing the time that checkpoints take, activating incremental checkpoints should be one of the first considerations.
+Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, because incremental checkpoints only record the changes compared to the previous completed checkpoint, instead of producing a full, self-contained backup of the state backend.
 
-While we strongly encourage the use of incremental checkpoints for large state, please note that this is a new feature and currently not enabled 
-by default. To enable this feature, users can instantiate a `RocksDBStateBackend` with the corresponding boolean flag in the constructor set to `true`, e.g.:
+See [Incremental Checkpoints in RocksDB]({{ site.baseurl }}/ops/state/state_backends.html#incremental-checkpoints) for more background information.
 
-{% highlight java %}
-    RocksDBStateBackend backend =
-        new RocksDBStateBackend(filebackend, true);
-{% endhighlight %}
-
-**RocksDB Timers**
+### Timers in RocksDB or on JVM Heap
 
-For RocksDB, a user can chose whether timers are stored on the heap or inside RocksDB (default). Heap-based timers can have a better performance for smaller numbers of
-timers, while storing timers inside RocksDB offers higher scalability as the number of timers in RocksDB can exceed the available main memory (spilling to disk).
+Timers are stored in RocksDB by default, which is the more robust and scalable choice.
 
-When using RockDB as state backend, the type of timer storage can be selected through Flink's configuration via option key `state.backend.rocksdb.timer-service.factory`.
-Possible choices are `heap` (to store timers on the heap, default) and `rocksdb` (to store timers in RocksDB).
+When performance-tuning jobs that have few timers only (no windows, not using timers in ProcessFunction), putting those timers on the heap can increase performance.
+Use this feature carefully, as heap-based timers may increase checkpointing times and naturally cannot scale beyond memory.
 
-<span class="label label-info">Note</span> *The combination RocksDB state backend with heap-based timers currently does NOT support asynchronous snapshots for the timers state.
-Other state like keyed state is still snapshotted asynchronously. Please note that this is not a regression from previous versions and will be resolved with `FLINK-10026`.*
+See [this section]({{ site.baseurl }}/ops/state/state_backends.html#timers-heap-vs-rocksdb) for details how to configure heap-based timers.
 
 Review comment:
   for details **_of_** how to configure

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373759898
 
 

 ##########
 File path: docs/ops/state/state_backends.md
 ##########
 @@ -186,8 +188,145 @@ state.backend: filesystem
 state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints
 {% endhighlight %}
 
-#### RocksDB State Backend Config Options
 
-{% include generated/rocks_db_configuration.html %}
+# RocksDB State Backend Details
+
+*This section describes the RocksDB state backend in more detail.*
+
+### Incremental Checkpoints
+
+RocksDB supports *Incremental Checkpoints*, which can dramatically reduce the checkpointing time in comparison to full checkpoints.
+Instead of producing a full, self-contained backup of the state backend, incremental checkpoints only record the changes that happened since the latest completed checkpoint.
+
+An incremental checkpoint builds upon (typically multiple) previous checkpoints. Flink leverages RocksDB's internal compaction mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+
+Recovery time of incremental checkpoints may be longer or shorter comapared to full checkpoints. If your network bandwidth is the bottleneck, it may take a bit longer to restore from an incremental checkpoint, because it implies fetching more data (more deltas). Restoring from an incremental checkpoint is faster, if the bottleneck is your CPU or IOPs, because restoring from an incremental checkpoint means not re-building the local RocksDB tables from Flink's canonical key/value snapshot format(used in savepoints and full checkpoints).
+
+While we encourage the use of incremental checkpoints for large state, you need to enable this feature manually:
+  - Setting a default in your `flink-conf.yaml`: `state.backend.incremental: true`
+  - Configuring this in code (overrides the config default): `RocksDBStateBackend backend = new RocksDBStateBackend(filebackend, true);`
+
+### Memory Management
+
+Flink aims to control the total process memory consumption to make sure that the Flink TaskManagers have a well-behaved memory footprint. That means staying within the limits enforced by the environment (Docker/Kubernetes, Yarn, etc) to not get killed for consuming too much memory, but also to not under-utilize memory (unnecessary spilling to disk, wasted caching opportunities, reduced performance).
+
+To achieve that, Flink by default configures RocksDB's memory allocation to the amount of managed memory of the TaskManager (or, more precisely, task slot). This should give good out-of-the-box experience for most applications, meaning most applications should not need to tune any of the detailed RocksDB settings. The primary mechanism for improving memory-related performance issues would be to simply increase Flink's managed memory.
+
+Uses can choose to deactivate that feature and let RocksDB allocate memory independently per ColumnFamily (one per state per operator). This offers expert users ultimately more fine grained control over RocksDB, but means that users need to take care themselves that the overall memory consumption does not exceed the limits of the environment. See [large state tuning]({{ site.baseurl }}/ops/state/large_state_tuning.html#tuning-rocksdb-memory) for some guideline about large state performance tuning.
 
 Review comment:
   Uses -> Users, at the very beginning.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] Myasuka commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
Myasuka commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373792780
 
 

 ##########
 File path: docs/ops/state/state_backends.md
 ##########
 @@ -188,8 +188,145 @@ state.backend: filesystem
 state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints
 {% endhighlight %}
 
-#### RocksDB State Backend Config Options
 
-{% include generated/rocks_db_configuration.html %}
+# RocksDB State Backend Details
+
+*This section describes the RocksDB state backend in more detail.*
+
+### Incremental Checkpoints
+
+RocksDB supports *Incremental Checkpoints*, which can dramatically reduce the checkpointing time in comparison to full checkpoints.
+Instead of producing a full, self-contained backup of the state backend, incremental checkpoints only record the changes that happened since the latest completed checkpoint.
+
+An incremental checkpoint builds upon (typically multiple) previous checkpoints. Flink leverages RocksDB's internal compaction mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+
+Recovery time of incremental checkpoints may be longer or shorter comapared to full checkpoints. If your network bandwidth is the bottleneck, it may take a bit longer to restore from an incremental checkpoint, because it implies fetching more data (more deltas). Restoring from an incremental checkpoint is faster, if the bottleneck is your CPU or IOPs, because restoring from an incremental checkpoint means not re-building the local RocksDB tables from Flink's canonical key/value snapshot format(used in savepoints and full checkpoints).
+
+While we encourage the use of incremental checkpoints for large state, you need to enable this feature manually:
+  - Setting a default in your `flink-conf.yaml`: `state.backend.incremental: true`
+  - Configuring this in code (overrides the config default): `RocksDBStateBackend backend = new RocksDBStateBackend(filebackend, true);`
 
 Review comment:
   As `RocksDBStateBackend(AbstractStateBackend, boolean)` has been tagged as deprecated, how about using `RocksDBStateBackend(StateBackend, TernaryBoolean)` to explain?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373756867
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.md
 ##########
 @@ -118,98 +118,70 @@ Other state like keyed state is still snapshotted asynchronously. Please note th
 The state storage workhorse of many large scale Flink streaming applications is the *RocksDB State Backend*.
 The backend scales well beyond main memory and reliably stores large [keyed state](../../dev/stream/state/state.html).
 
-Unfortunately, RocksDB's performance can vary with configuration, and there is little documentation on how to tune
-RocksDB properly. For example, the default configuration is tailored towards SSDs and performs suboptimal
-on spinning disks.
+RocksDB's performance can vary with configuration, this section outlines some best-practices for tuning jobs that use the RocksDB State Backend.
 
-**Incremental Checkpoints**
+### Incremental Checkpoints
 
-Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, at the cost of a (potentially) longer
-recovery time. The core idea is that incremental checkpoints only record all changes to the previous completed checkpoint, instead of
-producing a full, self-contained backup of the state backend. Like this, incremental checkpoints build upon previous checkpoints. Flink leverages
-RocksDB's internal backup mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink
-does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+When it comes to reducing the time that checkpoints take, activating incremental checkpoints should be one of the first considerations.
+Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, because incremental checkpoints only record the changes compared to the previous completed checkpoint, instead of producing a full, self-contained backup of the state backend.
 
-While we strongly encourage the use of incremental checkpoints for large state, please note that this is a new feature and currently not enabled 
-by default. To enable this feature, users can instantiate a `RocksDBStateBackend` with the corresponding boolean flag in the constructor set to `true`, e.g.:
+See [Incremental Checkpoints in RocksDB]({{ site.baseurl }}/ops/state/state_backends.html#incremental-checkpoints) for more background information.
 
-{% highlight java %}
-    RocksDBStateBackend backend =
-        new RocksDBStateBackend(filebackend, true);
-{% endhighlight %}
-
-**RocksDB Timers**
+### Timers in RocksDB or on JVM Heap
 
-For RocksDB, a user can chose whether timers are stored on the heap or inside RocksDB (default). Heap-based timers can have a better performance for smaller numbers of
-timers, while storing timers inside RocksDB offers higher scalability as the number of timers in RocksDB can exceed the available main memory (spilling to disk).
+Timers are stored in RocksDB by default, which is the more robust and scalable choice.
 
-When using RockDB as state backend, the type of timer storage can be selected through Flink's configuration via option key `state.backend.rocksdb.timer-service.factory`.
-Possible choices are `heap` (to store timers on the heap, default) and `rocksdb` (to store timers in RocksDB).
+When performance-tuning jobs that have few timers only (no windows, not using timers in ProcessFunction), putting those timers on the heap can increase performance.
+Use this feature carefully, as heap-based timers may increase checkpointing times and naturally cannot scale beyond memory.
 
-<span class="label label-info">Note</span> *The combination RocksDB state backend with heap-based timers currently does NOT support asynchronous snapshots for the timers state.
-Other state like keyed state is still snapshotted asynchronously. Please note that this is not a regression from previous versions and will be resolved with `FLINK-10026`.*
+See [this section]({{ site.baseurl }}/ops/state/state_backends.html#timers-heap-vs-rocksdb) for details how to configure heap-based timers.
 
-**Predefined Options**
+### Tuning RocksDB Memory
 
-Flink provides some predefined collections of option for RocksDB for different settings, and there existed two ways
-to pass these predefined options to RocksDB:
-  - Configure the predefined options through `flink-conf.yaml` via option key `state.backend.rocksdb.predefined-options`.
-    The default value of this option is `DEFAULT` which means `PredefinedOptions.DEFAULT`.
-  - Set the predefined options programmatically, e.g. `RocksDBStateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM)`.
+The performance of the RocksDB State Backend much depends on the amount of memory that it has available. To increase performance, adding memory can help a lot, or adjusting to which functions memory goes.
 
-We expect to accumulate more such profiles over time. Feel free to contribute such predefined option profiles when you
-found a set of options that work well and seem representative for certain workloads.
+By default, the RocksDB State Backend uses Flink's managed memory budget for RocksDBs buffers and caches (`state.backend.rocksdb.memory.managed: true`). Please refer to the [RocksDB Memory Management]({{ site.baseurl }}/ops/state/state_backends.html#memory-management) for background on how that mechanism works.
 
-<span class="label label-info">Note</span> Predefined options which set programmatically would override the one configured via `flink-conf.yaml`.
+To tune memory-related performance issues, the following steps may be helpful:
 
-**Passing Options Factory to RocksDB**
-
-There existed two ways to pass options factory to RocksDB in Flink:
+  - The first step to try and increase performance should be to increase the amout of managed memory. This usually improves the situation a lot, without opening up the complexity of tuning low-level RocksDB options.
+    
+    Especially with large container/process sizes, much of the total memory can typically go to RocksDB, unless the application logic requires a lot of JVM heap itself. The default managed memory fraction *(0.4)* is conservative and can often be increased when using TaskManagers with multi-GB process sizes.
 
-  - Configure options factory through `flink-conf.yaml`. You could set the options factory class name via option key `state.backend.rocksdb.options-factory`.
-    The default value for this option is `org.apache.flink.contrib.streaming.state.DefaultConfigurableOptionsFactory`, and all candidate configurable options are defined in `RocksDBConfigurableOptions`.
-    Moreover, you could also define your customized and configurable options factory class like below and pass the class name to `state.backend.rocksdb.options-factory`.
+  - The number of write buffers in RocksDB depends on the number of states you have in your application (states across all operators in the pipeline). Each state corresponds to one ColumnFamily, which needs its own write buffers. Hence, applications with many states typically need more memory for the same performance.
 
-    {% highlight java %}
+  - You can try and compare the performance of RocksDB with managed memory to RocksDB with per-column-family memory by setting `state.backend.rocksdb.memory.managed: false`. Especially to test against a baseline (assuming no- or gracious container memory limits) or to test for regressions compared to earlier versions of Flink, this can be useful.
+  
+    Compared to the managed memory setup (constant memory pool), not using managed memory means that RocksDB allocates memory proportional to the number of states in the application (memory footprint changes with application changes). As a rule of thumb, the non-managed mode uses (unless ColumnFamily options are applied) roughly "140MB * num-states-across-all-tasks * num-slots". Timers count as state as well!
 
 Review comment:
   Does `140MB` comes from the default `max_write_buffer_number * write_buffer_size + arena_block_size`, say `2 * 64MB + 8MB = 136MB`? If so, this is roughly the **_upper bound_** of the memory usage, to be explicit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] zentol commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373774066
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.md
 ##########
 @@ -118,98 +118,70 @@ Other state like keyed state is still snapshotted asynchronously. Please note th
 The state storage workhorse of many large scale Flink streaming applications is the *RocksDB State Backend*.
 The backend scales well beyond main memory and reliably stores large [keyed state](../../dev/stream/state/state.html).
 
-Unfortunately, RocksDB's performance can vary with configuration, and there is little documentation on how to tune
-RocksDB properly. For example, the default configuration is tailored towards SSDs and performs suboptimal
-on spinning disks.
+RocksDB's performance can vary with configuration, this section outlines some best-practices for tuning jobs that use the RocksDB State Backend.
 
-**Incremental Checkpoints**
+### Incremental Checkpoints
 
-Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, at the cost of a (potentially) longer
-recovery time. The core idea is that incremental checkpoints only record all changes to the previous completed checkpoint, instead of
-producing a full, self-contained backup of the state backend. Like this, incremental checkpoints build upon previous checkpoints. Flink leverages
-RocksDB's internal backup mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink
-does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+When it comes to reducing the time that checkpoints take, activating incremental checkpoints should be one of the first considerations.
+Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, because incremental checkpoints only record the changes compared to the previous completed checkpoint, instead of producing a full, self-contained backup of the state backend.
 
-While we strongly encourage the use of incremental checkpoints for large state, please note that this is a new feature and currently not enabled 
-by default. To enable this feature, users can instantiate a `RocksDBStateBackend` with the corresponding boolean flag in the constructor set to `true`, e.g.:
+See [Incremental Checkpoints in RocksDB]({{ site.baseurl }}/ops/state/state_backends.html#incremental-checkpoints) for more background information.
 
-{% highlight java %}
-    RocksDBStateBackend backend =
-        new RocksDBStateBackend(filebackend, true);
-{% endhighlight %}
-
-**RocksDB Timers**
+### Timers in RocksDB or on JVM Heap
 
-For RocksDB, a user can chose whether timers are stored on the heap or inside RocksDB (default). Heap-based timers can have a better performance for smaller numbers of
-timers, while storing timers inside RocksDB offers higher scalability as the number of timers in RocksDB can exceed the available main memory (spilling to disk).
+Timers are stored in RocksDB by default, which is the more robust and scalable choice.
 
-When using RockDB as state backend, the type of timer storage can be selected through Flink's configuration via option key `state.backend.rocksdb.timer-service.factory`.
-Possible choices are `heap` (to store timers on the heap, default) and `rocksdb` (to store timers in RocksDB).
+When performance-tuning jobs that have few timers only (no windows, not using timers in ProcessFunction), putting those timers on the heap can increase performance.
+Use this feature carefully, as heap-based timers may increase checkpointing times and naturally cannot scale beyond memory.
 
-<span class="label label-info">Note</span> *The combination RocksDB state backend with heap-based timers currently does NOT support asynchronous snapshots for the timers state.
-Other state like keyed state is still snapshotted asynchronously. Please note that this is not a regression from previous versions and will be resolved with `FLINK-10026`.*
+See [this section]({{ site.baseurl }}/ops/state/state_backends.html#timers-heap-vs-rocksdb) for details how to configure heap-based timers.
 
 Review comment:
   for details **on** how to configure

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#issuecomment-580956525
 
 
   <!--
   Meta data
   Hash:0b6edc798a49c9f21887e7dfe2cd81f879153c07 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4744 TriggerType:PUSH TriggerID:0b6edc798a49c9f21887e7dfe2cd81f879153c07
   Hash:0b6edc798a49c9f21887e7dfe2cd81f879153c07 Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/147004602 TriggerType:PUSH TriggerID:0b6edc798a49c9f21887e7dfe2cd81f879153c07
   -->
   ## CI report:
   
   * 0b6edc798a49c9f21887e7dfe2cd81f879153c07 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/147004602) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4744) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] StephanEwen commented on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#issuecomment-580945795
 
 
   @carp84 @Myasuka FYI

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] StephanEwen commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r374066513
 
 

 ##########
 File path: docs/ops/state/state_backends.md
 ##########
 @@ -188,8 +188,145 @@ state.backend: filesystem
 state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints
 {% endhighlight %}
 
-#### RocksDB State Backend Config Options
 
-{% include generated/rocks_db_configuration.html %}
+# RocksDB State Backend Details
+
+*This section describes the RocksDB state backend in more detail.*
+
+### Incremental Checkpoints
+
+RocksDB supports *Incremental Checkpoints*, which can dramatically reduce the checkpointing time in comparison to full checkpoints.
+Instead of producing a full, self-contained backup of the state backend, incremental checkpoints only record the changes that happened since the latest completed checkpoint.
+
+An incremental checkpoint builds upon (typically multiple) previous checkpoints. Flink leverages RocksDB's internal compaction mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+
+Recovery time of incremental checkpoints may be longer or shorter comapared to full checkpoints. If your network bandwidth is the bottleneck, it may take a bit longer to restore from an incremental checkpoint, because it implies fetching more data (more deltas). Restoring from an incremental checkpoint is faster, if the bottleneck is your CPU or IOPs, because restoring from an incremental checkpoint means not re-building the local RocksDB tables from Flink's canonical key/value snapshot format(used in savepoints and full checkpoints).
+
+While we encourage the use of incremental checkpoints for large state, you need to enable this feature manually:
+  - Setting a default in your `flink-conf.yaml`: `state.backend.incremental: true`
+  - Configuring this in code (overrides the config default): `RocksDBStateBackend backend = new RocksDBStateBackend(filebackend, true);`
 
 Review comment:
   May be simpler to refer to `RocksDBStateBackend(checkpointDataUri, enableIncrementalCheckpointing)`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot commented on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#issuecomment-580956525
 
 
   <!--
   Meta data
   Hash:0b6edc798a49c9f21887e7dfe2cd81f879153c07 Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:0b6edc798a49c9f21887e7dfe2cd81f879153c07
   -->
   ## CI report:
   
   * 0b6edc798a49c9f21887e7dfe2cd81f879153c07 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] Myasuka commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
Myasuka commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373792878
 
 

 ##########
 File path: docs/ops/state/state_backends.md
 ##########
 @@ -188,8 +188,145 @@ state.backend: filesystem
 state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints
 {% endhighlight %}
 
-#### RocksDB State Backend Config Options
 
-{% include generated/rocks_db_configuration.html %}
+# RocksDB State Backend Details
+
+*This section describes the RocksDB state backend in more detail.*
+
+### Incremental Checkpoints
+
+RocksDB supports *Incremental Checkpoints*, which can dramatically reduce the checkpointing time in comparison to full checkpoints.
+Instead of producing a full, self-contained backup of the state backend, incremental checkpoints only record the changes that happened since the latest completed checkpoint.
+
+An incremental checkpoint builds upon (typically multiple) previous checkpoints. Flink leverages RocksDB's internal compaction mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically.
+
+Recovery time of incremental checkpoints may be longer or shorter comapared to full checkpoints. If your network bandwidth is the bottleneck, it may take a bit longer to restore from an incremental checkpoint, because it implies fetching more data (more deltas). Restoring from an incremental checkpoint is faster, if the bottleneck is your CPU or IOPs, because restoring from an incremental checkpoint means not re-building the local RocksDB tables from Flink's canonical key/value snapshot format(used in savepoints and full checkpoints).
+
+While we encourage the use of incremental checkpoints for large state, you need to enable this feature manually:
+  - Setting a default in your `flink-conf.yaml`: `state.backend.incremental: true`
 
 Review comment:
   I think we should describe these two options as a `either-or` not a `and` relation

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] StephanEwen commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on a change in pull request #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373879845
 
 

 ##########
 File path: docs/ops/state/large_state_tuning.zh.md
 ##########
 @@ -210,6 +211,67 @@ and not from the JVM. Any memory you assign to RocksDB will have to be accounted
 of the TaskManagers by the same amount. Not doing that may result in YARN/Mesos/etc terminating the JVM processes for
 allocating more memory than configured.
 
+### Bounding RocksDB Memory Usage
 
 Review comment:
   Yes, I forgot to copy the English version to the Chinese page to keep them in sync.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] carp84 commented on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend

Posted by GitBox <gi...@apache.org>.
carp84 commented on issue #10987: [FLINK-14495][docs] (EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#issuecomment-581730962
 
 
   Close since this is already merged via:
   master: 9afcf31fdc
   release-1.10: a9040e79e7

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services