Posted to commits@kafka.apache.org by bb...@apache.org on 2019/05/30 11:28:55 UTC

[kafka] branch trunk updated: MINOR: Extend RocksDB section of Memory Management Docs (#6793)

This is an automated email from the ASF dual-hosted git repository.

bbejeck pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/kafka.git


The following commit(s) were added to refs/heads/trunk by this push:
     new 932a1b7  MINOR: Extend RocksDB section of Memory Management Docs (#6793)
932a1b7 is described below

commit 932a1b7d7e2b7ea8c145552c3d050a0999ce13dc
Author: A. Sophie Blee-Goldman <ab...@gmail.com>
AuthorDate: Thu May 30 04:28:43 2019 -0700

    MINOR: Extend RocksDB section of Memory Management Docs (#6793)
    
    Now that we can configure RocksDB to bound the total memory we should include docs describing how, as well as touching on some possible options that should be considered when taking advantage of this feature.
    
    Reviewers: Guozhang Wang <wa...@gmail.com>, Jim Galasyn <ji...@confluent.io>, Bill Bejeck <bb...@gmail.com>
---
 docs/streams/developer-guide/memory-mgmt.html | 61 ++++++++++++++++++++++++---
 1 file changed, 56 insertions(+), 5 deletions(-)

diff --git a/docs/streams/developer-guide/memory-mgmt.html b/docs/streams/developer-guide/memory-mgmt.html
index f21ed34..68c379b 100644
--- a/docs/streams/developer-guide/memory-mgmt.html
+++ b/docs/streams/developer-guide/memory-mgmt.html
@@ -167,7 +167,61 @@
 </pre></div>
       </div>
     </div>
-    <div class="section" id="other-memory-usage">
+    <div class="section" id="rocksdb">
+      <h2><a class="toc-backref" href="#id3">RocksDB</a><a class="headerlink" href="#rocksdb" title="Permalink to this headline"></a></h2>
+      <p> Each instance of RocksDB allocates off-heap memory for a block cache (with data), index and filter blocks, and memtable (write buffer). Critical configs (for RocksDB version 4.1.0) include
+        <code class="docutils literal"><span class="pre">block_cache_size</span></code>, <code class="docutils literal"><span class="pre">write_buffer_size</span></code> and <code class="docutils literal"><span class="pre">max_write_buffer_number</span></code>.  These can be specified through the
+        <code class="docutils literal"><span class="pre">rocksdb.config.setter</span></code> configuration.</p>
+      <p> As of 2.3.0 the memory usage across all instances can be bounded, limiting the total off-heap memory of your Streams app. To do so you must configure RocksDB to cache the index and filter blocks in the block cache, limit the memtable memory through a shared <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager">WriteBufferManager</a> and count its memory against the block cache, and then pass the same Cache object to each instance. Se [...]
+
+      <div class="highlight-java"><div class="highlight"><pre><span></span>    <span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">BoundedMemoryRocksDBConfig</span> <span class="kd">implements</span> <span class="n">RocksDBConfigSetter</span> <span class="o">{</span>
+
+       <span class="kd">private</span> <span class="kt">static</span> <span class="n">org.rocksdb.Cache</span> <span class="n">cache</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">LRUCache</span><span class="o">(</span><span class="mi">TOTAL_OFF_HEAP_MEMORY</span><span class="o">,</span> <span class="n">-1</span><span class="o">,</span> <span class="n">fal [...]
+       <span class="kd">private</span> <span class="kt">static</span> <span class="n">org.rocksdb.WriteBufferManager</span> <span class="n">writeBufferManager</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">WriteBufferManager</span><span class="o">(</span><span class="mi">TOTAL_MEMTABLE_MEMORY</span><span class="o">,</span> cache<span class="o">);</span>
+       <span class="kd">private</span> <span class="n">org.rocksdb.Filter</span> <span class="n">filter</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">BloomFilter</span><span class="o">();</span>
+
+       <span class="nd">@Override</span>
+       <span class="kd">public</span> <span class="kt">void</span> <span class="nf">setConfig</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span [...]
+
+         <span class="n">BlockBasedTableConfig</span> <span class="n">tableConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">BlockBasedTableConfig</span><span class="o">();</span>
+
+         <span class="c1"> // These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)</span>
+         <span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockCache</span><span class="o">(</span><span class="mi">cache</span><span class="o">);</span>
+         <span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocks</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
+         <span class="n">options</span><span class="o">.</span><span class="na">setWriteBufferManager</span><span class="o">(</span><span class="mi">writeBufferManager</span><span class="o">);</span>
+
+         <span class="c1"> // These options are recommended to be set when bounding the total memory</span>
+         <span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocksWithHighPriority</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span><sup><a href="#fn2" id="ref2">2</a></sup>
+         <span class="n">tableConfig</span><span class="o">.</span><span class="na">setPinTopLevelIndexAndFilter</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
+         <span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockSize</span><span class="o">(</span><span class="mi">BLOCK_SIZE</span><span class="o">);</span><sup><a href="#fn3" id="ref3">3</a></sup>
+         <span class="n">options</span><span class="o">.</span><span class="na">setMaxWriteBufferNumber</span><span class="o">(</span><span class="mi">N_MEMTABLES</span><span class="o">);</span><sup><a href="#fn4" id="ref4">4</a></sup>
+         <span class="n">options</span><span class="o">.</span><span class="na">setWriteBufferSize</span><span class="o">(</span><span class="mi">MEMTABLE_SIZE</span><span class="o">);</span>
+
+         <span class="n">options</span><span class="o">.</span><span class="na">setTableFormatConfig</span><span class="o">(</span><span class="n">tableConfig</span><span class="o">);</span>
+       <span class="o">}</span>
+
+       <span class="nd">@Override</span>
+       <span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">)</span> <span class="o">{</span>
+         <span class="c1">// Cache and WriteBufferManager should not be closed here, as the same objects are shared by every store instance.</span>
+         <span class="c1">// The filter, however, is not shared and should be closed to avoid leaking memory.</span>
+         <span class="n">filter</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
+       <span class="o">}</span>
+    <span class="o">}</span>
+</pre></div></div>
+        <sup id="fn1">1. INDEX_FILTER_BLOCK_RATIO can be used to reserve a fraction of the block cache for "high priority" (aka index and filter) blocks, preventing them from being evicted by data blocks. See the full signature of the LRUCache constructor <a class="reference external" href="https://github.com/facebook/rocksdb/blob/master/java/src/main/java/org/rocksdb/LRUCache.java#L72">here</a>.</sup>
+        <br>
+        <sup id="fn2">2. This must be set in order for INDEX_FILTER_BLOCK_RATIO to take effect (see footnote 1), as described <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks">here</a>.</sup>
+        <br>
+        <sup id="fn3">3. You may want to modify the default <a class="reference external" href="https://github.com/apache/kafka/blob/2.3/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L79">block size</a> per these instructions from the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks">RocksDB GitHub</a>. A larger block size means index blocks will be smaller, but the cached dat [...]
+          <br>
+          <dl class="docutils">
+            <dt>Note:</dt>
+            <dd>While we recommend setting at least the configs above, the specific options that yield the best performance are workload dependent, so consider experimenting with them to find the best choices for your use case. Keep in mind that the optimal configs for one app may not apply to an app with a different topology or input topic.
+            In addition to the recommended configs above, you may want to consider using partitioned index filters as described by the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters">RocksDB docs</a>.</dd>
+
+          </dl>
+      </div>
+      <div class="section" id="other-memory-usage">
       <h2><a class="toc-backref" href="#id3">Other memory usage</a><a class="headerlink" href="#other-memory-usage" title="Permalink to this headline"></a></h2>
       <p>There are other modules inside Apache Kafka that allocate memory during runtime. They include the following:</p>
       <ul class="simple">
@@ -179,9 +233,6 @@
         <li>Deserialized objects buffering: after <code class="docutils literal"><span class="pre">consumer.poll()</span></code> returns records, they will be deserialized to extract
           timestamp and buffered in the streams space. Currently this is only indirectly controlled by
           <code class="docutils literal"><span class="pre">buffered.records.per.partition</span></code>.</li>
-        <li>RocksDB&#8217;s own memory usage, both on-heap and off-heap; critical configs (for RocksDB version 4.1.0) include
-          <code class="docutils literal"><span class="pre">block_cache_size</span></code>, <code class="docutils literal"><span class="pre">write_buffer_size</span></code> and <code class="docutils literal"><span class="pre">max_write_buffer_number</span></code>.  These can be specified through the
-          <code class="docutils literal"><span class="pre">rocksdb.config.setter</span></code> configuration.</li>
       </ul>
       <div class="admonition tip">
         <p><b>Tip</b></p>
@@ -237,4 +288,4 @@
         // Display docs subnav items
         $('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
     });
-</script>
\ No newline at end of file
+</script>
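
Editorial note, not part of the commit: the docs added above say that RocksDB settings are applied through the rocksdb.config.setter configuration. A minimal sketch of that registration step follows. It uses only java.util.Properties so that it runs without Kafka on the classpath. The class name com.example.BoundedMemoryRocksDBConfig, the application id, and the broker address are placeholders assumed for illustration; none of them come from the commit.

```java
import java.util.Properties;

// Illustrative sketch only: builds the properties a Streams app would be started with.
final class RocksDBConfigSetterRegistration {

    // Config key named in the docs above.
    static final String ROCKSDB_CONFIG_SETTER = "rocksdb.config.setter";

    static Properties streamsProps(final String configSetterClassName) {
        final Properties props = new Properties();
        props.put("application.id", "bounded-memory-demo"); // hypothetical value
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical value
        // Point Streams at the RocksDBConfigSetter implementation by class name.
        props.put(ROCKSDB_CONFIG_SETTER, configSetterClassName);
        return props;
    }

    public static void main(final String[] args) {
        // Hypothetical fully qualified name of the config setter class.
        final Properties props = streamsProps("com.example.BoundedMemoryRocksDBConfig");
        System.out.println(ROCKSDB_CONFIG_SETTER + " = " + props.getProperty(ROCKSDB_CONFIG_SETTER));
    }
}
```

In a real application these properties would be passed to the KafkaStreams constructor together with the topology. If the setter is a nested static class, as in the docs example, its binary name would take the form Outer$BoundedMemoryRocksDBConfig.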