Posted to commits@hbase.apache.org by st...@apache.org on 2014/07/19 01:58:50 UTC

[1/2] git commit: More suggestion folks use offheap block cache

Repository: hbase
Updated Branches:
  refs/heads/master 5f4e85d3f -> 60d3e3c90


More suggestion folks use offheap block cache


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/60d3e3c9
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/60d3e3c9
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/60d3e3c9

Branch: refs/heads/master
Commit: 60d3e3c905cc752d83a5a622bdb505400fb431b0
Parents: 19979d7
Author: stack <st...@apache.org>
Authored: Fri Jul 18 16:56:33 2014 -0700
Committer: stack <st...@apache.org>
Committed: Fri Jul 18 16:56:48 2014 -0700

----------------------------------------------------------------------
 src/main/docbkx/performance.xml | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/60d3e3c9/src/main/docbkx/performance.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index 04ca00c..c00b635 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -183,6 +183,8 @@
           save a bit of YGC churn and allocate in the old gen directly. </para>
         <para>For more information about GC logs, see <xref
             linkend="trouble.log.gc" />. </para>
+    <para>Consider also enabling the offheap Block Cache.  This has been shown to mitigate
+        GC pause times.  See <xref linkend="block.cache" />.</para>
       </section>
     </section>
   </section>
@@ -723,6 +725,8 @@ htable.close();
           <methodname>setCacheBlocks</methodname> method. For input Scans to MapReduce jobs, this
         should be <varname>false</varname>. For frequently accessed rows, it is advisable to use the
         block cache.</para>
+
+    <para>Cache more data by moving your Block Cache offheap.  See <xref linkend="offheap.blockcache" />.</para>
     </section>
     <section
       xml:id="perf.hbase.client.rowkeyonly">

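The doc change above points folks at the offheap Block Cache without showing the knobs. As a hedged sketch only (the size below is an illustrative placeholder, not a recommendation), enabling the offheap BucketCache in hbase-site.xml looks roughly like:

```xml
<!-- Sketch: enable the offheap BucketCache (L2) alongside the onheap
     LruBlockCache (L1).  The size below is a placeholder, not a tuning tip. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <!-- Megabytes to allocate offheap for the BucketCache -->
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>
```

If you go this route, remember to also raise -XX:MaxDirectMemorySize in conf/hbase-env.sh above hbase.bucketcache.size, as commit [2/2] below documents.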

[2/2] git commit: Add doc on direct memory, the block cache UI additions, list block cache options, downplay slab cache even more

Posted by st...@apache.org.
Add doc on direct memory, the block cache UI additions, list block cache options, downplay slab cache even more


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/19979d77
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/19979d77
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/19979d77

Branch: refs/heads/master
Commit: 19979d770d72f5c69ccaca46190131f0dfbb1506
Parents: 5f4e85d
Author: stack <st...@apache.org>
Authored: Fri Jul 18 16:47:31 2014 -0700
Committer: stack <st...@apache.org>
Committed: Fri Jul 18 16:56:48 2014 -0700

----------------------------------------------------------------------
 src/main/docbkx/book.xml | 127 ++++++++++++++++++++++++++----------------
 1 file changed, 80 insertions(+), 47 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/19979d77/src/main/docbkx/book.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 4c06dc6..7f8f0a1 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -1948,42 +1948,43 @@ rs.close();
           LruBlockCache, BucketCache, and SlabCache, which are both (usually) offheap. This section
           discusses benefits and drawbacks of each implementation, how to choose the appropriate
           option, and configuration options for each.</para>
+
+      <note><title>Block Cache Reporting: UI</title>
+      <para>See the RegionServer UI for detail on the caching deploy.  Since HBase 1.0, the
+          Block Cache detail has been significantly extended, showing configurations,
+          sizings, current usage, and even detail on block counts and types.</para>
+  </note>
+
         <section>
+
           <title>Cache Choices</title>
           <para><classname>LruBlockCache</classname> is the original implementation, and is
-              entirely within the Java heap.  <classname>SlabCache</classname> and
-              <classname>BucketCache</classname> are mainly intended for keeping blockcache
-              data offheap, although BucketCache can also keep data onheap and in files.</para>
-          <para><emphasis>SlabCache is deprecated and will be removed in 1.0!</emphasis></para>
-          <para>BucketCache has seen more production deploys and has more deploy options. Fetching
-            will always be slower when fetching from BucketCache or SlabCache, as compared with the
-            native onheap LruBlockCache. However, latencies tend to be less erratic over time,
-            because there is less garbage collection.</para>
-          <para>Anecdotal evidence indicates that BucketCache requires less garbage collection than
-            SlabCache so should be even less erratic (than SlabCache or LruBlockCache).</para>
-          <para>SlabCache tends to do more garbage collections, because blocks are always moved
-              between L1 and L2, at least given the way <classname>DoubleBlockCache</classname>
-              currently works. When you enable SlabCache, you are enabling a two tier caching
-              system, an L1 cache which is implemented by an instance of LruBlockCache and
-              an offheap L2 cache which is implemented by SlabCache.  Management of these
-              two tiers and how blocks move between them is done by <classname>DoubleBlockCache</classname>
-              when you are using SlabCache. DoubleBlockCache works by caching all blocks in L1
-              AND L2.  When blocks are evicted from L1, they are moved to L2.  See
-              <xref linkend="offheap.blockcache.slabcache" /> for more detail on how DoubleBlockCache works.
+              entirely within the Java heap. <classname>BucketCache</classname> is mainly
+              intended for keeping blockcache data offheap, although BucketCache can also
+              keep data onheap and serve from a file-backed cache. There is also an older
+              offheap BlockCache, SlabCache, which has since been deprecated and will be
+              removed in HBase 1.0.
           </para>
-          <para>The hosting class for BucketCache is <classname>CombinedBlockCache</classname>.
-              It keeps all DATA blocks in the BucketCache and meta blocks -- INDEX and BLOOM blocks --
+
+          <para>Fetching from BucketCache will always be slower than fetching from the
+              native onheap LruBlockCache. However, latencies tend to be less erratic
+              over time, because there is less garbage collection. This is why you would
+              use BucketCache: to make your latencies less erratic and to mitigate GC
+              pauses and heap fragmentation.  See Nick Dimiduk's <link
+              xlink:href="http://www.n10k.com/blog/blockcache-101/">BlockCache 101</link> for
+              comparisons running onheap vs offheap tests.
+              </para>
+
+              <para>When you enable BucketCache, you are enabling a two-tier caching
+              system: an L1 cache implemented by an instance of LruBlockCache and
+              an offheap L2 cache implemented by BucketCache.  Management of these
+              two tiers and the policy that dictates how blocks move between them is done by
+              <classname>CombinedBlockCache</classname>. It keeps all DATA blocks in the L2
+              BucketCache, and meta blocks -- INDEX and BLOOM blocks --
               onheap in the L1 <classname>LruBlockCache</classname>.
-          </para>
-          <para>Because the hosting class for each implementation
-              (<classname>DoubleBlockCache</classname> vs <classname>CombinedBlockCache</classname>)
-              works so differently, it is difficult to do a fair comparison between BucketCache and SlabCache.
-            See Nick Dimiduk's <link
-              xlink:href="http://www.n10k.com/blog/blockcache-101/">BlockCache 101</link> for some
-          numbers.</para>
-          <para>For more information about the off heap cache options, see <xref
-              linkend="offheap.blockcache" />.</para>
+              See <xref linkend="offheap.blockcache" /> for more detail on going offheap.</para>
         </section>
+
         <section xml:id="cache.configurations">
             <title>General Cache Configurations</title>
             <para>Apart from the cache implementation itself, you can set some general
@@ -1993,6 +1994,7 @@ rs.close();
               After setting any of these options, restart or rolling restart your cluster for the
               configuration to take effect. Check logs for errors or unexpected behavior.</para>
       </section>
+
         <section
           xml:id="block.cache.design">
           <title>LruBlockCache Design</title>
@@ -2136,7 +2138,7 @@ rs.close();
           xml:id="offheap.blockcache">
           <title>Offheap Block Cache</title>
           <section xml:id="offheap.blockcache.slabcache">
-            <title>Enable SlabCache</title>
+            <title>How to Enable SlabCache</title>
             <para><emphasis>SlabCache is deprecated and will be removed in 1.0!</emphasis></para>
             <para> SlabCache is originally described in <link
                 xlink:href="http://blog.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/">Caching
@@ -2160,29 +2162,39 @@ rs.close();
               Check logs for errors or unexpected behavior.</para>
           </section>
           <section xml:id="enable.bucketcache">
-            <title>Enable BucketCache</title>
-                <para>The usual deploy of BucketCache is via a
-                    managing class that sets up two caching tiers: an L1 onheap cache
-                    implemented by LruBlockCache and a second L2 cache implemented
-                    with BucketCache. The managing class is <link
-                xlink:href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html">CombinedBlockCache</link> by default. The just-previous link describes the mechanism of CombinedBlockCache. In short, it works
+            <title>How to Enable BucketCache</title>
+                <para>The usual deploy of BucketCache is via a managing class that sets up two caching tiers: an L1 onheap cache
+                    implemented by LruBlockCache and a second L2 cache implemented with BucketCache. The managing class is <link
+                        xlink:href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html">CombinedBlockCache</link> by default.
+            The preceding link describes the caching 'policy' implemented by CombinedBlockCache. In short, it works
             by keeping meta blocks -- INDEX and BLOOM in the L1, onheap LruBlockCache tier -- and DATA
             blocks are kept in the L2, BucketCache tier. It is possible to amend this behavior in
-            HBase since version 1.0 and ask that a column family have both its meta and DATA blocks hosted onheap in the L1 tier by
+            HBase since version 1.0 and ask that a column family have both its meta and DATA blocks hosted onheap in the L1 tier by
            setting <varname>cacheDataInL1</varname> via <programlisting>HColumnDescriptor.setCacheDataInL1(true)</programlisting>
             or in the shell, creating or amending column families setting <varname>CACHE_DATA_IN_L1</varname>
             to true: e.g. <programlisting>hbase(main):003:0> create 't', {NAME => 't', CONFIGURATION => {CACHE_DATA_IN_L1 => 'true'}}</programlisting></para>
-        <para>The BucketCache deploy can be onheap, offheap, or file based. You set which via the
-            <varname>hbase.bucketcache.ioengine</varname> setting it to
-            <varname>heap</varname> for BucketCache running as part of the java heap,
-            <varname>offheap</varname> for BucketCache to make allocations offheap,
-            and <varname>file:PATH_TO_FILE</varname> for BucketCache to use a file
-            (Useful in particular if you have some fast i/o attached to the box such
+
+        <para>The BucketCache Block Cache can be deployed onheap, offheap, or file-based.
+            You set which via the
+            <varname>hbase.bucketcache.ioengine</varname> setting.  Setting it to
+            <varname>heap</varname> will have BucketCache deployed inside the
+            allocated Java heap. Setting it to <varname>offheap</varname> will have
+            BucketCache make its allocations offheap,
+            and an ioengine setting of <varname>file:PATH_TO_FILE</varname> will direct
+            BucketCache to use file caching (useful in particular if you have some fast i/o attached to the box such
             as SSDs).
         </para>
-        <para>To disable CombinedBlockCache, and use the BucketCache as a strict L2 cache to the L1
-              LruBlockCache, set <varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> to
-                <literal>false</literal>. In this mode, on eviction from L1, blocks go to L2.</para>
+        <para xml:id="raw.l1.l2">It is possible to deploy an L1+L2 setup where we bypass the CombinedBlockCache
+            policy and have BucketCache working as a strict L2 cache to the L1
+              LruBlockCache. For such a setup, set <varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> to
+              <literal>false</literal>. In this mode, on eviction from L1, blocks go to L2.
+              When a block is cached, it is cached first in L1. When we go to look for a cached block,
+              we look first in L1 and, if none is found, then search L2.  Let us call this deploy format
+              <emphasis>Raw L1+L2<indexterm><primary>Raw L1+L2</primary></indexterm></emphasis>.</para>
+          <para>Other BucketCache configs include specifying a location to persist the cache across
+              restarts, how many threads to use writing the cache, and so on.  See the
+              <link xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html">CacheConfig</link>
+              class for configuration options and descriptions.</para>
 
             <procedure>
               <title>BucketCache Example Configuration</title>
@@ -2230,6 +2242,27 @@ rs.close();
                 In other words, you configure the L1 LruBlockCache as you would normally,
                 as you would when there is no L2 BucketCache present.
             </para>
+            <note xml:id="direct.memory">
+                <title>Direct Memory Usage In HBase</title>
+                <para>The default maximum direct memory varies by JVM.  Traditionally it is 64M,
+                    some relation to the allocated heap size (-Xmx), or no limit at all (JDK7, apparently).
+                    HBase servers use direct memory; in particular, with short-circuit reading, the hosted DFSClient will
+                    allocate direct memory buffers.  If you do offheap block caching, you will
+                    be making use of direct memory.  When starting your JVM, make sure
+                    the <varname>-XX:MaxDirectMemorySize</varname> setting in
+                    <filename>conf/hbase-env.sh</filename> is set to some value that is
+                    higher than what you have allocated to your offheap blockcache
+                    (<varname>hbase.bucketcache.size</varname>).  It should be larger than your offheap block
+                    cache and then some for DFSClient usage (how much the DFSClient uses is not
+                    easy to quantify; it is the number of open hfiles * <varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname>,
+                    where hbase.dfs.client.read.shortcircuit.buffer.size is set to 128k in HBase -- see <filename>hbase-default.xml</filename>
+                    default configurations).
+                </para>
+                <para>You can see how much memory -- onheap and offheap/direct -- a RegionServer is configured to use
+                    and how much it is using at any one time by looking at the
+                    <emphasis>Server Metrics: Memory</emphasis> tab in the UI.
+                </para>
+            </note>
           </section>
         </section>
       </section>
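The direct-memory note added above states the sizing rule in prose: -XX:MaxDirectMemorySize should exceed hbase.bucketcache.size plus whatever the DFSClient needs, roughly open hfiles * hbase.dfs.client.read.shortcircuit.buffer.size. A back-of-the-envelope sketch of that arithmetic, where every input is a hypothetical placeholder:

```shell
# Sketch of the MaxDirectMemorySize sizing rule from the direct-memory note.
# All inputs are illustrative assumptions, not recommendations.
BUCKET_CACHE_MB=4096        # hbase.bucketcache.size (offheap block cache, in MB)
OPEN_HFILES=1000            # assumed count of open hfiles on the RegionServer
SHORTCIRCUIT_BUF_KB=128     # hbase.dfs.client.read.shortcircuit.buffer.size (128k)
SLACK_MB=256                # assumed extra headroom

# DFSClient direct memory is roughly open hfiles * short-circuit buffer size.
DFSCLIENT_MB=$(( OPEN_HFILES * SHORTCIRCUIT_BUF_KB / 1024 ))
MAX_DIRECT_MB=$(( BUCKET_CACHE_MB + DFSCLIENT_MB + SLACK_MB ))

# The flag you would set for the RegionServer JVM in conf/hbase-env.sh:
echo "-XX:MaxDirectMemorySize=${MAX_DIRECT_MB}m"   # prints -XX:MaxDirectMemorySize=4477m
```

Afterward, the Server Metrics: Memory tab in the RegionServer UI shows configured versus in-use direct memory, so you can confirm the headroom is actually there.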