Posted to commits@hbase.apache.org by st...@apache.org on 2010/10/16 23:55:34 UTC

svn commit: r1023380 - in /hbase/trunk: CHANGES.txt src/docbkx/book.xml

Author: stack
Date: Sat Oct 16 21:55:34 2010
New Revision: 1023380

URL: http://svn.apache.org/viewvc?rev=1023380&view=rev
Log:
HBASE-3097 Merge in hbase-1200 doc on bloomfilters into hbase book

Modified:
    hbase/trunk/CHANGES.txt
    hbase/trunk/src/docbkx/book.xml

Modified: hbase/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hbase/trunk/CHANGES.txt?rev=1023380&r1=1023379&r2=1023380&view=diff
==============================================================================
--- hbase/trunk/CHANGES.txt (original)
+++ hbase/trunk/CHANGES.txt Sat Oct 16 21:55:34 2010
@@ -1006,6 +1006,7 @@ Release 0.21.0 - Unreleased
    HBASE-2968  No standard family filter provided (Andrey Stepachev)
    HBASE-3088  TestAvroServer and TestThriftServer broken because use same
                table in all tests and tests enable/disable/delete
+   HBASE-3097  Merge in hbase-1200 doc on bloomfilters into hbase book
 
   NEW FEATURES
    HBASE-1961  HBase EC2 scripts

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1023380&r1=1023379&r2=1023380&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Sat Oct 16 21:55:34 2010
@@ -33,6 +33,12 @@
     </section>
   </chapter>
 
+  <chapter>
+    <title>The HBase Shell</title>
+
+    <para></para>
+  </chapter>
+
   <chapter xml:id="filesystem">
     <title>Filesystem Format</title>
 
@@ -750,4 +756,129 @@
       </section>
     </section>
   </chapter>
+
+  <chapter>
+    <title>Bloom Filters</title>
+
+    <para>Bloom filters were developed in <link
+    xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
+    Add bloomfilters</link>.<footnote>
+        <para>For a description of the development process (why static blooms
+        rather than dynamic) and for an overview of the unique properties
+        that pertain to blooms in HBase, as well as possible future
+        directions, see the <emphasis>Development Process</emphasis> section
+        of the document <link
+        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+        in HBase</link> attached to <link
+        xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
+      </footnote><footnote>
+        <para>The bloom filters described here are actually version two of
+        blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
+        option based on work done by the <link
+        xlink:href="http://www.one-lab.org">European Commission One-Lab
+        Project 034819</link>. The core of the HBase bloom work was later
+        pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
+        Version 1 of HBase blooms never worked that well. Version 2 is a
+        rewrite from scratch, though again it starts with the One-Lab
+        work.</para>
+      </footnote></para>
+
+    <section>
+      <title>Configurations</title>
+
+      <para>Blooms are enabled by specifying options on a column family in the
+      HBase shell or in Java code using <code>HColumnDescriptor</code>.</para>
+
+      <section>
+        <title><code>HColumnDescriptor</code> option</title>
+
+        <para>Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW |
+        ROWCOL)</code> to enable blooms per column family. Default =
+        <varname>NONE</varname> for no bloom filters. If
+        <varname>ROW</varname>, the hash of the row will be added to the bloom
+        on each insert. If <varname>ROWCOL</varname>, the hash of the row +
+        column family + column qualifier will be added to the bloom on
+        each key insert.</para>
+      </section>
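The following Java sketch only illustrates which bytes the description above says get hashed into the bloom for each type. It does not call HBase: the `BloomType` enum and the `bloomKey` helper are hypothetical stand-ins for the values accepted by `HColumnDescriptor.setBloomFilterType(...)`, and the plain concatenation of row, family and qualifier is a simplification, not HBase's actual key encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the NONE | ROW | ROWCOL values described above.
enum BloomType { NONE, ROW, ROWCOL }

final class BloomKeys {
    private BloomKeys() {}

    /**
     * Returns the bytes that would be hashed into the bloom for one key insert,
     * or null when the family has no bloom (NONE). Illustrative only.
     */
    static byte[] bloomKey(BloomType type, String row, String family, String qualifier) {
        switch (type) {
            case NONE:
                // No bloom filter: nothing is added on insert.
                return null;
            case ROW:
                // Only the row is hashed, so all columns of a row share one entry.
                return row.getBytes(StandardCharsets.UTF_8);
            case ROWCOL:
                // Row + family + qualifier (naive concatenation, for illustration):
                // each column of a row gets its own bloom entry.
                return (row + family + qualifier).getBytes(StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("unknown bloom type: " + type);
        }
    }
}
```

The practical consequence: with `ROW`, every column of a given row maps to the same bloom entry, so the bloom can only rule out whole rows; with `ROWCOL`, each row/column pair has its own entry, so a lookup for one specific column can be ruled out on its own.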
+
+      <section>
+        <title><varname>io.hfile.bloom.enabled</varname> global kill
+        switch</title>
+
+        <para><code>io.hfile.bloom.enabled</code> in
+        <classname>Configuration</classname> serves as the kill switch in case
+        something goes wrong. Default = <varname>true</varname>.</para>
+      </section>
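A minimal sketch of the kill-switch semantics, assuming that setting the switch to `false` simply makes every family behave as if its bloom type were `NONE`. `java.util.Properties` stands in for Hadoop's `Configuration` (where the equivalent read would be `conf.getBoolean("io.hfile.bloom.enabled", true)`), and `effectiveBloomType` is a hypothetical helper, not an HBase method.

```java
import java.util.Properties;

final class BloomKillSwitch {
    static final String KEY = "io.hfile.bloom.enabled";

    private BloomKillSwitch() {}

    /**
     * Hypothetical helper: returns the bloom type that would actually be used
     * for a family, honoring the global kill switch.
     */
    static String effectiveBloomType(Properties conf, String familyBloomType) {
        // Default = true: blooms stay on unless the key is explicitly "false".
        boolean enabled = Boolean.parseBoolean(conf.getProperty(KEY, "true"));
        return enabled ? familyBloomType : "NONE";
    }
}
```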
+
+      <section>
+        <title><varname>io.hfile.bloom.error.rate</varname></title>
+
+        <para><varname>io.hfile.bloom.error.rate</varname> = average false
+        positive rate. Default = 1%. Halving the rate (e.g. to 0.5%) costs
+        roughly one extra bit per bloom entry.</para>
+      </section>
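To put numbers on the error-rate trade-off, the sketch below applies the textbook optimal Bloom filter sizing formula: m/n = -ln(p) / (ln 2)^2 bits per key, with k = (m/n) ln 2 hash functions. This is an assumption about the sizing, since the text above does not state HBase's exact formula. Under this formula, halving p adds exactly 1/ln 2 ≈ 1.44 bits per key, so the one-bit figure is best read as a rule of thumb.

```java
final class BloomSizing {
    private static final double LN2 = Math.log(2);

    private BloomSizing() {}

    /** Bits needed per inserted key for false-positive rate p (textbook formula). */
    static double bitsPerKey(double errorRate) {
        return -Math.log(errorRate) / (LN2 * LN2);
    }

    /** Optimal number of hash functions for the given bits per key, rounded. */
    static int hashCount(double bitsPerKey) {
        return (int) Math.round(bitsPerKey * LN2);
    }

    public static void main(String[] args) {
        double at1 = bitsPerKey(0.01);     // io.hfile.bloom.error.rate default of 1%
        double atHalf = bitsPerKey(0.005); // rate halved to 0.5%
        System.out.printf("1%%: %.2f bits/key, k=%d%n", at1, hashCount(at1));
        System.out.printf("0.5%%: %.2f bits/key, k=%d%n", atHalf, hashCount(atHalf));
        System.out.printf("cost of halving: %.2f bits/key%n", atHalf - at1);
    }
}
```

Running `main` reports about 9.59 bits/key with 7 hash functions at 1%, about 11.03 bits/key with 8 at 0.5%, and a difference of 1.44 bits/key.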
+
+      <section>
+        <title><varname>io.hfile.bloom.max.fold</varname></title>
+
+        <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
+        fold rate. Most people should leave this alone. Default = 7, so a
+        bloom can be collapsed to as little as 1/128th of its original size. See the
+        <emphasis>Development Process</emphasis> section of the document <link
+        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+        in HBase</link> for more on what this option means.</para>
+      </section>
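The design document linked above is the authority on folding; the sketch below only illustrates the general idea, under the assumption that one fold halves the filter by OR-ing its two halves together, so a max fold of 7 allows a reduction by up to 2^7 = 128. Folding f times is equivalent to OR-ing the filter's 2^f equal chunks at once, which is what the code does. Because a bit set at position p in the m-bit filter lands at p mod (m / 2^f), lookups against the folded filter still never give false negatives; only the false-positive rate rises. `BloomFold` and its methods are hypothetical.

```java
import java.util.BitSet;

final class BloomFold {
    private BloomFold() {}

    /** Folds an m-bit filter 'times' times, producing m / 2^times bits. */
    static BitSet fold(BitSet bits, int m, int times) {
        int chunk = m >> times;
        if (chunk <= 0 || (chunk << times) != m) {
            throw new IllegalArgumentException("m must be divisible by 2^times");
        }
        BitSet folded = new BitSet(chunk);
        // OR all 2^times chunks onto the first: folded[i] = OR_j bits[i + j * chunk].
        for (int p = bits.nextSetBit(0); p >= 0 && p < m; p = bits.nextSetBit(p + 1)) {
            folded.set(p % chunk);
        }
        return folded;
    }

    /** Checks one hashed bit position, computed against the unfolded size m. */
    static boolean isSet(BitSet folded, int m, int times, long hash) {
        int chunk = m >> times;
        int p = (int) Math.floorMod(hash, (long) m); // position in the full filter
        return folded.get(p % chunk);                // same bit after folding
    }
}
```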
+    </section>
+
+    <section>
+      <title>Bloom StoreFile footprint</title>
+
+      <para>Bloom filters add an entry to the <classname>StoreFile</classname>
+      general <classname>FileInfo</classname> data structure and then two
+      extra entries to the <classname>StoreFile</classname> metadata
+      section.</para>
+
+      <section>
+        <title>BloomFilter in the <classname>StoreFile</classname>
+        <classname>FileInfo</classname> data structure</title>
+
+        <section>
+          <title><varname>BLOOM_FILTER_TYPE</varname></title>
+
+          <para><classname>FileInfo</classname> has a
+          <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
+          <varname>NONE</varname>, <varname>ROW</varname> or
+          <varname>ROWCOL</varname>.</para>
+        </section>
+      </section>
+
+      <section>
+        <title>BloomFilter entries in <classname>StoreFile</classname>
+        metadata</title>
+
+        <section>
+          <title><varname>BLOOM_FILTER_META</varname></title>
+
+          <para><varname>BLOOM_FILTER_META</varname> holds the bloom size, the
+          hash function used, and similar parameters. It is small and is cached
+          when the <classname>StoreFile.Reader</classname> is loaded.</para>
+        </section>
+
+        <section>
+          <title><varname>BLOOM_FILTER_DATA</varname></title>
+
+          <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloom filter
+          data. It is obtained on demand and stored in the LRU cache if that
+          cache is enabled (it is enabled by default).</para>
+        </section>
+      </section>
+    </section>
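The footprint described above can be modeled roughly as follows. This is a hypothetical sketch, not `StoreFile.Reader`: the `BLOOM_FILTER_TYPE` string is read from a `FileInfo`-like map (a missing entry is treated as `NONE` in this sketch), `BLOOM_FILTER_META` is read once when the reader is constructed, and `BLOOM_FILTER_DATA` is fetched only on first use and then served from a plain `Map` standing in for the LRU cache. A real LRU cache could evict the data and force a reload; the sketch ignores eviction.

```java
import java.util.Map;
import java.util.function.Supplier;

final class BloomReaderSketch {
    private final String bloomType;
    private final String bloomMeta;
    private final Supplier<byte[]> dataLoader;
    private final Map<String, byte[]> cache; // stands in for the LRU cache
    private final String cacheKey;
    int dataLoads = 0; // counts reads of the bloom data, for illustration

    BloomReaderSketch(Map<String, String> fileInfo, Supplier<String> metaLoader,
                      Supplier<byte[]> dataLoader, Map<String, byte[]> cache, String fileName) {
        // BLOOM_FILTER_TYPE comes from the FileInfo structure.
        this.bloomType = fileInfo.getOrDefault("BLOOM_FILTER_TYPE", "NONE");
        // BLOOM_FILTER_META is small, so it is read once, when the reader opens.
        this.bloomMeta = "NONE".equals(bloomType) ? null : metaLoader.get();
        this.dataLoader = dataLoader;
        this.cache = cache;
        this.cacheKey = fileName + "/BLOOM_FILTER_DATA";
    }

    /** Returns the bloom bits, reading them on demand and caching them afterwards. */
    byte[] bloomData() {
        if ("NONE".equals(bloomType)) {
            return null;
        }
        byte[] data = cache.get(cacheKey);
        if (data == null) {
            data = dataLoader.get(); // first use: fetch BLOOM_FILTER_DATA
            dataLoads++;
            cache.put(cacheKey, data);
        }
        return data;
    }

    String meta() {
        return bloomMeta;
    }
}
```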
+  </chapter>
+
+  <appendix>
+    <title>Tools</title>
+
+    <para>Here we list HBase tools for administration, analysis, fixup, and
+    debugging.</para>
+  </appendix>
 </book>