Posted to commits@hbase.apache.org by st...@apache.org on 2010/10/16 23:55:34 UTC
svn commit: r1023380 - in /hbase/trunk: CHANGES.txt src/docbkx/book.xml
Author: stack
Date: Sat Oct 16 21:55:34 2010
New Revision: 1023380
URL: http://svn.apache.org/viewvc?rev=1023380&view=rev
Log:
HBASE-3097 Merge in hbase-1200 doc on bloomfilters into hbase book
Modified:
hbase/trunk/CHANGES.txt
hbase/trunk/src/docbkx/book.xml
Modified: hbase/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hbase/trunk/CHANGES.txt?rev=1023380&r1=1023379&r2=1023380&view=diff
==============================================================================
--- hbase/trunk/CHANGES.txt (original)
+++ hbase/trunk/CHANGES.txt Sat Oct 16 21:55:34 2010
@@ -1006,6 +1006,7 @@ Release 0.21.0 - Unreleased
HBASE-2968 No standard family filter provided (Andrey Stepachev)
HBASE-3088 TestAvroServer and TestThriftServer broken because use same
table in all tests and tests enable/disable/delete
+ HBASE-3097 Merge in hbase-1200 doc on bloomfilters into hbase book
NEW FEATURES
HBASE-1961 HBase EC2 scripts
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1023380&r1=1023379&r2=1023380&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Sat Oct 16 21:55:34 2010
@@ -33,6 +33,12 @@
</section>
</chapter>
+ <chapter>
+ <title>The HBase Shell</title>
+
+ <para></para>
+ </chapter>
+
<chapter xml:id="filesystem">
<title>Filesystem Format</title>
@@ -750,4 +756,129 @@
</section>
</section>
</chapter>
+
+ <chapter>
+ <title>Bloom Filters</title>
+
+ <para>Bloom filters were developed in <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
+ Add bloomfilters</link>.<footnote>
+ <para>For a description of the development process (why static blooms
+ rather than dynamic), an overview of the unique properties
+ that pertain to blooms in HBase, and possible future
+ directions, see the <emphasis>Development Process</emphasis> section
+ of the document <link
+ xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+ in HBase</link> attached to <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
+ </footnote><footnote>
+ <para>The bloom filters described here are actually version two of
+ blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
+ option based on work done by the <link
+ xlink:href="http://www.one-lab.org">European Commission One-Lab
+ Project 034819</link>. The core of the HBase bloom work was later
+ pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
+ Version 1 of HBase blooms never worked that well. Version 2 is a
+ rewrite from scratch, though it again starts from the One-Lab
+ work.</para>
+ </footnote></para>
+
+ <section>
+ <title>Configurations</title>
+
+ <para>Blooms are enabled by specifying options on a column family, either in the
+ HBase shell or programmatically through <code>HColumnDescriptor</code>.</para>
+
+ <section>
+ <title><code>HColumnDescriptor</code> option</title>
+
+ <para>Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW |
+ ROWCOL)</code> to enable blooms per Column Family. The default is
+ <varname>NONE</varname>, meaning no bloom filters. If
+ <varname>ROW</varname>, the hash of the row is added to the bloom
+ on each insert. If <varname>ROWCOL</varname>, the hash of the row +
+ column family + column qualifier is added to the bloom on
+ each key insert.</para>
+ </section>
+
+ <section>
+ <title><varname>io.hfile.bloom.enabled</varname> global kill
+ switch</title>
+
+ <para><code>io.hfile.bloom.enabled</code> in
+ <classname>Configuration</classname> serves as the kill switch in case
+ something goes wrong. Default = <varname>true</varname>.</para>
+ </section>
+
+ <section>
+ <title><varname>io.hfile.bloom.error.rate</varname></title>
+
+ <para><varname>io.hfile.bloom.error.rate</varname> is the average false
+ positive rate. The default is 1%. Halving the rate (e.g. to 0.5%) costs
+ about one extra bit per bloom entry.</para>
+ </section>
+
+ <section>
+ <title><varname>io.hfile.bloom.max.fold</varname></title>
+
+ <para><varname>io.hfile.bloom.max.fold</varname> is the guaranteed minimum
+ fold rate. Most people should leave this alone. The default is 7, so a bloom
+ can collapse to at least 1/128th (1/2^7) of its original size. See the
+ <emphasis>Development Process</emphasis> section of the document <link
+ xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+ in HBase</link> for more on what this option means.</para>
+ </section>
+ </section>
+
+ <section>
+ <title>Bloom StoreFile footprint</title>
+
+ <para>Bloom filters add an entry to the <classname>StoreFile</classname>
+ general <classname>FileInfo</classname> data structure and then two
+ extra entries to the <classname>StoreFile</classname> metadata
+ section.</para>
+
+ <section>
+ <title>BloomFilter in the <classname>StoreFile</classname>
+ <classname>FileInfo</classname> data structure</title>
+
+ <section>
+ <title><varname>BLOOM_FILTER_TYPE</varname></title>
+
+ <para><classname>FileInfo</classname> has a
+ <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
+ <varname>NONE</varname>, <varname>ROW</varname> or
+ <varname>ROWCOL</varname>.</para>
+ </section>
+ </section>
+
+ <section>
+ <title>BloomFilter entries in <classname>StoreFile</classname>
+ metadata</title>
+
+ <section>
+ <title><varname>BLOOM_FILTER_META</varname></title>
+
+ <para><varname>BLOOM_FILTER_META</varname> holds the bloom size, the hash
+ function used, etc. It is small and is cached when the
+ <classname>StoreFile.Reader</classname> loads.</para>
+ </section>
+
+ <section>
+ <title><varname>BLOOM_FILTER_DATA</varname></title>
+
+ <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloom filter
+ data. It is obtained on demand and stored in the LRU cache, if the cache is
+ enabled (it is enabled by default).</para>
+ </section>
+ </section>
+ </section>
+ </chapter>
+
+ <appendix>
+ <title>Tools</title>
+
+ <para>Here we list HBase tools for administration, analysis, fixup, and
+ debugging.</para>
+ </appendix>
</book>
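
The io.hfile.bloom.max.fold text in the diff above says a bloom can collapse to at least 1/128th of its original size with the default of 7. The sketch below shows one plausible reading of that arithmetic. It assumes that a "fold" halves a power-of-two sized bit array by OR-ing the upper half onto the lower half, so that seven folds give 1/2^7 = 1/128. The class and method names (BloomFoldSketch, foldOnce, position, minSizeFraction) are hypothetical and are not HBase APIs; this is an illustration, not the HBase-1200 implementation.

```java
import java.util.BitSet;

/** Toy illustration of bloom folding; an assumption-laden sketch, not HBase code. */
public class BloomFoldSketch {

    /** Bit position of a hash in a bloom of sizeBits bits (always non-negative). */
    public static int position(int hash, int sizeBits) {
        return Math.floorMod(hash, sizeBits);
    }

    /**
     * Fold a bloom of sizeBits bits (assumed to be a power of two) in half:
     * every set bit i moves to i % (sizeBits / 2), which is the same as
     * OR-ing the upper half of the array onto the lower half.
     */
    public static BitSet foldOnce(BitSet bits, int sizeBits) {
        int half = sizeBits / 2;
        BitSet folded = new BitSet(half);
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            folded.set(i % half);
        }
        return folded;
    }

    /** Smallest fraction of the original size reachable after maxFold folds. */
    public static double minSizeFraction(int maxFold) {
        return 1.0 / (1 << maxFold);
    }

    public static void main(String[] args) {
        int size = 1024; // hypothetical bloom size in bits
        BitSet bloom = new BitSet(size);
        int[] hashes = {12345, 987654, -42, 7}; // stand-ins for real key hashes
        for (int h : hashes) {
            bloom.set(position(h, size));
        }

        // After one fold, each key is still found at its position modulo the new size.
        BitSet folded = foldOnce(bloom, size);
        for (int h : hashes) {
            System.out.println(folded.get(position(h, size / 2)));
        }

        // With a max fold of 7 the bloom can shrink to 1/128th of its size.
        System.out.println(minSizeFraction(7));
    }
}
```

Folding never loses a key that was inserted, because a set bit at position i lands at i modulo the halved size, which is where a lookup against the smaller array would probe. The cost is a denser array and therefore a higher false positive rate, which is presumably why folding only makes sense when the filter holds far fewer keys than it was sized for.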