You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/10/15 23:04:46 UTC

[Lucene-hadoop Wiki] Update of "Hbase/UsingBloomFilters" by JimKellerman

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/UsingBloomFilters

The comment on the change is:
Initial content

------------------------------------------------------------------------------
- Describe Hbase/UsingBloomFilters here.
+ Bloom filters can be enabled on a per-column family basis in Hbase. 
+ There are three bloom filter variants supported:
+  1. A [http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal bloom filter] as defined by Bloom in 1970.
+  1. A [http://portal.acm.org/citation.cfm?id=343571.343572 counting bloom filter] as defined by Fan et al. in a ToN 2000 paper.
+  1. A [http://www-rp.lip6.fr/site_npa/site_rp/_publications/740-rbf_cameraready.pdf retouched bloom filter] as described in the CoNEXT 2006 paper.
  
- TBS.
+ There are two ways in which a bloom filter can be instantiated:
+  1. by supplying the estimated number of values, in which case HBase selects the number of hash functions to be 4 and computes the vector size from the formula
+   {{{size = number-of-values * number-of-hashfunctions / ln(2) }}}
  
+  This formula was presented in [http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/BloomFilterSurvey.pdf Network Applications of Bloom Filters: A Survey, by Broder and Mitzenmacher]
+  1.#2 by specifying the vector size and the number of hash functions explicitly.
+ 
+ Both of these techniques are demonstrated in the Junit test hbase.!TestBloomFilters.
+ 
+ '''Additional Resources:'''
+ 
+  1. [http://www.cc.gatech.edu/~manolios/bloom-filters/calculator.html Bloom Filter Calculator]
+  1. [http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html Bloom Filters - the math]
+  1. [http://www.flipcode.com/articles/article_bloomfilters.shtml Coding Bloom Filters]
+