Posted to user@hbase.apache.org by "David G. Boney" <ch...@austin-acm-sigkdd.org> on 2013/01/16 00:24:52 UTC

Bloom filter based scanner/filter

I am building a data cube on top of HBase. All access to the data is by map/reduce jobs. I want to build a scanner whose first matching criterion is based on the set intersection of Bloom filters, followed by additional matching criteria specified in the current filter architecture.

First, I run a map/reduce job on table A. For every row I match in table A, I add the row key to a Bloom filter. I then run a map/reduce job on table B, where the row keys are over the same domain as table A.

I want to build a scanner that can use the built-in Bloom filters in HBase. When the scanner goes to get a block of data to which a row-key-based Bloom filter is attached, it does a set intersection with the table A Bloom filter to see whether any of the keys from table A might be in the block. If so, the block is read in and the scanner does additional matching on the rows according to the filter.
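To make the intersection step concrete, here is a minimal stand-alone sketch in Java. It is not HBase's internal Bloom filter code: the class RowKeyBloomFilter, its hash functions, and the method names are all hypothetical, and it assumes both filters were built with the same bit count and the same hash functions, which is what makes a bitwise comparison meaningful. If two key sets share a key, every bit for that key is set in both filters, so an empty bitwise AND proves the sets are disjoint and the block can be skipped. A non-empty AND only means the block might contain a matching key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical stand-alone Bloom filter over row keys; not HBase's implementation. */
final class RowKeyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowKeyBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** FNV-1a, used here only as a second, independent-looking hash (an assumption). */
    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    /** Bit position for the i-th hash, derived by double hashing. */
    private int index(byte[] rowKey, int i) {
        int h1 = Arrays.hashCode(rowKey);
        int h2 = fnv1a(rowKey) | 1; // force an odd step so it is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(byte[] rowKey) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(rowKey, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(byte[] rowKey) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(rowKey, i))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Conservative intersection test: false means the two key sets are
     * definitely disjoint; true means they might share a key.
     */
    boolean mightIntersect(RowKeyBloomFilter other) {
        if (numBits != other.numBits || numHashes != other.numHashes) {
            throw new IllegalArgumentException("filters must share size and hash count");
        }
        return bits.intersects(other.bits);
    }
}
```

In the scenario above, the filter built from table A would be shipped to the job on table B, and the scanner would call something like mightIntersect against each block's filter before reading the block. Note that the false-positive rate of this test grows quickly as either filter fills up, so it only helps when the filters are sparse relative to their size.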

This is a simplification of my problem. I am trying to find out what the complexity of implementing such a feature would be in HBase.
-----------------
Sincerely,
David G. Boney
Chair, Austin ACM SIGKDD
chair@austin-acm-sigkdd.org
http://www.meetup.com/Austin-ACM-SIGKDD/
http://tech.groups.yahoo.com/group/austinsigkdd/