Posted to dev@hbase.apache.org by Ryan Rawson <ry...@gmail.com> on 2009/03/05 10:41:07 UTC

Re: [jira] Assigned: (HBASE-1200) Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20

I'm going to give this a shot tomorrow.

The plan is to use row:cf:colqual as the bloom-filter 'key'.  That way we can
test whether a specific row/col is in a specific file.  I might also add a
second, 'row'-only bloom filter to test.

Note that in general this would only be useful once we know that a specific
row/column exists and want to optimize how many files we have to seek/read.
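
To make the idea concrete, here is a rough sketch of how a composite
row:cf:colqual key could be used to skip files on a read.  None of these names
come from HBase or Hadoop: SimpleBloom, FileSketch, bloomKey and candidateFiles
are all made up for illustration, the filter is a plain fixed-size one (not
DynamicBloomFilter), and the sizes and hash choice are arbitrary assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class BloomKeySketch {

    /** Minimal fixed-size Bloom filter; illustrative only, not Hadoop's implementation. */
    static final class SimpleBloom {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloom(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) return false;
            }
            return true;
        }

        // Double hashing (h1 + i*h2) built from two FNV-1a style passes with different seeds.
        private int index(String key, int i) {
            byte[] data = key.getBytes(StandardCharsets.UTF_8);
            long h1 = fnv1a(data, 0xcbf29ce484222325L);
            long h2 = fnv1a(data, 0x84222325cbf29ce4L) | 1L;
            return (int) Math.floorMod(h1 + i * h2, (long) numBits);
        }

        private static long fnv1a(byte[] data, long seed) {
            long h = seed;
            for (byte b : data) {
                h ^= (b & 0xff);
                h *= 0x100000001b3L;
            }
            return h;
        }
    }

    /** Hypothetical stand-in for one store file carrying its own filter. */
    static final class FileSketch {
        final String name;
        final SimpleBloom bloom = new SimpleBloom(1 << 12, 4); // arbitrary sizing

        FileSketch(String name) { this.name = name; }

        void addCell(String row, String cf, String qual) {
            bloom.add(bloomKey(row, cf, qual));
        }
    }

    // Composite key as described above. A real key would need an unambiguous
    // encoding, since ':' may appear inside a row or qualifier.
    static String bloomKey(String row, String cf, String qual) {
        return row + ":" + cf + ":" + qual;
    }

    /** Names of the files that might hold the cell; all others can be skipped. */
    static List<String> candidateFiles(List<FileSketch> files, String row, String cf, String qual) {
        String key = bloomKey(row, cf, qual);
        List<String> out = new ArrayList<>();
        for (FileSketch f : files) {
            if (f.bloom.mightContain(key)) out.add(f.name);
        }
        return out;
    }

    public static void main(String[] args) {
        FileSketch f1 = new FileSketch("f1");
        f1.addCell("row1", "cf", "a");
        FileSketch f2 = new FileSketch("f2");
        f2.addCell("row2", "cf", "b");
        System.out.println(candidateFiles(Arrays.asList(f1, f2), "row1", "cf", "a"));
    }
}
```

A file that really holds the cell is always reported; a file that does not may
occasionally be reported too (a false positive), which only costs an extra seek.
A row-only filter would be the same thing with just the row passed to add().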

-ryan

On Thu, Mar 5, 2009 at 1:27 AM, ryan rawson (JIRA) <ji...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/HBASE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> ryan rawson reassigned HBASE-1200:
> ----------------------------------
>
>    Assignee: ryan rawson  (was: stack)
>
> > Add bloomfilters to hfile; use dynamicbloomfilter instead of base
> > bloomfilter; depend on hadoop 0.20
> >
> ----------------------------------------------------------------------------------------------------
> >
> >                 Key: HBASE-1200
> >                 URL: https://issues.apache.org/jira/browse/HBASE-1200
> >             Project: Hadoop HBase
> >          Issue Type: Task
> >            Reporter: stack
> >            Assignee: ryan rawson
> >             Fix For: 0.20.0
> >
> >
> > Add bloomfiltering to hfile.  Should it be optional or always on?
> > Currently, we bloom filter rows only, not the column + ts component, which
> > seems a good place to start, but we size the bloomfilter with the number of
> > entries we are about to flush, which usually makes the filter too big.
> > How do we figure out how many rows are in the flush?  We should use
> > the DynamicBloomFilter as Andrzej does up in hadoop BloomFilterMapFile:
> > start small and let it resize as entries are added.