Posted to user@hbase.apache.org by Bryan Beaudreault <bb...@hubspot.com> on 2012/02/16 03:18:26 UTC

Scans and Bloom Filter

Hello,

We are looking at Bloom Filters and wondering if they are helpful when
doing a sequential read (multi-row scan) or only when doing a Get for a
single row.  It logically makes sense that they would only affect (or have
a greater effect on) getting a single row, since a Bloom filter is a way of
determining whether you have to read a whole store file when fetching a
key.  But we are told that Scan and Get are essentially the same code on
the backend, so I imagine both will check the Blooms if they exist.

Also, would a ROWCOL bloom be more effective if you are often doing
multi-row scans but always specifying only a subset of the columns in
those rows?

Thanks,

Bryan

Re: Scans and Bloom Filter

Posted by Doug Meil <do...@explorysmedical.com>.
Good stuff Nicolas, I'll add this to the book.





On 2/16/12 3:52 PM, "Nicolas Spiegelberg" <ns...@fb.com> wrote:

>Bryan,
>
>Currently, ROW & ROWCOL Bloom Filters are only checked for explicit,
>single-row 'Get' scans.  ROWCOL BFs are only checked when you're querying
>for explicit column qualifiers (vs getting the entire row).  This is
>because multi-row scans & full-row scans are implicit queries.  To
>clarify: 
>
>With a multirow scan, the next row after 0x0001 is NOT 0x0002.  HBase only
>knows that the next row is > 0x0001.  The next row could be 0x00010 or
>0x0003.  However, when you call HTable.get(row=0x0001), HBase knows that
>you explicitly want that row and don't want 0x00010.
>
>Nicolas
>



Re: Scans and Bloom Filter

Posted by Nicolas Spiegelberg <ns...@fb.com>.
Bryan,

Currently, ROW & ROWCOL Bloom Filters are only checked for explicit,
single-row 'Get' scans.  ROWCOL BFs are only checked when you're querying
for explicit column qualifiers (vs getting the entire row).  This is
because multi-row scans & full-row scans are implicit queries.  To
clarify: 

With a multirow scan, the next row after 0x0001 is NOT 0x0002.  HBase only
knows that the next row is > 0x0001.  The next row could be 0x00010 or
0x0003.  However, when you call HTable.get(row=0x0001), HBase knows that
you explicitly want that row and don't want 0x00010.

Nicolas
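
A minimal sketch of the distinction Nicolas describes, assuming a toy Bloom
filter and a hypothetical in-memory "store file".  None of this is HBase
code: the ToyBloom class, the row keys, and the row + "/" + qualifier key
layout are illustrative assumptions only.  It shows that a Bloom filter can
answer "might this exact row (or row+column) be in the file?" but cannot
answer "which row comes next after this one?", which is what a multi-row
scan needs.

```python
import hashlib
from bisect import bisect_right


class ToyBloom:
    """Tiny Bloom filter: answers 'possibly present' or 'definitely absent'."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Hypothetical store file: row key -> column qualifiers present in that row.
# The second key stands in for a row that sorts between 0x0001 and 0x0002.
STORE_FILE = {
    b"\x00\x01": [b"a", b"b"],
    b"\x00\x01\x00": [b"a"],
    b"\x00\x03": [b"b"],
}
SORTED_ROWS = sorted(STORE_FILE)  # bytes sort lexicographically

row_bloom = ToyBloom()     # keyed by row only (ROW-style)
rowcol_bloom = ToyBloom()  # keyed by row + qualifier (ROWCOL-style, toy layout)
for row, qualifiers in STORE_FILE.items():
    row_bloom.add(row)
    for qualifier in qualifiers:
        rowcol_bloom.add(row + b"/" + qualifier)


def next_row_after(row):
    """What a scan needs: the smallest stored row strictly greater than row."""
    i = bisect_right(SORTED_ROWS, row)
    return SORTED_ROWS[i] if i < len(SORTED_ROWS) else None


if __name__ == "__main__":
    # Explicit lookup: the filter can be asked about one exact key.
    print(row_bloom.might_contain(b"\x00\x01"))
    print(rowcol_bloom.might_contain(b"\x00\x01/a"))
    # Scan step: the next key is unknown in advance, so there is nothing
    # to ask the filter about; the sorted data itself must be consulted.
    print(next_row_after(b"\x00\x01"))
```

The filter only supports exact-key membership tests, so it can help when
the row (and, for the ROWCOL variant, the qualifier) is named explicitly.
Finding a successor row requires the sorted keys themselves.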
