Posted to user@hbase.apache.org by Mohit Anchlia <mo...@gmail.com> on 2012/07/26 22:30:30 UTC

Bloom Filter

Is it advisable to enable bloom filters on the column family?

Also, why is it called a global kill switch?

Bloom Filter Configuration
  2.9.1. io.hfile.bloom.enabled global kill switch

io.hfile.bloom.enabled in Configuration serves as the kill switch in case
something goes wrong. Default = true.

Re: Bloom Filter

Posted by Stack <st...@duboce.net>.

On Fri, Jul 27, 2012 at 4:25 PM, Alex Baranau <al...@gmail.com> wrote:
> Should we put the link to it from Apache HBase book (ref guide)?
>

I added the link.  It will show up the next time we push the site.
St.Ack

Re: Bloom Filter

Posted by Mohit Anchlia <mo...@gmail.com>.

On Fri, Jul 27, 2012 at 7:25 AM, Alex Baranau <al...@gmail.com> wrote:

> Very good explanation (and food for thought) about using bloom filters in
> HBase in answers here:
> http://www.quora.com/How-are-bloom-filters-used-in-HBase
>
> Should we put the link to it from Apache HBase book (ref guide)?
>

Thanks, this is helpful.

Re: Bloom Filter

Posted by Alex Baranau <al...@gmail.com>.

There is a very good explanation (and food for thought) of how bloom filters
are used in HBase in the answers here:
http://www.quora.com/How-are-bloom-filters-used-in-HBase

Should we link to it from the Apache HBase book (ref guide)?

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

Re: Bloom Filter

Posted by Mohit Anchlia <mo...@gmail.com>.

On Thu, Jul 26, 2012 at 1:52 PM, Minh Duc Nguyen <md...@gmail.com> wrote:

> Mohit,
>
> According to HBase: The Definitive Guide,
>
> The row+column Bloom filter is useful when you cannot batch updates for a
> specific row, and end up with store files which all contain parts of the
> row. The more specific row+column filter can then identify which of the
> files contain the data you are requesting. Obviously, if you always load
> the entire row, this filter is once again hardly useful, as the region
> server will need to load the matching block out of each file anyway.  Since
> the row+column filter will require more storage, you need to do the math to
> determine whether it is worth the extra resources.
>

Thanks! I have time-series data, so I am thinking I should enable bloom
filters for rows only.

Re: Bloom Filter

Posted by Minh Duc Nguyen <md...@gmail.com>.

Mohit,

According to HBase: The Definitive Guide,

The row+column Bloom filter is useful when you cannot batch updates for a
specific row, and end up with store files which all contain parts of the
row. The more specific row+column filter can then identify which of the
files contain the data you are requesting. Obviously, if you always load
the entire row, this filter is once again hardly useful, as the region
server will need to load the matching block out of each file anyway.  Since
the row+column filter will require more storage, you need to do the math to
determine whether it is worth the extra resources.

   ~ Minh
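
To make the trade-off in the passage above concrete, here is a minimal,
self-contained sketch in Python. It is not HBase code: BloomFilter,
StoreFile, candidates_by_row and candidates_by_rowcol are hypothetical
names invented for this illustration, and the "row/column" key encoding is
an assumption. It only models the idea that each store file carries a
filter, and that a row-only filter cannot tell files apart when every file
holds part of the same row.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class StoreFile:
    """Hypothetical stand-in for a store file holding (row, column) cells."""

    def __init__(self, cells):
        self.cells = set(cells)
        self.row_bloom = BloomFilter()
        self.rowcol_bloom = BloomFilter()
        for row, col in self.cells:
            self.row_bloom.add(row)
            # Assumed key encoding for the row+column filter.
            self.rowcol_bloom.add(f"{row}/{col}")


def candidates_by_row(files, row):
    """Indexes of files that a row-only filter cannot rule out."""
    return [i for i, f in enumerate(files) if f.row_bloom.might_contain(row)]


def candidates_by_rowcol(files, row, col):
    """Indexes of files that a row+column filter cannot rule out."""
    key = f"{row}/{col}"
    return [i for i, f in enumerate(files) if f.rowcol_bloom.might_contain(key)]


# One row whose columns were written at different times, so each
# store file ends up holding a different part of it.
files = [
    StoreFile([("row1", "cf:a")]),
    StoreFile([("row1", "cf:b")]),
    StoreFile([("row1", "cf:c")]),
]

row_hits = candidates_by_row(files, "row1")
rowcol_hits = candidates_by_rowcol(files, "row1", "cf:b")
print("row filter candidates:", row_hits)
print("row+column filter candidates:", rowcol_hits)
```

The row-only lookup has to report every file, because every file really
does contain row1. The row+column lookup always includes the file that
holds cf:b and, with filters this sparse, will usually exclude the others,
although a Bloom filter can always return a false positive. If a read
fetches the whole row, all three files must be consulted either way, which
is why the book calls the row+column filter "hardly useful" in that case.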
