You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cassandra.apache.org by Philo Yang <ud...@gmail.com> on 2014/09/14 20:22:57 UTC

why bloom filter is only for row key?

Hi all,

After reading some docs, I find that bloom filter is built on row keys, not
on column key. Can anyone tell me what is considered for not building bloom
filter on column key? Is it a good idea to offer a table property option
between row key and primary key for what boolm filter is built on?

Thanks,
Philo Yang

Re: why bloom filter is only for row key?

Posted by Mark Papadakis <ma...@icloud.com>.
There was a per CF skip list and bloom filter in earlier C* releases(serialized right before the actual value). Haven't checked lately but I believe this is no longer the case, or it's configurable and by default it doesn't do it. There is probably a ticket on JIRA that outlines that rationale.

Sent from my iPhone

> On 14 Σεπ 2014, at 9:22 μ.μ., Philo Yang <ud...@gmail.com> wrote:
> 
> Hi all,
> 
> After reading some docs, I find that bloom filter is built on row keys, not
> on column key. Can anyone tell me what is considered for not building bloom
> filter on column key? Is it a good idea to offer a table property option
> between row key and primary key for what boolm filter is built on?
> 
> Thanks,
> Philo Yang

Re: why bloom filter is only for row key?

Posted by Philo Yang <ud...@gmail.com>.
Thanks Rob

Thanks,
Philo Yang


2014-09-16 2:12 GMT+08:00 DuyHai Doan <do...@gmail.com>:

> Nice catch Rob
>
> On Mon, Sep 15, 2014 at 8:04 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud...@gmail.com> wrote:
>>
>>> After reading some docs, I find that bloom filter is built on row keys,
>>> not on column key. Can anyone tell me what is considered for not building
>>> bloom filter on column key? Is it a good idea to offer a table property
>>> option between row key and primary key for what boolm filter is built on?
>>>
>>
>> Here's the nitty gritty of the process which removed the row-level bloom
>> filter on column keys.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-4885 - "Remove or rework
>> per-row bloom filters"
>>
>> =Rob
>> http://twitter.com/rcolidba
>>
>
>

Re: why bloom filter is only for row key?

Posted by DuyHai Doan <do...@gmail.com>.
Nice catch Rob

On Mon, Sep 15, 2014 at 8:04 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud...@gmail.com> wrote:
>
>> After reading some docs, I find that bloom filter is built on row keys,
>> not on column key. Can anyone tell me what is considered for not building
>> bloom filter on column key? Is it a good idea to offer a table property
>> option between row key and primary key for what boolm filter is built on?
>>
>
> Here's the nitty gritty of the process which removed the row-level bloom
> filter on column keys.
>
> https://issues.apache.org/jira/browse/CASSANDRA-4885 - "Remove or rework
> per-row bloom filters"
>
> =Rob
> http://twitter.com/rcolidba
>

Re: why bloom filter is only for row key?

Posted by Robert Coli <rc...@eventbrite.com>.
On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud...@gmail.com> wrote:

> After reading some docs, I find that bloom filter is built on row keys,
> not on column key. Can anyone tell me what is considered for not building
> bloom filter on column key? Is it a good idea to offer a table property
> option between row key and primary key for what boolm filter is built on?
>

Here's the nitty gritty of the process which removed the row-level bloom
filter on column keys.

https://issues.apache.org/jira/browse/CASSANDRA-4885 - "Remove or rework
per-row bloom filters"

=Rob
http://twitter.com/rcolidba

Re: why bloom filter is only for row key?

Posted by Robert Coli <rc...@eventbrite.com>.
On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud...@gmail.com> wrote:

> After reading some docs, I find that bloom filter is built on row keys,
> not on column key. Can anyone tell me what is considered for not building
> bloom filter on column key? Is it a good idea to offer a table property
> option between row key and primary key for what boolm filter is built on?
>

Here's the nitty gritty of the process which removed the row-level bloom
filter on column keys.

https://issues.apache.org/jira/browse/CASSANDRA-4885 - "Remove or rework
per-row bloom filters"

=Rob
http://twitter.com/rcolidba

Re: why bloom filter is only for row key?

Posted by Philo Yang <ud...@gmail.com>.
Thanks DuyHai,

I think the trouble of bloom filter on all row keys & column names is
memory usage. However, if a CF has only hundreds of columns per row,  the
number of total columns will be much fewer, so the bloom filter is possible
for this condition, right? Is there a good way to adjust bloom filter's
property between row keys and row keys+column names automatically or by
user's config?

Thanks,
Philo Yang


2014-09-15 2:45 GMT+08:00 DuyHai Doan <do...@gmail.com>:

> Hello Philo
>
>  Building bloom filter for column names (what you call column key) is
> technically possible but very expensive in term of memory usage.
>
>   The approximate formula to calculate space required by bloom filter can
> be found on slide 27 here:
> http://fr.slideshare.net/quipo/modern-algorithms-and-data-structures-1-bloom-filters-merkle-trees
>
> false positive chance = 0.6185 * m/n  where m = number of bits for the
> filter and n = number of distinct keys
>
> For example, if you want to index 1 million of rows, each having 100 000
> columns on average, it will end up indexing 100 billions of keys (row keys
> & column names) with bloom filter.
>
>  By applying the above formula, m ≈ 4.8 * 10^11 bits ≈ 60Gb to allocate in
> RAM just for bloom filter on all row keys & column names ...
>
>  Regards
>
>  Duy Hai DOAN
>
> On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud...@gmail.com> wrote:
>
>> Hi all,
>>
>> After reading some docs, I find that bloom filter is built on row keys,
>> not on column key. Can anyone tell me what is considered for not building
>> bloom filter on column key? Is it a good idea to offer a table property
>> option between row key and primary key for what boolm filter is built on?
>>
>> Thanks,
>> Philo Yang
>>
>>
>

Re: why bloom filter is only for row key?

Posted by Philo Yang <ud...@gmail.com>.
Thanks DuyHai,

I think the trouble of bloom filter on all row keys & column names is
memory usage. However, if a CF has only hundreds of columns per row,  the
number of total columns will be much fewer, so the bloom filter is possible
for this condition, right? Is there a good way to adjust bloom filter's
property between row keys and row keys+column names automatically or by
user's config?

Thanks,
Philo Yang


2014-09-15 2:45 GMT+08:00 DuyHai Doan <do...@gmail.com>:

> Hello Philo
>
>  Building bloom filter for column names (what you call column key) is
> technically possible but very expensive in term of memory usage.
>
>   The approximate formula to calculate space required by bloom filter can
> be found on slide 27 here:
> http://fr.slideshare.net/quipo/modern-algorithms-and-data-structures-1-bloom-filters-merkle-trees
>
> false positive chance = 0.6185 * m/n  where m = number of bits for the
> filter and n = number of distinct keys
>
> For example, if you want to index 1 million of rows, each having 100 000
> columns on average, it will end up indexing 100 billions of keys (row keys
> & column names) with bloom filter.
>
>  By applying the above formula, m ≈ 4.8 * 10^11 bits ≈ 60Gb to allocate in
> RAM just for bloom filter on all row keys & column names ...
>
>  Regards
>
>  Duy Hai DOAN
>
> On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud...@gmail.com> wrote:
>
>> Hi all,
>>
>> After reading some docs, I find that bloom filter is built on row keys,
>> not on column key. Can anyone tell me what is considered for not building
>> bloom filter on column key? Is it a good idea to offer a table property
>> option between row key and primary key for what boolm filter is built on?
>>
>> Thanks,
>> Philo Yang
>>
>>
>

Re: why bloom filter is only for row key?

Posted by DuyHai Doan <do...@gmail.com>.
Hello Philo

 Building bloom filter for column names (what you call column key) is
technically possible but very expensive in term of memory usage.

  The approximate formula to calculate space required by bloom filter can
be found on slide 27 here:
http://fr.slideshare.net/quipo/modern-algorithms-and-data-structures-1-bloom-filters-merkle-trees

false positive chance = 0.6185 * m/n  where m = number of bits for the
filter and n = number of distinct keys

For example, if you want to index 1 million of rows, each having 100 000
columns on average, it will end up indexing 100 billions of keys (row keys
& column names) with bloom filter.

 By applying the above formula, m ≈ 4.8 * 10^11 bits ≈ 60Gb to allocate in
RAM just for bloom filter on all row keys & column names ...

 Regards

 Duy Hai DOAN

On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud...@gmail.com> wrote:

> Hi all,
>
> After reading some docs, I find that bloom filter is built on row keys,
> not on column key. Can anyone tell me what is considered for not building
> bloom filter on column key? Is it a good idea to offer a table property
> option between row key and primary key for what boolm filter is built on?
>
> Thanks,
> Philo Yang
>
>