Posted to user@cassandra.apache.org by Radim Kolar <hs...@sendmail.cz> on 2011/11/11 07:55:10 UTC

configurable bloom filters (like hbase)

I have a problem with a large CF (about 200 billion entries per node).
While I can configure index_interval to lower memory requirements, I
still have to stick with huge bloom filters.

Ideally, bloom filters would be configurable like in HBase. The
Cassandra standard is about a 1.05% false-positive rate, but in my case
I would be fine even with a 20% false-positive rate. The data are not
often read back; most of it will never be read before it expires via TTL.
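As a rough sanity check (not from the thread itself), the standard bloom
filter sizing formula shows how much memory a relaxed false-positive target
saves:

```python
import math

def bloom_bits_per_key(p):
    """Bits per key for an optimally-sized bloom filter with target
    false-positive rate p: m/n = -ln(p) / (ln 2)^2."""
    return -math.log(p) / (math.log(2) ** 2)

# ~1% target (close to the 1.05% figure mentioned above) vs. a relaxed 20%.
print(round(bloom_bits_per_key(0.01), 1))  # 9.6 bits per key
print(round(bloom_bits_per_key(0.20), 1))  # 3.3 bits per key
```

At 200 billion keys that works out to roughly 240 GB of filter at a 1%
target versus roughly 84 GB at 20%, so even a modest relaxation is a large
absolute saving on a node this size.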

Re: configurable bloom filters (like hbase)

Posted by Brandon Williams <dr...@gmail.com>.
https://issues.apache.org/jira/browse/CASSANDRA-3497

On Wed, Dec 14, 2011 at 4:52 AM, Radim Kolar <hs...@sendmail.cz> wrote:
> On 11.11.2011 7:55, Radim Kolar wrote:
>
>> I have a problem with a large CF (about 200 billion entries per node).
>> While I can configure index_interval to lower memory requirements, I
>> still have to stick with huge bloom filters.
>>
>> Ideally, bloom filters would be configurable like in HBase. The
>> Cassandra standard is about a 1.05% false-positive rate, but in my case
>> I would be fine even with a 20% false-positive rate. The data are not
>> often read back; most of it will never be read before it expires via TTL.
>
> Does anybody else have the problem that bloom filters use too much memory
> in applications which do not need to read written data often?
>
> I am looking at the memory used by bloom filters, and it would be ideal
> for cassandra-1.1 to have the ability to shrink bloom filters to about
> 1/10 of their size. Would it be possible to code something like this: save
> bloom filters to disk as usual, but during load, transform them into
> something smaller at the cost of an increased FP rate?
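CASSANDRA-3497, linked above, is the ticket that made the false-positive
rate tunable per column family via a bloom_filter_fp_chance option. Assuming
a release where that option exists and a hypothetical table name, the CQL
form looks like:

```sql
ALTER TABLE bigdata WITH bloom_filter_fp_chance = 0.2;
```

In versions that allow it, setting the chance to 1.0 effectively disables
the filter entirely, which suits write-mostly workloads like the one
described here.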

Re: configurable bloom filters (like hbase)

Posted by Radim Kolar <hs...@sendmail.cz>.
On 11.11.2011 7:55, Radim Kolar wrote:
> I have a problem with a large CF (about 200 billion entries per node).
> While I can configure index_interval to lower memory requirements, I
> still have to stick with huge bloom filters.
>
> Ideally, bloom filters would be configurable like in HBase. The
> Cassandra standard is about a 1.05% false-positive rate, but in my case
> I would be fine even with a 20% false-positive rate. The data are not
> often read back; most of it will never be read before it expires via TTL.
Does anybody else have the problem that bloom filters use too much memory
in applications which do not need to read written data often?

I am looking at the memory used by bloom filters, and it would be ideal
for cassandra-1.1 to have the ability to shrink bloom filters to about
1/10 of their size. Would it be possible to code something like this: save
bloom filters to disk as usual, but during load, transform them into
something smaller at the cost of an increased FP rate?
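The load-time shrink proposed here corresponds to the classic bit-array
"folding" trick: OR the filter's bit array together in chunks of the new
size. As long as the new size divides the old one, (h % old) % new equals
h % new, so no false negatives are introduced; only the FP rate rises. A
toy Python sketch (not Cassandra code; all names are made up):

```python
import hashlib

def _hashes(key, m, k=3):
    # k toy hash functions derived from MD5, reduced mod the array size.
    for i in range(k):
        digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
        yield int(digest, 16) % m

def add(bits, key):
    for idx in _hashes(key, len(bits)):
        bits[idx] = 1

def contains(bits, key):
    return all(bits[idx] for idx in _hashes(key, len(bits)))

def fold(bits, factor):
    """Shrink the bit array by `factor` at the cost of a higher FP rate.
    Correct because (h % m) % (m // factor) == h % (m // factor)
    whenever factor divides m."""
    new = len(bits) // factor
    assert len(bits) % new == 0
    folded = [0] * new
    for i, b in enumerate(bits):
        folded[i % new] |= b
    return folded

bits = [0] * 1024
for key in ("row1", "row2", "row3"):
    add(bits, key)
small = fold(bits, 8)  # 1/8 the memory; every inserted key still matches
assert all(contains(small, k) for k in ("row1", "row2", "row3"))
```

Folding at load time, as suggested, would keep the on-disk format unchanged
while letting each node trade FP rate for heap independently.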