Posted to user@cassandra.apache.org by Kanwar Sangha <ka...@mavenir.com> on 2013/02/02 18:45:16 UTC

BloomFilter

Hi, a couple of questions:

1) What is the ratio of SSTable file size to bloom filter size? If I have an SSTable of 1 GB, what is the approximate bloom filter size, assuming the default value of 0.000744 is configured?

2) The bloom filters are stored in RAM, but not in the heap from 1.2 onwards?

3) What is the recommended ratio of RAM to disk per node? What is the maximum disk size recommended for one node? If I have 10 TB of data per node, how much RAM will the bloom filters consume?



Thanks,

kanwar


Re: BloomFilter

Posted by aaron morton <aa...@thelastpickle.com>.
> 1) What is the ratio of the sstable file size to bloom filter size ? If I have a sstable of 1 GB, what is the approximate bloom filter size ? Assuming 0.000744 default val configured.
The size of the bloom filter varies with the number of rows in the CF, not the on-disk size. More precisely, it depends on the number of rows in each SSTable, since a row can be stored in multiple SSTables.

nodetool cfstats reports the total bloom filter size for each cf. 
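
To put rough numbers on the row-count dependence, here is a minimal sketch using the textbook Bloom filter sizing formula, bits per key = -ln(p) / (ln 2)^2. It assumes the 0.000744 in the question is the target false-positive chance p. The function names are made up for illustration, and Cassandra's own sizing may round differently, so treat the result as an estimate and check it against what nodetool cfstats reports.

```python
import math


def bloom_filter_bits_per_key(fp_chance: float) -> float:
    """Optimal bits per key for a textbook Bloom filter (not Cassandra's exact code)."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_bytes(row_count: int, fp_chance: float) -> float:
    """Estimated filter size in bytes for row_count keys (hypothetical helper)."""
    return row_count * bloom_filter_bits_per_key(fp_chance) / 8


if __name__ == "__main__":
    fp = 0.000744  # assumed to be the configured false-positive chance
    print(f"bits per key: {bloom_filter_bits_per_key(fp):.1f}")
    for rows in (1_000_000, 100_000_000, 1_000_000_000):
        mib = bloom_filter_bytes(rows, fp) / 2**20
        print(f"{rows:>13,} rows -> ~{mib:,.0f} MiB")
```

Under these assumptions the estimate is roughly 15 bits per key, so about 1.9 GB of filter for a billion rows, regardless of how wide the rows are. Because a row present in several SSTables is counted once per SSTable, the real total can be higher than the number of distinct rows suggests.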

> 2) The bloom filters are stored in RAM but not in heap from 1.2 onwards ?
They are always in RAM. Before 1.2 they were stored in the JVM heap; from 1.2 onwards they are stored off heap.

> 3) What is the ratio of the RAM/Disk per node ?  What is the max disk size recommended for 1 node ? If I have 10 TB of data per node, how much RAM will the bloomfilter consume ?
If you are using spinning disks (HDD) and have 1GB networking, I would consider 300GB to 500GB of data per node a good rule of thumb for a small (<6 node) cluster.

The issues have to do with the time it takes to run nodetool repair, and the time it takes to replace a failed node. Once you have a feel for how long this takes you may want to put more data on each node.
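
As a back-of-the-envelope illustration of why node size matters here, the sketch below computes a lower bound on the time needed to move a node's data over the network. It assumes "1GB networking" means a 1 gigabit per second link, and the `efficiency` parameter is a hypothetical knob for how much of the link is actually usable. Real repair and node replacement also involve disk reads, compaction and validation work, so actual times will be longer.

```python
def transfer_hours(data_bytes: float, link_bits_per_sec: float,
                   efficiency: float = 1.0) -> float:
    """Hours to move data_bytes over a link, ignoring everything but bandwidth."""
    return data_bytes * 8 / (link_bits_per_sec * efficiency) / 3600


if __name__ == "__main__":
    gigabit = 1e9  # assumed link speed: 1 gigabit per second
    for label, size in (("300 GB", 300e9), ("500 GB", 500e9), ("10 TB", 10e12)):
        best = transfer_hours(size, gigabit)
        half = transfer_hours(size, gigabit, efficiency=0.5)
        print(f"{label:>6}: >= {best:5.1f} h at line rate, {half:5.1f} h at 50%")
```

Under these assumptions 500 GB takes a little over an hour at full line rate, while 10 TB takes close to a day even before any other overhead is counted.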

In 1.2 there are things that make replacing a node faster, but they tend to kick in at higher node counts.

Cheers

  
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com
