Posted to user@cassandra.apache.org by Radim Kolar <hs...@sendmail.cz> on 2011/12/25 14:13:40 UTC

reported bloom filter FP ratio

I have the following CF:

                 Read Count: 68844
                 Read Latency: 9.942 ms.
                 Write Count: 209712
                 Write Latency: 0.297 ms.
                 Pending Tasks: 0
                 Bloom Filter False Positives: 10
                 Bloom Filter False Ratio: 0.00495
                 Bloom Filter Space Used: 2196632

Why is the reported bloom filter FP ratio not computed like this?
 >>> 10/68844.0
0.00014525594096798558

Re: index sampling

Posted by Peter Schuller <pe...@infidyne.com>.
> on a node with 300M rows (a small node), there will be 585937 index sample
> entries at sampling 512. Let's say 100 bytes per entry: this will be 585 MB,
> and bloom filters are 884 MB. With the default sampling of 128, sampled
> entries will use the majority of node memory. Index sampling should be
> reworked like bloom filters to avoid allocating one large array per sstable.
> Hadoop's mapfile uses sampling 128 by default too, and it reads the entire
> mapfile index into memory.

The index summary does have an ArrayList, which is backed by an array
that could become large; however, larger than that array (which is
going to be 1 object reference per sample, or 1-2 taking into account
the array list's internal growth) will be the overhead of the objects
in the array themselves (regular Java objects). This is also why it is
non-trivial to report on the data size.

> It should be clearly documented in
> http://wiki.apache.org/cassandra/LargeDataSetConsiderations that bloom
> filters + index sampling are responsible for most of the memory used by a
> node. Caching itself is of minimal use on a large data set used for OLAP.

I added some information at the end.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

index sampling

Posted by Radim Kolar <hs...@sendmail.cz>.
 > That is a good reason for both to be configurable IMO.
Index sampling is currently configurable only per node; it would be
better to have it per keyspace, because we are using OLTP-like and OLAP
keyspaces in the same cluster. The OLAP keyspaces have about 1000x more rows.

But it is difficult to estimate index sampling memory until there is a
way to monitor the memory used by index sampling
(https://issues.apache.org/jira/browse/CASSANDRA-3662). Java can use
about 10x more memory than the raw data for an index sample entry, and
from sstable/IndexSummary.java it seems that Cassandra uses one big
ArrayList of <RowPosition, long> entries.

On a node with 300M rows (a small node), there will be 585937 index
sample entries at sampling 512. Let's say 100 bytes per entry: this will
be 585 MB, and bloom filters are 884 MB. With the default sampling of
128, sampled entries will use the majority of node memory. Index
sampling should be reworked like bloom filters to avoid allocating one
large array per sstable. Hadoop's mapfile uses sampling 128 by default
too, and it reads the entire mapfile index into memory.
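The arithmetic above can be sketched roughly as follows. The ~1 KB
effective per-entry cost is an assumption combining the ~100 bytes of
raw data with the ~10x Java overhead mentioned earlier in the thread, so
treat the absolute MB figures as order-of-magnitude estimates only:

```python
def index_sample_entries(rows: int, index_interval: int) -> int:
    """One index summary entry per index_interval rows."""
    return rows // index_interval

ROWS = 300_000_000    # 300M rows on the "small" node above
ENTRY_BYTES = 1_000   # assumption: ~100 B raw * ~10x Java object overhead

for interval in (128, 512):
    entries = index_sample_entries(ROWS, interval)
    mb = entries * ENTRY_BYTES / 1_000_000
    print(f"interval {interval}: {entries} entries, ~{mb:.0f} MB")
# interval 512 yields 585937 entries (~586 MB), close to the 585 MB above;
# the default interval 128 yields 4x as many entries.
```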

It should be clearly documented in
http://wiki.apache.org/cassandra/LargeDataSetConsiderations that bloom
filters + index sampling are responsible for most of the memory used by
a node. Caching itself is of minimal use on a large data set used for OLAP.

Re: reported bloom filter FP ratio

Posted by Peter Schuller <pe...@infidyne.com>.
>> I don't understand how you reached that conclusion.
>
> On my nodes most memory is consumed by bloom filters. Also 1.0 creates

The point is that just because that is the problem you have doesn't
mean the default is wrong, since it quite clearly depends on the use
case. If your number of rows is low relative to the cost of sustaining
a read-heavy workload, the trade-off is different.

> Cassandra does not measure memory used by index sampling yet. I suspect that
> it is memory hungry too and can safely be lowered by default; I see very
> little difference by changing index sampling from 64 to 512.

Bloom filters and index sampling are the two major contributors to
memory use that scale with the number of rows (and thus typically with
data size). This is known. Index sampling can indeed be significant.

The default is 128 though, not 64. Here again it's a matter of
trade-offs; 512 may have worked for you, but it doesn't mean it's an
appropriate default (I am not arguing for 128 either, I am just saying
that it's more complex than observing that in your particular case you
didn't see a problem with 512). Part of the trade-off is the additional
CPU usage implied in streaming and deserializing a larger amount of
data per average sstable index read; part of the trade-off is also the
effect on I/O: a sparser index sampling can result in a higher number
of seeks per index lookup.

> The basic problem with daily Cassandra administration that I am currently
> solving is that memory consumption grows with dataset size. I don't really
> like this design: you put more data in and the cluster can OOM. This makes
> Cassandra a suboptimal solution for data archiving. It will get better once
> tunable bloom filters are committed.

That is a good reason for both to be configurable IMO.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: reported bloom filter FP ratio

Posted by Radim Kolar <hs...@sendmail.cz>.
My misunderstanding of the FP ratio was based on the assumption that
the ratio is counted from node start, while it is actually
getRecentBloomFilterFalseRatio().

 > I don't understand how you reached that conclusion.

On my nodes most memory is consumed by bloom filters. Also, 1.0 creates
larger bloom filters than 0.8, leading to higher memory consumption; I
just checked a few sstables for the index-to-bloom-filter size ratio on
the same dataset. In 0.8 bloom filters are about 13% of the index size,
and in 1.0 about 16%. The key used in the CF is a fixed-size 4-byte
integer.

Cassandra does not measure memory used by index sampling yet. I suspect
that it is memory hungry too and can safely be lowered by default; I see
very little difference by changing index sampling from 64 to 512.

The basic problem with daily Cassandra administration that I am
currently solving is that memory consumption grows with dataset size. I
don't really like this design: you put more data in and the cluster can
OOM. This makes Cassandra a suboptimal solution for data archiving. It
will get better once tunable bloom filters are committed.

Re: reported bloom filter FP ratio

Posted by Peter Schuller <pe...@infidyne.com>.
> but the reported ratio is Bloom Filter False Ratio: 0.00495, which is higher
> than my computed ratio of 0.000145. If you were right, the reported ratio
> should be lower than the one I computed from CF reads, because there are more
> reads to sstables than to the CF.

The ratio is the ratio of false positives to true positives *per
sstable*. It's not the amount of false positives in each sstable *per
cf read*. Thus, there is no expectation of higher vs. lower, and the
magnitude of the discrepancy is easily explained by the fact that you
only have 10 false positives. That's not a statistically significant
sample set.
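The distinction can be sketched in a few lines of Python. The 2010
true-positive count below is a hypothetical figure chosen only so the
first ratio reproduces the reported 0.00495; it is not a number from
the node:

```python
def bloom_false_ratio(false_positives: int, true_positives: int) -> float:
    """Per-sstable style: FPs as a share of all positive bloom responses."""
    total = false_positives + true_positives
    return false_positives / total if total else 0.0

def per_cf_ratio(false_positives: int, cf_reads: int) -> float:
    """The computation from the original post: FPs per CF read."""
    return false_positives / cf_reads

print(round(bloom_false_ratio(10, 2010), 5))   # 0.00495 (reported style)
print(round(per_cf_ratio(10, 68844), 6))       # 0.000145 (computed style)
```

With only 10 false positives, a handful more or fewer would swing the
first ratio substantially, which is the statistical-significance point
above.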

> From investigating the bloom filter FP ratio, it seems that the default bloom
> filter FP ratio (soon user configurable) should be higher. HBase defaults to
> 1%; Cassandra defaults to 0.000744. Bloom filters are using quite a bit of
> memory now.

I don't understand how you reached that conclusion. There is a direct
trade-off between memory use and false positive hit rate, yes. That
does not mean that HBase's 1% is magically the correct choice.

I definitely think it should be tweakable (and IIRC there's work
happening on a JIRA to make this an option now), but a 1% false
positive hit rate will be completely unacceptable in some
circumstances. In others, perfectly acceptable due to the decrease in
memory use and few reads.
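For a sense of scale, the textbook bloom filter sizing formula (bits
per key = -ln(p) / (ln 2)^2; this is the standard optimal-size formula,
not necessarily Cassandra's exact implementation) shows what each
target FP rate p costs in memory:

```python
import math

def bits_per_key(fp_rate: float) -> float:
    """Optimal bloom filter size per element for a target FP rate."""
    return -math.log(fp_rate) / (math.log(2) ** 2)

for p in (0.01, 0.000744):   # HBase's 1% vs. the Cassandra default quoted above
    print(f"p={p}: ~{bits_per_key(p):.1f} bits per key")
# p=0.01 needs ~9.6 bits per key; p=0.000744 needs ~15.0 bits per key
```

So relaxing the target from 0.0744% to 1% would cut per-key bloom
filter memory by roughly a third, at the cost of about 13x more false
positives; that is exactly the trade-off in question.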

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: reported bloom filter FP ratio

Posted by Radim Kolar <hs...@sendmail.cz>.
Dne 25.12.2011 20:58, Peter Schuller napsal(a):
>>                 Read Count: 68844
> [snip]
>> why reported bloom filter FP ratio is not counted like this
>>>>> 10/68844.0
>> 0.00014525594096798558
> Because the read count is total amount of reads to the CF, while the
> bloom filter is per sstable. The number of individual reads to
> sstables will be higher than the number of reads to the CF (unless you
> happen to have exactly one sstable or no rows ever span sstables).
But the reported ratio is Bloom Filter False Ratio: 0.00495, which is
higher than my computed ratio of 0.000145. If you were right, the
reported ratio should be lower than the one I computed from CF reads,
because there are more reads to sstables than to the CF.

From investigating the bloom filter FP ratio, it seems that the default
bloom filter FP ratio (soon user configurable) should be higher. HBase
defaults to 1%; Cassandra defaults to 0.000744. Bloom filters are using
quite a bit of memory now.

Re: reported bloom filter FP ratio

Posted by Peter Schuller <pe...@infidyne.com>.
>                Read Count: 68844
[snip]
> why reported bloom filter FP ratio is not counted like this
>>>> 10/68844.0
> 0.00014525594096798558

Because the read count is total amount of reads to the CF, while the
bloom filter is per sstable. The number of individual reads to
sstables will be higher than the number of reads to the CF (unless you
happen to have exactly one sstable or no rows ever span sstables).
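A toy illustration of that point; the per-read sstable counts are
hypothetical, purely for demonstration:

```python
def bloom_checks(cf_reads: int, avg_sstables_per_read: float) -> int:
    """Each CF read consults the bloom filter of every candidate sstable,
    so bloom filter checks scale with sstables touched, not CF reads."""
    return round(cf_reads * avg_sstables_per_read)

print(bloom_checks(68844, 1.0))  # 68844: exactly one sstable per read
print(bloom_checks(68844, 3.0))  # 206532: bloom checks outnumber CF reads 3x
```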

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)