Posted to user@cassandra.apache.org by Chris Hart <ch...@remilon.com> on 2014/12/17 19:32:33 UTC
High Bloom Filter FP Ratio
Hi,
I have created the following table with bloom_filter_fp_chance=0.01:
CREATE TABLE logged_event (
time_key bigint,
partition_key_randomizer int,
resource_uuid timeuuid,
event_json text,
event_type text,
field_error_list map<text, text>,
javascript_timestamp timestamp,
javascript_uuid uuid,
page_impression_guid uuid,
page_request_guid uuid,
server_received_timestamp timestamp,
session_id bigint,
PRIMARY KEY ((time_key, partition_key_randomizer), resource_uuid)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
When I run cfstats, I see a much higher false positive ratio than the configured 0.01:
Table: logged_event
SSTable count: 15
Space used (live), bytes: 104128214227
Space used (total), bytes: 104129482871
SSTable Compression Ratio: 0.3295840184239226
Number of keys (estimate): 199293952
Memtable cell count: 56364
Memtable data size, bytes: 20903960
Memtable switch count: 148
Local read count: 1396402
Local read latency: 0.362 ms
Local write count: 2345306
Local write latency: 0.062 ms
Pending tasks: 0
Bloom filter false positives: 147705
Bloom filter false ratio: 0.49020
Bloom filter space used, bytes: 249129040
Compacted partition minimum bytes: 447
Compacted partition maximum bytes: 315852
Compacted partition mean bytes: 1636
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Any idea what could be causing this? This is time-series data. Every time we read from this table, we read a single time_key with 1000 different partition_key_randomizer values (i.e. 1000 partitions). I'm running Cassandra 2.0.11. I tried running upgradesstables to rewrite the SSTables, which didn't change this behavior at all. I'm using size-tiered compaction and I haven't done any major compactions.
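For reference, plugging the cfstats numbers above into the textbook Bloom filter sizing formulas (a back-of-the-envelope sanity check, not Cassandra's exact internals) suggests the filter really is sized for roughly the 1% target, so a 0.49 ratio shouldn't be a sizing problem:

```python
import math

# Numbers taken from the cfstats output above
filter_bytes = 249_129_040
num_keys = 199_293_952
target_p = 0.01

bits_per_key = filter_bytes * 8 / num_keys  # ~10.0 bits per key

# Textbook optimal sizing: m/n = -ln(p) / (ln 2)^2  =>  ~9.6 bits/key for p = 0.01
optimal_bits_per_key = -math.log(target_p) / math.log(2) ** 2

# FP chance implied by ~10 bits/key with an optimal hash count:
# p = e^(-(m/n) * (ln 2)^2)
implied_p = math.exp(-bits_per_key * math.log(2) ** 2)

print(f"bits/key: {bits_per_key:.2f} (optimal for 1%: {optimal_bits_per_key:.2f})")
print(f"implied FP chance: {implied_p:.4f}")  # far below the observed 0.49
```

So the filter itself looks fine; whatever is producing 0.49 is more likely in how the ratio is measured than in the filter.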
Thanks,
Chris
Re: High Bloom Filter FP Ratio
Posted by Chris Hart <ch...@remilon.com>.
Hi Tyler,
I tried what you suggested, and the false positive numbers look much more reasonable there. Thanks for looking into this.
-Chris
----- Original Message -----
From: "Tyler Hobbs" <ty...@datastax.com>
To: user@cassandra.apache.org
Sent: Friday, December 19, 2014 1:25:29 PM
Subject: Re: High Bloom Filter FP Ratio
I took a look at the code where the bloom filter true/false positive
counters are updated and noticed that the true-positive count isn't being
updated on key cache hits:
https://issues.apache.org/jira/browse/CASSANDRA-8525. That may explain
your ratios.
Can you try querying for a few non-existent partition keys in cqlsh with
tracing enabled (just run "TRACING ON") and see if you really do get that
high of a false-positive ratio?
On Fri, Dec 19, 2014 at 9:59 AM, Mark Greene <gr...@gmail.com> wrote:
>
> We're seeing similar behavior except our FP ratio is closer to 1.0 (100%).
>
> We're using Cassandra 2.1.2.
>
>
> Schema
> -----------------------------------------------------------------------
> CREATE TABLE contacts.contact (
> id bigint,
> property_id int,
> created_at bigint,
> updated_at bigint,
> value blob,
> PRIMARY KEY (id, property_id)
> ) WITH CLUSTERING ORDER BY (property_id ASC)
> * AND bloom_filter_fp_chance = 0.001*
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
>
> CF Stats Output:
> -------------------------------------------------------------------------
> Keyspace: contacts
> Read Count: 2458375
> Read Latency: 0.8528440766766665 ms.
> Write Count: 10357
> Write Latency: 0.1816912233272183 ms.
> Pending Flushes: 0
> Table: contact
> SSTable count: 61
> SSTables in each level: [1, 10, 50, 0, 0, 0, 0, 0, 0]
> Space used (live): 9047112471
> Space used (total): 9047112471
> Space used by snapshots (total): 0
> SSTable Compression Ratio: 0.34119240020241487
> Memtable cell count: 24570
> Memtable data size: 1299614
> Memtable switch count: 2
> Local read count: 2458290
> Local read latency: 0.853 ms
> Local write count: 10044
> Local write latency: 0.186 ms
> Pending flushes: 0
> Bloom filter false positives: 11096
> * Bloom filter false ratio: 0.99197*
> Bloom filter space used: 3923784
> Compacted partition minimum bytes: 373
> Compacted partition maximum bytes: 152321
> Compacted partition mean bytes: 9938
> Average live cells per slice (last five minutes): 37.57851240677983
> Maximum live cells per slice (last five minutes): 63.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0
>
> --
> about.me <http://about.me/markgreene>
>
--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: High Bloom Filter FP Ratio
Posted by Tyler Hobbs <ty...@datastax.com>.
I took a look at the code where the bloom filter true/false positive
counters are updated and noticed that the true-positive count isn't being
updated on key cache hits:
https://issues.apache.org/jira/browse/CASSANDRA-8525. That may explain
your ratios.
Can you try querying for a few non-existent partition keys in cqlsh with
tracing enabled (just run "TRACING ON") and see if you really do get that
high of a false-positive ratio?
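To make the effect concrete, here is a toy model of that accounting bug (assuming the reported ratio is false_positives / (true_positives + false_positives); the counts and cache hit rate below are invented for illustration, not taken from this thread):

```python
# Toy model of CASSANDRA-8525: key cache hits skip the true-positive
# increment, so the denominator of the reported ratio shrinks.
fp_chance = 0.01               # the filter's real false-positive rate
checks_key_present = 900_000   # bloom checks where the key is in the sstable
checks_key_absent = 100_000    # bloom checks where the key is absent
key_cache_hit_rate = 0.98      # assumed fraction of reads served via key cache

false_pos = int(checks_key_absent * fp_chance)  # 1000 genuine false positives

# Correct accounting: every present-key check bumps the true-positive counter
honest_ratio = false_pos / (checks_key_present + false_pos)  # ~0.001

# Buggy accounting: key cache hits never record a true positive
counted_true_pos = int(checks_key_present * (1 - key_cache_hit_rate))
buggy_ratio = false_pos / (counted_true_pos + false_pos)     # ~0.05

print(f"honest: {honest_ratio:.4f}  reported with bug: {buggy_ratio:.4f}")
```

With a higher cache hit rate the reported ratio climbs toward 1.0 even though the filter is behaving perfectly.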
--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: High Bloom Filter FP Ratio
Posted by Mark Greene <gr...@gmail.com>.
We're seeing similar behavior except our FP ratio is closer to 1.0 (100%).
We're using Cassandra 2.1.2.
Schema
-----------------------------------------------------------------------
CREATE TABLE contacts.contact (
id bigint,
property_id int,
created_at bigint,
updated_at bigint,
value blob,
PRIMARY KEY (id, property_id)
) WITH CLUSTERING ORDER BY (property_id ASC)
* AND bloom_filter_fp_chance = 0.001*
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy',
'max_threshold': '32'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CF Stats Output:
-------------------------------------------------------------------------
Keyspace: contacts
Read Count: 2458375
Read Latency: 0.8528440766766665 ms.
Write Count: 10357
Write Latency: 0.1816912233272183 ms.
Pending Flushes: 0
Table: contact
SSTable count: 61
SSTables in each level: [1, 10, 50, 0, 0, 0, 0, 0, 0]
Space used (live): 9047112471
Space used (total): 9047112471
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.34119240020241487
Memtable cell count: 24570
Memtable data size: 1299614
Memtable switch count: 2
Local read count: 2458290
Local read latency: 0.853 ms
Local write count: 10044
Local write latency: 0.186 ms
Pending flushes: 0
Bloom filter false positives: 11096
* Bloom filter false ratio: 0.99197*
Bloom filter space used: 3923784
Compacted partition minimum bytes: 373
Compacted partition maximum bytes: 152321
Compacted partition mean bytes: 9938
Average live cells per slice (last five minutes): 37.57851240677983
Maximum live cells per slice (last five minutes): 63.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
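> Side note: assuming the reported ratio is false_positives /
> (true_positives + false_positives) (an assumption about the metric, not
> confirmed here), those numbers imply that only about 90 true positives
> were ever recorded against roughly 2.5 million local reads, which points
> at the counters rather than the filter:

```python
# Numbers taken from the cfstats output above
false_positives = 11096
reported_ratio = 0.99197   # assumed = fp / (tp + fp)

# Solve for the implied true-positive count: tp = fp * (1 - ratio) / ratio
implied_true_pos = false_positives * (1 - reported_ratio) / reported_ratio
print(round(implied_true_pos))  # ~90, versus 2,458,290 local reads
```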
--
about.me <http://about.me/markgreene>