Posted to commits@cassandra.apache.org by "Piush (Jira)" <ji...@apache.org> on 2019/10/26 21:27:00 UTC
[jira] [Created] (CASSANDRA-15378) Data not cleaned up from disk for SSTables after compaction
Piush created CASSANDRA-15378:
---------------------------------
Summary: Data not cleaned up from disk for SSTables after compaction
Key: CASSANDRA-15378
URL: https://issues.apache.org/jira/browse/CASSANDRA-15378
Project: Cassandra
Issue Type: Improvement
Reporter: Piush
Hello Team,
We have an application that writes data to a column family (cf) and frequently deletes it by partition key. We have set gc_grace_seconds to a low value (2 minutes) so that tombstones on the cf are evicted quickly.
We are noticing that even though the number of records in the cf is 0, data is left behind on disk in the Cassandra data directory for that cf.
Size on the filesystem for the cfs subscriber_event_by_id_shadow and subscriber_event_shadow:
```
112M subscriber_event_by_id_shadow-4f08b880f59311e98530a93a5d955b83
129M subscriber_event_shadow-4e7b1e80f59311e98530a93a5d955b83
```
We see 0 records in these tables:
```
cqlsh:apim> select count(*) from subscriber_event_shadow;
 count
-------
     0
(1 rows)
Warnings :
Aggregation query used without partition key
cqlsh:apim> select count(*) from subscriber_event_by_id_shadow;
 count
-------
     0
(1 rows)
```
Schema for subscriber_event_by_id_shadow:
```
CREATE TABLE apim.subscriber_event_by_id_shadow (
transaction_id uuid,
shadow_version text,
id uuid,
namespace text,
generated_at timeuuid,
api_version text,
created_at timestamp,
event text,
event_type text,
filter text,
metadata map<text, text>,
name text,
occ_keys list<text>,
operation text,
payload blob,
retries int,
scope text,
shadow boolean,
shadow_id timeuuid,
shadow_metadata map<text, text>,
state text,
summary text,
title text,
type text,
updated_at timestamp,
url text,
PRIMARY KEY (transaction_id, shadow_version, id, namespace, generated_at)
) WITH CLUSTERING ORDER BY (shadow_version ASC, id ASC, namespace ASC, generated_at ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4', 'tombstone_threshold': '0.1', 'unchecked_tombstone_compaction': 'true'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 120
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
```
gc_grace_seconds is set to 120 (2 minutes), so my understanding is that the tombstones should have been evicted and the disk space reclaimed.
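As a rough sketch of the timing rule behind this expectation: a tombstone only becomes eligible for removal once its deletion time plus gc_grace_seconds has passed, and even then it is dropped only when a compaction actually rewrites the SSTable that holds it. The function name and sample timestamps below are hypothetical; only the 120-second value comes from the schema.

```shell
#!/usr/bin/env bash
# Hypothetical helper: has a tombstone passed its grace period?
# Args: deletion time (epoch seconds), gc_grace_seconds, current time (epoch seconds).
tombstone_purgeable() {
  local deleted_at=$1 gc_grace=$2 now=$3
  if (( now > deleted_at + gc_grace )); then
    echo yes
  else
    echo no
  fi
}

gc_grace_seconds=120      # value from the table schema
deleted_at=1572124800     # illustrative timestamp, not taken from the report

# Check eligibility 60 s and 121 s after the delete.
early=$(tombstone_purgeable "$deleted_at" "$gc_grace_seconds" $((deleted_at + 60)))
late=$(tombstone_purgeable "$deleted_at" "$gc_grace_seconds" $((deleted_at + 121)))
echo "after 60s: $early, after 121s: $late"
```

Passing the grace period makes a tombstone eligible for purging; it does not by itself trigger a compaction, so space can stay allocated until the SSTable is next compacted.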
However, the data directory for subscriber_event_shadow has the following contents on the filesystem:
bash-4.2$ cd subscriber_event_shadow-4e7b1e80f59311e98530a93a5d955b83
bash-4.2$ du -sh *
4.0K backups
4.0K mc-102-big-CompressionInfo.db
22M mc-102-big-Data.db
4.0K mc-102-big-Digest.crc32
4.0K mc-102-big-Filter.db
8.0K mc-102-big-Index.db
8.0K mc-102-big-Statistics.db
4.0K mc-102-big-Summary.db
4.0K mc-102-big-TOC.txt
4.0K mc-103-big-CompressionInfo.db
4.5M mc-103-big-Data.db
4.0K mc-103-big-Digest.crc32
4.0K mc-103-big-Filter.db
4.0K mc-103-big-Index.db
8.0K mc-103-big-Statistics.db
4.0K mc-103-big-Summary.db
4.0K mc-103-big-TOC.txt
4.0K mc-104-big-CompressionInfo.db
4.0K mc-104-big-Data.db
4.0K mc-104-big-Digest.crc32
4.0K mc-104-big-Filter.db
4.0K mc-104-big-Index.db
8.0K mc-104-big-Statistics.db
4.0K mc-104-big-Summary.db
4.0K mc-104-big-TOC.txt
8.0K mc-95-big-CompressionInfo.db
52M mc-95-big-Data.db
4.0K mc-95-big-Digest.crc32
4.0K mc-95-big-Filter.db
8.0K mc-95-big-Index.db
8.0K mc-95-big-Statistics.db
4.0K mc-95-big-Summary.db
4.0K mc-95-big-TOC.txt
8.0K mc-96-big-CompressionInfo.db
51M mc-96-big-Data.db
4.0K mc-96-big-Digest.crc32
4.0K mc-96-big-Filter.db
12K mc-96-big-Index.db
8.0K mc-96-big-Statistics.db
4.0K mc-96-big-Summary.db
4.0K mc-96-big-TOC.txt
4.0K snapshots
bash-4.2$
We are not able to figure out why Data.db files of around 50 MB each remain on disk.
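To make the on-disk numbers concrete, the Data.db sizes from the listing above can be totaled with a short shell sketch. The sizes are hard-coded from the `du -sh` output in this report, the function name sum_data_mb is hypothetical, and the parser only handles the K, M, and G suffixes.

```shell
#!/usr/bin/env bash
# Sum the sizes of *-Data.db entries from `du -sh`-style input, in megabytes.
sum_data_mb() {
  LC_ALL=C awk '
    /-Data\.db$/ {
      size  = $1
      unit  = substr(size, length(size), 1)           # trailing K, M or G
      value = substr(size, 1, length(size) - 1) + 0   # numeric part
      if (unit == "K") value /= 1024
      else if (unit == "G") value *= 1024
      total += value
    }
    END { printf "%.1f\n", total }
  '
}

# Data.db lines copied from the directory listing in this report.
listing='22M mc-102-big-Data.db
4.5M mc-103-big-Data.db
4.0K mc-104-big-Data.db
52M mc-95-big-Data.db
51M mc-96-big-Data.db'

total_mb=$(printf '%s\n' "$listing" | sum_data_mb)
echo "Data.db total: ${total_mb}M"
```

The total comes to roughly 129.5M, which is in line with the 129M reported for the subscriber_event_shadow directory, so nearly all of the space is held by the five Data.db files. On a live node the same function could be fed directly, for example `du -sh * | sum_data_mb` from inside the table directory.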