Posted to commits@cassandra.apache.org by "Michael Shuler (JIRA)" <ji...@apache.org> on 2015/09/10 19:57:45 UTC

[jira] [Resolved] (CASSANDRA-10294) Old SSTables lying around

     [ https://issues.apache.org/jira/browse/CASSANDRA-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Shuler resolved CASSANDRA-10294.
----------------------------------------
    Resolution: Not A Problem

Frequent updates mean you are creating tombstones of the old data. gc_grace_seconds = 864000 means that old data will remain on disk for at least 10 days before the deleted data is cleaned up. The default is deliberately long, so that down nodes or data centers have plenty of time to be brought back online and receive those deletes when repaired. If the 10 day default is longer than you would like, you can adjust it for your use case, as in the sketch below.
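
A minimal sketch, using the table from the report below (the 3 day value is purely illustrative, not a recommendation; keep gc_grace_seconds comfortably longer than your repair interval, or nodes that missed a delete can resurrect the data):

ALTER TABLE reporting.tender_summaries
    WITH gc_grace_seconds = 259200;  -- 3 days; only safe if repairs complete more often than this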

Marking this as "not a problem", since this is expected behavior. (The mailing list and IRC would be good places to discuss cluster behavior and determine whether you are seeing a bug that needs a JIRA ticket :) )
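
If you want compaction to get to tombstone-heavy SSTables sooner, LCS also accepts the standard tombstone subproperties. A hypothetical tuning sketch (the values are illustrative only, and actual purging still has to wait out gc_grace_seconds):

ALTER TABLE reporting.tender_summaries
    WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': '160',
        'tombstone_threshold': '0.2',            -- ratio of droppable tombstones at which a single-SSTable compaction is considered (0.2 is the default)
        'tombstone_compaction_interval': '3600'  -- minimum SSTable age in seconds before it is checked (default is 86400, i.e. one day)
    };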

> Old SSTables lying around
> -------------------------
>
>                 Key: CASSANDRA-10294
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10294
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Stand-alone cluster deployed on AWS EC2 instances (Linux)
>            Reporter: Vidur Malik
>         Attachments: Screen Shot 2015-09-09 at 9.32.53 AM.png
>
>
> We're running an 8-node Cassandra 2.2.0 cluster. We do frequent updates to our data and have very few reads, and we are using Leveled Compaction with an sstable_size_in_mb of 160. We don't have much data yet, since we're just testing the cluster.
> We are seeing the SSTable count increase linearly (see attached graph; each line is a node in the cluster) even though `nodetool compactionhistory` shows that compactions have definitely run. When I run `nodetool cfstats`, I get the following output:
> Table: tender_summaries
> SSTable count: 56
> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
> Does it make sense that there is such a huge difference between the number of SSTables in each level and the total SSTable count? It seems like old SSTables are lying around and are never cleaned up or compacted.
> Schema is relatively simple:
> CREATE TABLE IF NOT EXISTS reporting.tender_summaries (
>     organization_id uuid,
>     date timestamp,
>     year int,
>     location_id varchar,
>     operation_type varchar,
>     reference_id varchar,
>     field1 int,
>     field2 int,
>     field3 int,
>     field4 int,
>     PRIMARY KEY ((organization_id, year), location_id, date, operation_type, reference_id)
> ) WITH CLUSTERING ORDER BY (location_id DESC, date DESC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'sstable_size_in_mb': '160', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)