Posted to user@cassandra.apache.org by Riccardo Ferrari <fe...@gmail.com> on 2016/07/07 13:49:48 UTC

DTCS SSTable count issue

Hi everyone,

This is my first question, so I apologize in advance if I get something wrong.

I have a small Cassandra cluster built on 3 nodes. It was originally
deployed as a 2.0.x cluster and was upgraded to 2.0.15, then 2.1.13, then
3.0.4, and recently to 3.0.6. Ubuntu is the OS.

There are a few tables that use DateTieredCompactionStrategy and are
suffering from a constantly growing SSTable count. I have the feeling this
has something to do with the upgrades, but I need some hints on how to
debug this issue.

Tables are created like:
CREATE TABLE <table> (
 ...
PRIMARY KEY (...)
) WITH CLUSTERING ORDER BY (...)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class':
'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 7776000
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

and this is the "nodetool cfstats" output for that table:
Read Count: 39
Read Latency: 85.03307692307692 ms.
Write Count: 9845275
Write Latency: 0.09604882382665797 ms.
Pending Flushes: 0
Table: <table>
SSTable count: 48
Space used (live): 19566109394
Space used (total): 19566109394
Space used by snapshots (total): 109796505570
Off heap memory used (total): 11317941
SSTable Compression Ratio: 0.22632301701483284
Number of keys (estimate): 2557
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 828
Local read count: 39
Local read latency: 93.051 ms
Local write count: 9845275
Local write latency: 0.106 ms
Pending flushes: 0
Bloom filter false positives: 2
Bloom filter false ratio: 0.00000
Bloom filter space used: 10200
Bloom filter off heap memory used: 9816
Index summary off heap memory used: 4677
Compression metadata off heap memory used: 11303448
Compacted partition minimum bytes: 150
Compacted partition maximum bytes: 4139110981
Compacted partition mean bytes: 13463937
Average live cells per slice (last five minutes): 59.69230769230769
Maximum live cells per slice (last five minutes): 149
Average tombstones per slice (last five minutes): 8.564102564102564
Maximum tombstones per slice (last five minutes): 42

According to "nodetool compactionhistory <keyspace>.<table>",
the oldest timestamp is "Thu, 30 Jun 2016 13:14:23 GMT"
and the most recent one is "Thu, 07 Jul 2016 12:15:50 GMT" (THAT IS TODAY).
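As far as I know the history is printed for every table, so to pull out
just this one I filter it with something like the following (a rough
sketch; <keyspace> and <table> are placeholders):

# one row per past compaction of this table
nodetool compactionhistory | grep "<keyspace>" | grep "<table>"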

However, the SSTable count is still very high compared to tables that use
a different compaction strategy. If I run "nodetool compact <table>", the
SSTable count decreases dramatically to a reasonable number.
I have read many articles, including
http://www.datastax.com/dev/blog/datetieredcompactionstrategy, but I can
not really tell whether this is expected behavior.
What concerns me is that I see a high tombstone read count even though
these are insert-only tables. Compacting the table makes the tombstone
issue disappear. Yes, we are using a TTL to expire data after 3 months,
and I have not touched the GC grace period.
Looking at the file system, I see that the very first *-Data.db file is
15GB, while the other 43 *-Data.db files range from 50 to 150MB in size.
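One way to see what DTCS is (or is not) compacting would be to dump the
metadata of each SSTable, for example with something like this (a rough
sketch; it assumes the default data directory layout, that sstablemetadata
from the Cassandra tools is on the PATH, and that the exact output labels
may vary slightly between versions):

# <keyspace>/<table> are placeholders; adjust the data path if not default
for f in /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db; do
  echo "== $f"
  # time range covered by the SSTable plus its droppable tombstone estimate
  sstablemetadata "$f" | grep -E 'timestamp|droppable'
done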

How can I debug this compaction misbehavior? Any help is much appreciated.
Best,

Re: DTCS SSTable count issue

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
>
> The tombstone compaction options basically do this for you for the right
> settings (unchecked tombstone compaction = true, set threshold to 85% or
> so, don’t try to get clever and set it to something very close to 99%, the
> estimated tombstone ratio isn’t that accurate)
>
>
>

True, I was a bit inaccurate here. I used this because tombstone
compactions were not happening (probably because there were always some
automatic compactions going on and others pending). I have heard that
tombstone compactions have a lower priority; I never actually checked, but
UDC worked in my case.

Still, Jeff and I agree on this point, so try it first:

Did you give a try to the unchecked_tombstone_compaction as well
> (compaction options at the table level)? Feel free to set this one to true.
> I think it could be the default. It is safe as long as your machines have
> some more resources available (not that much). That's the first thing I
> would do.


If you don't have compactions pending, this should be fine. Sorry for the
confusion caused by the extra info without the corresponding details.

Worth mentioning that you can NOT disable blocking read repair which comes
> naturally if you use CL > ONE.


+1

About read repair, am I correct in thinking that read repair is
> controlled by both options: 'read_repair_chance' and 'dclocal_read_repair_chance'?
> If that is the case, I see that I still have read repair turned on...


So yes, both (one is local to the DC, the other one is cross DC) but also
consider your CL, as mentioned by Jeff.

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


Re: DTCS SSTable count issue

Posted by "Jason J. W. Williams" <ja...@gmail.com>.
I can vouch for TWCS...we switched from DTCS to TWCS using Jeff's plugin w/
Cassandra 3.0.5 and just upgraded to 3.0.8 today and switched over to the
built-in version of TWCS.

-J


Re: DTCS SSTable count issue

Posted by Jeff Jirsa <je...@crowdstrike.com>.
DTCS is deprecated in favor of TWCS in new versions, yes. 

 

Worth mentioning that you can NOT disable blocking read repair which comes naturally if you use CL > ONE. 

 

>  Also instead of major compactions (which comes with its set of issues / tradeoffs too) you can think of a script smartly using sstablemetadata to find the sstables holding too much tombstones and running single SSTable compactions on them through JMX and user defined compactions. Meanwhile if you want to do it manually, you could do it with something like this to know the tombstone ratio from the biggest sstable:

 

The tombstone compaction options basically do this for you with the right settings (unchecked_tombstone_compaction = true, set the threshold to 85% or so; don’t try to get clever and set it to something very close to 99%, the estimated tombstone ratio isn’t that accurate)
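For the table from the original post that could look roughly like the following (a sketch only; <host>, <keyspace> and <table> are placeholders, and since ALTER ... WITH compaction replaces the whole compaction map, the existing class and thresholds are restated):

# <host>, <keyspace>, <table> are placeholders; thresholds follow the suggestion above
cqlsh <host> -e "ALTER TABLE <keyspace>.<table> WITH compaction = {
  'class': 'DateTieredCompactionStrategy',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.85',
  'max_threshold': '32', 'min_threshold': '4'};"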

 

-          Jeff

 

 


Re: DTCS SSTable count issue

Posted by Riccardo Ferrari <fe...@gmail.com>.
@Alain, @Jeff

Thank you very much for your time. I really appreciate it!

Yes, I found many posts/hints about TWCS; it definitely looks very
promising. Do I understand correctly that I can swap the compaction
strategy without any major concern?

About read repair, am I correct in thinking that read repair is
controlled by both options: 'read_repair_chance' and
'dclocal_read_repair_chance'?
If that is the case, I see that I still have read repair turned on...

Best!


Re: DTCS SSTable count issue

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
@Jeff

Rather than being an alternative, isn't your compaction strategy going to
deprecate (and finally replace) DTCS? That was my understanding from the
ticket CASSANDRA-9666.

@Riccardo

If you are interested in TWCS from Jeff, I believe it was actually
introduced in 3.0.8, not 3.0.7:
https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt#L28.
Anyway, you can use it in any recent version, as compaction strategies are
pluggable.
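A rough sketch of the switch (the class name below is the built-in one
from 3.0.8 onwards; with Jeff's plugin jar on an older version you would
use the plugin's fully-qualified class name instead, and the window
settings are only example values):

# <host>, <keyspace>, <table> are placeholders; window size/unit are examples
cqlsh <host> -e "ALTER TABLE <keyspace>.<table> WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1'};"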

What concerns me is that I see a high tombstone read count even though
> these are insert-only tables. Compacting the table makes the tombstone
> issue disappear. Yes, we are using a TTL to expire data after 3 months,
> and I have not touched the GC grace period.
>

I observed the same issue recently and I am confident that TWCS will solve
this tombstone issue, but I have not tested it on my side so far.
Meanwhile, be sure you have disabled any "read repair" on tables using DTCS
and maybe hints as well. It is a hard decision to take, as you'll lose 2
out of 3 anti-entropy systems, but DTCS behaves badly with those options
turned on (TWCS is fine with them). The remaining anti-entropy mechanism is
a full repair, which you might already not be running as you only do
inserts...
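A sketch of what that could look like (placeholders as in the original
post; note that nodetool disablehandoff only lasts until the node is
restarted, hinted_handoff_enabled in cassandra.yaml is the permanent
switch):

# turn off both read repair chances on the DTCS table
cqlsh <host> -e "ALTER TABLE <keyspace>.<table>
  WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.0;"
# disable hinted handoff on each node (until restart)
nodetool disablehandoff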

Also, instead of major compactions (which come with their own set of
issues / tradeoffs) you can think of a script smartly using sstablemetadata
to find the sstables holding too many tombstones and running single-SSTable
compactions on them through JMX and user-defined compactions. Meanwhile, if
you want to do it manually, you could do it with something like this to get
the tombstone ratio of the biggest sstables:

du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $1}' && du -sh
/path_to_a_table/* | sort -h | tail -20 | awk '{print $2}' | xargs
sstablemetadata | grep tombstones

And something like this to run a user-defined compaction on the ones you
chose (big sstables with a high tombstone ratio):

echo "run -b org.apache.cassandra.db:type=CompactionManager
forceUserDefinedCompaction <Data_db_file_name_without_path>" | java -jar
jmxterm-version.jar -l <ip>:<jmx_port>

Note: you have to download jmxterm (or use any other JMX tool).


Did you give a try to the unchecked_tombstone_compaction as well
(compaction options at the table level)? Feel free to set this one to true.
I think it could be the default. It is safe as long as your machines have
some more resources available (not that much). That's the first thing I
would do.


Also, if you use TTL only, feel free to reduce gc_grace_seconds; this will
probably help get tombstones removed. I would start with the other
solutions first. Keep in mind that if someday you perform deletes, this
setting could produce some zombies (data coming back) if you don't run
repair within gc_grace_seconds across the entire ring.
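If you do go that way, a sketch (the value is only an example; pick
whatever margin you are comfortable with):

# example value only; <host>, <keyspace>, <table> are placeholders
cqlsh <host> -e "ALTER TABLE <keyspace>.<table> WITH gc_grace_seconds = 3600;"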

C*heers,

-----------------------

Alain Rodriguez - alain@thelastpickle.com

France


The Last Pickle - Apache Cassandra Consulting

http://www.thelastpickle.com


Re: DTCS SSTable count issue

Posted by Jeff Jirsa <je...@crowdstrike.com>.
48 sstables isn’t unreasonable in a DTCS table. It will continue to grow over time, but ideally data will expire as it nears your 90 day TTL and those tables should start dropping away as they age.

 

3.0.7 introduces an alternative to DTCS you may find easier to use called TWCS. It will almost certainly help address the growing sstable count.  

 

 

 
