Posted to user@cassandra.apache.org by Sotirios Delimanolis <so...@yahoo.com.INVALID> on 2017/07/31 19:11:43 UTC

Cassandra isn't compacting old files

On Cassandra 2.2.11, I have a table that uses LeveledCompactionStrategy and that gets written to continuously. If I list the files in its data directory, I see something like this:

-rw-r--r-- 1 acassy agroup 161733811 Jul 31 18:46 lb-135346-big-Data.db
-rw-r--r-- 1 acassy agroup 159626222 Jul 31 02:53 lb-135297-big-Data.db
-rw-r--r-- 1 acassy agroup 209892692 Jul 31 02:53 lb-135296-big-Data.db
-rw-r--r-- 1 acassy agroup 209758372 Jul 31 02:53 lb-135295-big-Data.db
-rw-r--r-- 1 acassy agroup 210109976 Jul 31 02:52 lb-135294-big-Data.db
-rw-r--r-- 1 acassy agroup 209871524 Jul 31 02:52 lb-135293-big-Data.db
-rw-r--r-- 1 acassy agroup 209889455 Jul 31 02:51 lb-135292-big-Data.db
-rw-r--r-- 1 acassy agroup 209771979 Jul 31 02:51 lb-135291-big-Data.db
-rw-r--r-- 1 acassy agroup  96253626 Jul 31 02:51 lb-135290-big-Data.db
-rw-r--r-- 1 acassy agroup 209880076 Jul 31 02:51 lb-135289-big-Data.db
-rw-r--r-- 1 acassy agroup 210007576 Jul 31 02:50 lb-135288-big-Data.db
-rw-r--r-- 1 acassy agroup 209752707 Jul 31 02:50 lb-135287-big-Data.db
-rw-r--r-- 1 acassy agroup 209832244 Jul 31 02:50 lb-135286-big-Data.db
-rw-r--r-- 1 acassy agroup 209786246 Jul 31 02:50 lb-135285-big-Data.db
-rw-r--r-- 1 acassy agroup 209786623 Jul 31 02:50 lb-135284-big-Data.db
-rw-r--r-- 1 acassy agroup 209773999 Jul 31 02:49 lb-135282-big-Data.db
-rw-r--r-- 1 acassy agroup 209769307 Jul 31 02:49 lb-135281-big-Data.db
-rw-r--r-- 1 acassy agroup 209774780 Jul 31 02:49 lb-135280-big-Data.db
-rw-r--r-- 1 acassy agroup 157218909 Jul 30 02:56 lb-135211-big-Data.db
-rw-r--r-- 1 acassy agroup 210016992 Jul 30 02:56 lb-135210-big-Data.db
-rw-r--r-- 1 acassy agroup 209723709 Jul 30 02:56 lb-135209-big-Data.db
-rw-r--r-- 1 acassy agroup 209799709 Jul 30 02:56 lb-135208-big-Data.db
-rw-r--r-- 1 acassy agroup  35399539 Jul 29 04:05 lb-135128-big-Data.db
-rw-r--r-- 1 acassy agroup 209737007 Jul 29 04:05 lb-135127-big-Data.db
-rw-r--r-- 1 acassy agroup 209741955 Jul 29 04:05 lb-135126-big-Data.db
-rw-r--r-- 1 acassy agroup 209837894 Jul 29 04:04 lb-135125-big-Data.db
-rw-r--r-- 1 acassy agroup 209775097 Jul 29 04:04 lb-135124-big-Data.db
-rw-r--r-- 1 acassy agroup   2762187 Jul 28 09:48 lb-135050-big-Data.db
-rw-r--r-- 1 acassy agroup 209736820 Jul 28 09:48 lb-135049-big-Data.db
-rw-r--r-- 1 acassy agroup 209740897 Jul 28 09:47 lb-135048-big-Data.db
-rw-r--r-- 1 acassy agroup 209765920 Jul 28 09:47 lb-135047-big-Data.db
-rw-r--r-- 1 acassy agroup 210085882 Jul 28 09:46 lb-135046-big-Data.db
-rw-r--r-- 1 acassy agroup 209901085 Jul 28 09:46 lb-135045-big-Data.db
-rw-r--r-- 1 acassy agroup 209861149 Jul 28 09:46 lb-135044-big-Data.db
-rw-r--r-- 1 acassy agroup 209723151 Jul 28 09:45 lb-135043-big-Data.db
-rw-r--r-- 1 acassy agroup 209733066 Jul 28 09:45 lb-135042-big-Data.db
-rw-r--r-- 1 acassy agroup  78406141 Jul 27 13:29 lb-134962-big-Data.db
-rw-r--r-- 1 acassy agroup 209781961 Jul 27 13:29 lb-134961-big-Data.db
-rw-r--r-- 1 acassy agroup 186161072 Jul 26 16:27 lb-134881-big-Data.db
-rw-r--r-- 1 acassy agroup 209857207 Jul 26 16:27 lb-134880-big-Data.db
-rw-r--r-- 1 acassy agroup  42059209 Jul 25 19:32 lb-134800-big-Data.db
-rw-r--r-- 1 acassy agroup 210004006 Jul 25 19:32 lb-134799-big-Data.db
-rw-r--r-- 1 acassy agroup  87893551 Jul 24 23:08 lb-134721-big-Data.db
-rw-r--r-- 1 acassy agroup 209827743 Jul 24 23:08 lb-134720-big-Data.db
-rw-r--r-- 1 acassy agroup 209734295 Jul 24 23:08 lb-134719-big-Data.db
-rw-r--r-- 1 acassy agroup 209883247 Jul 24 23:07 lb-134718-big-Data.db
-rw-r--r-- 1 acassy agroup 209738278 Jul 24 23:07 lb-134717-big-Data.db
-rw-r--r-- 1 acassy agroup 158983134 Jul 21 10:25 lb-134404-big-Data.db
-rw-r--r-- 1 acassy agroup 209740532 Jul 21 10:25 lb-134403-big-Data.db
-rw-r--r-- 1 acassy agroup 209725876 Jul 21 10:25 lb-134402-big-Data.db
-rw-r--r-- 1 acassy agroup  72250507 Jul 18 02:10 lb-134062-big-Data.db
-rw-r--r-- 1 acassy agroup 209827278 Jul 17 09:13 lb-133986-big-Data.db
-rw-r--r-- 1 acassy agroup  88312194 Jul 13 20:30 lb-133665-big-Data.db
-rw-r--r-- 1 acassy agroup  89114556 Jul 13 01:12 lb-133589-big-Data.db
-rw-r--r-- 1 acassy agroup 175385370 Jul 12 08:44 lb-133518-big-Data.db
-rw-r--r-- 1 acassy agroup 209742099 Jul 11 14:02 lb-133441-big-Data.db
-rw-r--r-- 1 acassy agroup 209733254 Jul 10 18:06 lb-133361-big-Data.db
-rw-r--r-- 1 acassy agroup 209724386 Jul 10 18:06 lb-133360-big-Data.db
-rw-r--r-- 1 acassy agroup  57820535 Nov 28  2016 lb-99553-big-Data.db
-rw-r--r-- 1 acassy agroup 209734839 Nov 28  2016 lb-99552-big-Data.db
-rw-r--r-- 1 acassy agroup 209857899 Nov 28  2016 lb-99551-big-Data.db
-rw-r--r-- 1 acassy agroup 209751188 Nov 28  2016 lb-99550-big-Data.db
-rw-r--r-- 1 acassy agroup 209731744 Nov 28  2016 lb-99549-big-Data.db
-rw-r--r-- 1 acassy agroup 209736307 Nov 28  2016 lb-99548-big-Data.db
-rw-r--r-- 1 acassy agroup 209799584 Nov 28  2016 lb-99547-big-Data.db
-rw-r--r-- 1 acassy agroup 210050527 Nov 28  2016 lb-99546-big-Data.db
-rw-r--r-- 1 acassy agroup 209731049 Nov 28  2016 lb-99545-big-Data.db
-rw-r--r-- 1 acassy agroup 209792768 Nov 28  2016 lb-99544-big-Data.db
-rw-r--r-- 1 acassy agroup 209739117 Nov 28  2016 lb-99543-big-Data.db
-rw-r--r-- 1 acassy agroup 209753948 Nov 28  2016 lb-99542-big-Data.db
-rw-r--r-- 1 acassy agroup 209772674 Nov 28  2016 lb-99541-big-Data.db
-rw-r--r-- 1 acassy agroup 209793439 Nov 28  2016 lb-99540-big-Data.db
-rw-r--r-- 1 acassy agroup 209719742 Nov 28  2016 lb-99539-big-Data.db
-rw-r--r-- 1 acassy agroup 209784762 Nov 28  2016 lb-99538-big-Data.db
-rw-r--r-- 1 acassy agroup 209744155 Nov 28  2016 lb-99536-big-Data.db
-rw-r--r-- 1 acassy agroup 209878115 Nov 28  2016 lb-99535-big-Data.db
-rw-r--r-- 1 acassy agroup 209749096 Nov 28  2016 lb-99534-big-Data.db
-rw-r--r-- 1 acassy agroup 209859702 Nov 28  2016 lb-99533-big-Data.db
-rw-r--r-- 1 acassy agroup 209834936 Nov 28  2016 lb-99532-big-Data.db
-rw-r--r-- 1 acassy agroup 209768726 Nov 28  2016 lb-99531-big-Data.db
-rw-r--r-- 1 acassy agroup 209754728 Nov 28  2016 lb-99530-big-Data.db
-rw-r--r-- 1 acassy agroup 209718788 Nov 28  2016 lb-99529-big-Data.db
-rw-r--r-- 1 acassy agroup 209769816 Nov 28  2016 lb-99528-big-Data.db
-rw-r--r-- 1 acassy agroup 170810315 Nov 25  2016 lb-98259-big-Data.db
-rw-r--r-- 1 acassy agroup 209749227 Nov 25  2016 lb-98258-big-Data.db
-rw-r--r-- 1 acassy agroup 209735521 Nov 23  2016 lb-97557-big-Data.db
-rw-r--r-- 1 acassy agroup 209748060 Nov 23  2016 lb-97550-big-Data.db
-rw-r--r-- 1 acassy agroup 209724471 Nov 23  2016 lb-97445-big-Data.db
-rw-r--r-- 1 acassy agroup 209869523 Nov 18  2016 lb-95132-big-Data.db
-rw-r--r-- 1 acassy agroup 209809636 Nov 17  2016 lb-94927-big-Data.db
-rw-r--r-- 1 acassy agroup 209758214 Nov 17  2016 lb-94873-big-Data.db
-rw-r--r-- 1 acassy agroup 209804833 Nov 16  2016 lb-94452-big-Data.db
-rw-r--r-- 1 acassy agroup 110494743 Oct  1  2016 lb-75515-big-Data.db
-rw-r--r-- 1 acassy agroup 209777568 Oct  1  2016 lb-75514-big-Data.db
-rw-r--r-- 1 acassy agroup 209690277 Aug 16  2016 lb-50882-big-Data.db
-rw-r--r-- 1 acassy agroup 209743321 Aug 16  2016 lb-50879-big-Data.db
-rw-r--r-- 1 acassy agroup 209708311 Aug 16  2016 lb-50877-big-Data.db
-rw-r--r-- 1 acassy agroup 200324269 Aug 16  2016 lb-50614-big-Data.db
-rw-r--r-- 1 acassy agroup 201124079 Aug 16  2016 lb-50613-big-Data.db
-rw-r--r-- 1 acassy agroup 201165532 Aug 16  2016 lb-50612-big-Data.db
-rw-r--r-- 1 acassy agroup 201079038 Aug 16  2016 lb-50611-big-Data.db
-rw-r--r-- 1 acassy agroup 201189531 Aug 16  2016 lb-50610-big-Data.db
-rw-r--r-- 1 acassy agroup 201091465 Aug 16  2016 lb-50609-big-Data.db
-rw-r--r-- 1 acassy agroup 201147689 Aug 16  2016 lb-50607-big-Data.db
-rw-r--r-- 1 acassy agroup 201072987 Aug 16  2016 lb-50606-big-Data.db
-rw-r--r-- 1 acassy agroup 201234706 Aug 16  2016 lb-50604-big-Data.db
-rw-r--r-- 1 acassy agroup 201118109 Aug 16  2016 lb-50603-big-Data.db
Notice all those files from 2016; they never get compacted away. If I run nodetool cfstats, I see there are currently four SSTable levels.
Table: MyTable
SSTable count: 70
SSTables in each level: [1, 10, 47, 50, 0, 0, 0, 0, 0] (200MB each)
sstablemetadata tells me those August files are in the 3rd level. I can also see via lsof that Cassandra has an open handle to all of these files. Why isn't Cassandra including these files in its compactions? Has the strategy simply not reached its threshold for the next level? That seems very unlikely after almost a year.
This particular table has a TTL on all its rows, but I've seen this behaviour in other tables that don't.
If I issue a compaction with nodetool compact, they disappear. What's going on?
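For anyone following along, the checks described above amount to roughly the following. The keyspace/table name and data path are illustrative placeholders, lb-50603 is just one of the old generations listed above, and the sstablemetadata field names ('SSTable Level', 'Estimated droppable tombstones', 'Repaired at') match what's quoted elsewhere in this thread.

# level layout as reported by cfstats
nodetool cfstats my_keyspace.MyTable | grep -E 'SSTable count|SSTables in each level'

# per-SSTable level, droppable-tombstone estimate, and repair status
sstablemetadata /var/lib/cassandra/data/my_keyspace/MyTable-*/lb-50603-big-Data.db \
    | grep -E 'SSTable Level|Estimated droppable tombstones|Repaired at'

# confirm the running process still holds the old file open (<cassandra_pid> = Cassandra's process id)
lsof -p <cassandra_pid> | grep 'lb-50603-big'

# the manual major compaction mentioned above
nodetool compact my_keyspace MyTable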


Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
These guesses will have to do. I thought something was wrong with such old SSTables.
Thanks for your help investigating!

On Wednesday, August 23, 2017, 3:09:34 AM PDT, kurt greaves <ku...@instaclustr.com> wrote:

Ignore me; I was getting the major compaction for LCS mixed up with STCS. Estimated droppable tombstones tends to be fairly accurate. If your SSTables in level 2 have that many tombstones, I'd say that's definitely the reason L3 isn't being compacted.
As for how you got here in the first place, it's hard to say. Maybe a burst of writes over some period a long time ago? Or possibly repairs streaming a lot of data? Hard to tell what happened in the past without lots of metrics.

Re: Cassandra isn't compacting old files

Posted by kurt greaves <ku...@instaclustr.com>.
Ignore me; I was getting the major compaction for LCS mixed up with STCS.
Estimated droppable tombstones tends to be fairly accurate. If your
SSTables in level 2 have that many tombstones, I'd say that's definitely the
reason L3 isn't being compacted.

As for how you got here in the first place, it's hard to say. Maybe a burst of
writes over some period a long time ago? Or possibly repairs streaming a lot
of data? Hard to tell what happened in the past without lots of metrics.

Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
I issued another major compaction just now and a brand new SSTable in Level 2 has an Estimated droppable tombstone value of 0.64. I don't know how accurate that is.

On Tuesday, August 22, 2017, 9:33:34 PM PDT, Sotirios Delimanolis <so...@yahoo.com.INVALID> wrote:

What do you mean by "a single SSTable"? SSTable size is set to 200MB and there are ~100 SSTables in Level 3 in that previous example.
The table in that previous example doesn't have a TTL, but we do delete rows. I've since compacted the table, so I can't provide the previous "Estimated droppable tombstones", but it was > 0.3. I've set the threshold to 0.25, but unchecked_tombstone_compaction is false. Perhaps setting it to true would eventually compact individual SSTables.
I agree that I am probably not creating enough data at the moment, but what got me into this situation in the first place? All SSTables in each level (give or take a couple) were last modified on the same date.
On Tuesday, August 22, 2017, 5:16:27 PM PDT, kurt greaves <ku...@instaclustr.com> wrote:

LCS major compaction on 2.2 should compact each level to have a single SSTable. It seems more likely to me that you are simply not generating enough data to require compactions in L3 and most data is TTL'ing before it gets there. Out of curiosity, what does sstablemetadata report for Estimated droppable tombstones on one of those tables, and what is your TTL?

Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
What do you mean by "a single SSTable"? SSTable size is set to 200MB and there are ~100 SSTables in Level 3 in that previous example.
The table in that previous example doesn't have a TTL, but we do delete rows. I've since compacted the table, so I can't provide the previous "Estimated droppable tombstones", but it was > 0.3. I've set the threshold to 0.25, but unchecked_tombstone_compaction is false. Perhaps setting it to true would eventually compact individual SSTables.
I agree that I am probably not creating enough data at the moment, but what got me into this situation in the first place? All SSTables in each level (give or take a couple) were last modified on the same date.
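If it helps anyone reading later: the settings discussed above are compaction sub-options and can be changed with a schema alteration along these lines. This is only a sketch; the keyspace/table name is a placeholder, and the 200MB size simply echoes the example above.

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': '200',
    'tombstone_threshold': '0.25',
    'unchecked_tombstone_compaction': 'true'
};"

Note that an ALTER like this replaces the whole compaction map, so the 'class' (and any other options already in use) must be restated alongside the tombstone settings.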
On Tuesday, August 22, 2017, 5:16:27 PM PDT, kurt greaves <ku...@instaclustr.com> wrote:

LCS major compaction on 2.2 should compact each level to have a single SSTable. It seems more likely to me that you are simply not generating enough data to require compactions in L3 and most data is TTL'ing before it gets there. Out of curiosity, what does sstablemetadata report for Estimated droppable tombstones on one of those tables, and what is your TTL?

Re: Cassandra isn't compacting old files

Posted by kurt greaves <ku...@instaclustr.com>.
LCS major compaction on 2.2 should compact each level to have a single
SSTable. It seems more likely to me that you are simply not generating
enough data to require compactions in L3 and most data is TTL'ing before it
gets there. Out of curiosity, what does sstablemetadata report for
Estimated droppable tombstones on one of those tables, and what is your
TTL?

Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
Ignore the files missing those other components; that was confirmation bias :( I was sorting by date instead of by name and just assumed something was wrong with Cassandra.
Here's an example table's SSTables, sorted by level, then by repaired status:
SSTable [name=lb-432055-big-Data.db, level=0, repaired=false, instant=2017-08-22]
------------ Level 1 ------------
SSTable [name=lb-431497-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431496-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431495-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431498-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431499-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431501-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431503-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431500-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431502-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-431504-big-Data.db, level=1, repaired=false, instant=2017-08-17]
SSTable [name=lb-426107-big-Data.db, level=1, repaired=true, instant=2017-07-07]
SSTable [name=lb-426105-big-Data.db, level=1, repaired=true, instant=2017-07-07]
SSTable [name=lb-426090-big-Data.db, level=1, repaired=true, instant=2017-07-07]
SSTable [name=lb-426092-big-Data.db, level=1, repaired=true, instant=2017-07-07]
SSTable [name=lb-426094-big-Data.db, level=1, repaired=true, instant=2017-07-07]
SSTable [name=lb-426096-big-Data.db, level=1, repaired=true, instant=2017-07-07]
SSTable [name=lb-426104-big-Data.db, level=1, repaired=true, instant=2017-07-07]
SSTable [name=lb-426102-big-Data.db, level=1, repaired=true, instant=2017-07-07]
SSTable [name=lb-426100-big-Data.db, level=1, repaired=true, instant=2017-07-07]
------------ Level 2 ------------
SSTable [name=lb-423829-big-Data.db, level=2, repaired=false, instant=2017-06-23]
SSTable [name=lb-431505-big-Data.db, level=2, repaired=false, instant=2017-08-17]
SSTable [name=lb-423830-big-Data.db, level=2, repaired=false, instant=2017-06-23]
SSTable [name=lb-424559-big-Data.db, level=2, repaired=false, instant=2017-06-29]
SSTable [name=lb-424568-big-Data.db, level=2, repaired=false, instant=2017-06-29]
SSTable [name=lb-424567-big-Data.db, level=2, repaired=false, instant=2017-06-29]
SSTable [name=lb-424566-big-Data.db, level=2, repaired=false, instant=2017-06-29]
SSTable [name=lb-424561-big-Data.db, level=2, repaired=false, instant=2017-06-29]
SSTable [name=lb-424563-big-Data.db, level=2, repaired=false, instant=2017-06-29]
SSTable [name=lb-424562-big-Data.db, level=2, repaired=false, instant=2017-06-29]
SSTable [name=lb-423825-big-Data.db, level=2, repaired=false, instant=2017-06-23]
SSTable [name=lb-423823-big-Data.db, level=2, repaired=false, instant=2017-06-23]
SSTable [name=lb-424560-big-Data.db, level=2, repaired=false, instant=2017-06-29]
SSTable [name=lb-423824-big-Data.db, level=2, repaired=false, instant=2017-06-23]
SSTable [name=lb-423828-big-Data.db, level=2, repaired=false, instant=2017-06-23]
SSTable [name=lb-423826-big-Data.db, level=2, repaired=false, instant=2017-06-23]
SSTable [name=lb-423380-big-Data.db, level=2, repaired=false, instant=2017-06-20]
SSTable [name=lb-426057-big-Data.db, level=2, repaired=true, instant=2017-07-07]
SSTable [name=lb-426058-big-Data.db, level=2, repaired=true, instant=2017-07-07]
[...~60 more from 2017-07-07...]
SSTable [name=lb-425991-big-Data.db, level=2, repaired=true, instant=2017-07-07]
SSTable [name=lb-426084-big-Data.db, level=2, repaired=true, instant=2017-07-07]
------------ Level 3 ------------
SSTable [name=lb-383142-big-Data.db, level=3, repaired=false, instant=2016-11-19]
SSTable [name=lb-383143-big-Data.db, level=3, repaired=false, instant=2016-11-19]
[...~40 more from 2016-11-19...]
SSTable [name=lb-383178-big-Data.db, level=3, repaired=false, instant=2016-11-19]
SSTable [name=lb-383188-big-Data.db, level=3, repaired=false, instant=2016-11-19]
SSTable [name=lb-425948-big-Data.db, level=3, repaired=false, instant=2017-07-07]
SSTable [name=lb-383179-big-Data.db, level=3, repaired=false, instant=2016-11-19]
SSTable [name=lb-383175-big-Data.db, level=3, repaired=false, instant=2016-11-19]
[...~30 more from 2016-11-19...]
SSTable [name=lb-383160-big-Data.db, level=3, repaired=false, instant=2016-11-19]
SSTable [name=lb-383181-big-Data.db, level=3, repaired=false, instant=2016-11-19]
SSTable [name=lb-383258-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-383256-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-386829-big-Data.db, level=3, repaired=true, instant=2016-11-30]
SSTable [name=lb-383251-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-383259-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-383250-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-386823-big-Data.db, level=3, repaired=true, instant=2016-11-30]
SSTable [name=lb-386825-big-Data.db, level=3, repaired=true, instant=2016-11-30]
SSTable [name=lb-383263-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-383252-big-Data.db, level=3, repaired=true, instant=2016-11-19]
[...~10 more from 2016-11-19...]
SSTable [name=lb-383231-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-383242-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-383244-big-Data.db, level=3, repaired=true, instant=2016-11-19]
SSTable [name=lb-386819-big-Data.db, level=3, repaired=true, instant=2016-11-30]
[...19 more from 2016-11-19...]
SSTable [name=lb-425947-big-Data.db, level=3, repaired=true, instant=2017-07-07]
[...8 more from 2016-11-19...]
SSTable [name=lb-383262-big-Data.db, level=3, repaired=true, instant=2016-11-19]

Obviously, I cannot remember what I (or my team) did on those dates, but that's what I'm trying to figure out. It looks like I performed a major compaction (nodetool compact) on or around 2016-11-19 that brought all the data into level 3. Then background compactions on 2017-06-23 and 2017-07-07 brought data into level 2. Level 1 has been growing since then.
Does that seem plausible?
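For context, a per-SSTable listing like the one above (level and repaired status) can be approximated with the standard tooling; the data path below is a placeholder, and the exact formatting shown above came from elsewhere:

for f in /var/lib/cassandra/data/my_keyspace/my_table-*/lb-*-big-Data.db; do
    echo "== $f"
    # 'SSTable Level' and 'Repaired at' are fields in sstablemetadata's output
    sstablemetadata "$f" | grep -E 'SSTable Level|Repaired at'
done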

On Monday, August 21, 2017, 5:08:44 PM PDT, kurt greaves <ku...@instaclustr.com> wrote:

Sorry about that Sotirios, I didn't make the connection between the two threads and this one dropped off my radar.
Is it possible that, with that many SSTables, there's always a compaction to be run in the "repaired" state, such that unrepaired compactions are essentially "starved", considering the WrappingCompactionStrategy prioritizes the "repaired" set?
Not really, as the code prioritises whichever pool has the most pending tasks. See here. Besides, the repaired pool will only increase if you actually run repairs, so there is only a limited set of data that can be compacted between repairs.
Cassandra has been known to leave some file handles open on occasion. I assume you've restarted Cassandra to see if it clears out the files that are invalid (the ones that only have Data and Index)?

Re: Cassandra isn't compacting old files

Posted by kurt greaves <ku...@instaclustr.com>.
Sorry about that Sotirios, I didn't make the connection between the two
threads and this one dropped off my radar.

> Is it possible that, with that many SSTables, there's always a compaction
> to be run in the "repaired" state, such that unrepaired compactions are
> essentially "starved", considering the WrappingCompactionStrategy prioritizes
> the "repaired" set?

Not really, as the code prioritises whichever pool has the most pending
tasks. See here
<https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java#L76>
Besides, the repaired pool will only increase if you actually run repairs,
so there is only a limited set of data that can be compacted between
repairs.

Cassandra has been known to leave some file handles open on occasion. I
assume you've restarted Cassandra to see if it clears out the files that
are invalid (the ones that only have Data and Index)?

Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
There seem to be a lot of SSTables in a repaired state and a lot in an unrepaired state. For example, for this one table, the logs report

TRACE [main] 2017-08-15 23:50:30,732 LeveledManifest.java:473 - L0 contains 2 SSTables (176997267 bytes) in Manifest@1217144872
TRACE [main] 2017-08-15 23:50:30,732 LeveledManifest.java:473 - L1 contains 10 SSTables (2030691642 bytes) in Manifest@1217144872
TRACE [main] 2017-08-15 23:50:30,732 LeveledManifest.java:473 - L2 contains 94 SSTables (19352545435 bytes) in Manifest@1217144872

and 

TRACE [main] 2017-08-15 23:50:30,731 LeveledManifest.java:473 - L0 contains 1 SSTables (65038718 bytes) in Manifest@499561185
TRACE [main] 2017-08-15 23:50:30,731 LeveledManifest.java:473 - L2 contains 5 SSTables (117221111 bytes) in Manifest@499561185
TRACE [main] 2017-08-15 23:50:30,731 LeveledManifest.java:473 - L3 contains 39 SSTables (7377654173 bytes) in Manifest@499561185
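In case it's useful to anyone else chasing this, those TRACE lines can be switched on (and back off) at runtime without a restart; something like the following should do it, with the fully qualified class name taken from the 2.2 source tree:

nodetool setlogginglevel org.apache.cassandra.db.compaction.LeveledManifest TRACE
# ...and revert once done
nodetool setlogginglevel org.apache.cassandra.db.compaction.LeveledManifest INFO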

Is it possible that, with that many SSTables, there's always a compaction to be run in the "repaired" state, such that unrepaired compactions are essentially "starved", considering the WrappingCompactionStrategy prioritizes the "repaired" set?

On Wednesday, August 2, 2017, 2:35:02 PM PDT, Sotirios Delimanolis <so...@yahoo.com.INVALID> wrote:

Turns out there are already logs for this in Tracker.java. I enabled those and clearly saw the old files are being tracked.
What else can I look at for hints about whether these files are later invalidated/filtered out somehow?

On Tuesday, August 1, 2017, 3:29:38 PM PDT, Sotirios Delimanolis <so...@yahoo.com.INVALID> wrote:

There aren't any ERROR logs for failure to load these files, and they do get compacted away. I'll try to plug some DEBUG logging into a custom Cassandra build.

On Tuesday, August 1, 2017, 12:13:09 PM PDT, Jeff Jirsa <jj...@gmail.com> wrote:

I don't have time to dive deep into the code of your version, but it may be (https://issues.apache.org/jira/browse/CASSANDRA-13620), or it may be something else.
I wouldn't expect compaction to touch them if they're invalid. The handle may be a leftover from trying to load them. 


On Tue, Aug 1, 2017 at 10:01 AM, Sotirios Delimanolis <so...@yahoo.com.invalid> wrote:

@Jeff, why does compaction clear them and why does Cassandra keep a handle to them? Shouldn't they be ignored entirely? Is there an error log I can enable to detect them?
@kurt, there are no such logs for any of these tables. We have a custom log in our build of Cassandra that does show that compactions are happening for that table, but they only ever include the files from July.
On Tuesday, August 1, 2017, 12:55:53 AM PDT, kurt greaves <ku...@instaclustr.com> wrote:

Seeing as there aren't even 100 SSTables in L2, LCS should be gradually trying to compact L3 with L2. You could search the logs for "Adding high-level (L3)" to check if this is happening.


Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
Turns out there are already logs for this in Tracker.java. I enabled those and clearly saw the old files are being tracked.
What else can I look at for hints about whether these files are later invalidated/filtered out somehow?

On Tuesday, August 1, 2017, 3:29:38 PM PDT, Sotirios Delimanolis <so...@yahoo.com.INVALID> wrote:

There aren't any ERROR logs for failure to load these files, and they do get compacted away. I'll try to plug some DEBUG logging into a custom Cassandra build.

On Tuesday, August 1, 2017, 12:13:09 PM PDT, Jeff Jirsa <jj...@gmail.com> wrote:

I don't have time to dive deep into the code of your version, but it may be (https://issues.apache.org/jira/browse/CASSANDRA-13620), or it may be something else.
I wouldn't expect compaction to touch them if they're invalid. The handle may be a leftover from trying to load them. 


On Tue, Aug 1, 2017 at 10:01 AM, Sotirios Delimanolis <so...@yahoo.com.invalid> wrote:

@Jeff, why does compaction clear them and why does Cassandra keep a handle to them? Shouldn't they be ignored entirely? Is there an error log I can enable to detect them?
@kurt, there are no such logs for any of these tables. We have a custom log in our build of Cassandra that does show that compactions are happening for that table, but they only ever include the files from July.
On Tuesday, August 1, 2017, 12:55:53 AM PDT, kurt greaves <ku...@instaclustr.com> wrote:

Seeing as there aren't even 100 SSTables in L2, LCS should be gradually trying to compact L3 with L2. You could search the logs for "Adding high-level (L3)" to check if this is happening.


Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
There aren't any ERROR logs for failure to load these files, and they do get compacted away. I'll try to plug some DEBUG logging into a custom Cassandra build.

On Tuesday, August 1, 2017, 12:13:09 PM PDT, Jeff Jirsa <jj...@gmail.com> wrote:

I don't have time to dive deep into the code of your version, but it may be (https://issues.apache.org/jira/browse/CASSANDRA-13620), or it may be something else.
I wouldn't expect compaction to touch them if they're invalid. The handle may be a leftover from trying to load them. 


On Tue, Aug 1, 2017 at 10:01 AM, Sotirios Delimanolis <so...@yahoo.com.invalid> wrote:

@Jeff, why does compaction clear them and why does Cassandra keep a handle to them? Shouldn't they be ignored entirely? Is there an error log I can enable to detect them?
@kurt, there are no such logs for any of these tables. We have a custom log in our build of Cassandra that does show that compactions are happening for that table, but they only ever include the files from July.
On Tuesday, August 1, 2017, 12:55:53 AM PDT, kurt greaves <ku...@instaclustr.com> wrote:

Seeing as there aren't even 100 SSTables in L2, LCS should be gradually trying to compact L3 with L2. You could search the logs for "Adding high-level (L3)" to check if this is happening.


Re: Cassandra isn't compacting old files

Posted by Jeff Jirsa <jj...@gmail.com>.
I don't have time to dive deep into the code of your version, but it may be
(https://issues.apache.org/jira/browse/CASSANDRA-13620), or it may be
something else.

I wouldn't expect compaction to touch them if they're invalid. The handle
may be a leftover from trying to load them.



On Tue, Aug 1, 2017 at 10:01 AM, Sotirios Delimanolis <
sotodel_89@yahoo.com.invalid> wrote:

> @Jeff, why does compaction clear them and why does Cassandra keep a handle
> to them? Shouldn't they be ignored entirely? Is there an error log I can
> enable to detect them?
>
> @kurt, there are no such logs for any of these tables. We have a custom
> log in our build of Cassandra that does show that compactions are
> happening for that table, but they only ever include the files from July.
>
> On Tuesday, August 1, 2017, 12:55:53 AM PDT, kurt greaves <
> kurt@instaclustr.com> wrote:
>
>
> Seeing as there aren't even 100 SSTables in L2, LCS should be gradually
> trying to compact L3 with L2. You could search the logs for "Adding
> high-level (L3)" to check if this is happening.
>

Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
@Jeff, why does compaction clear them and why does Cassandra keep a handle to them? Shouldn't they be ignored entirely? Is there an error log I can enable to detect them?
@kurt, there are no such logs for any of these tables. We have a custom log in our build of Cassandra that does show that compactions are happening for that table, but they only ever include the files from July.
On Tuesday, August 1, 2017, 12:55:53 AM PDT, kurt greaves <ku...@instaclustr.com> wrote:

Seeing as there aren't even 100 SSTables in L2, LCS should be gradually trying to compact L3 with L2. You could search the logs for "Adding high-level (L3)" to check if this is happening.

Re: Cassandra isn't compacting old files

Posted by kurt greaves <ku...@instaclustr.com>.
Seeing as there aren't even 100 SSTables in L2, LCS should be gradually
trying to compact L3 with L2. You could search the logs for "Adding
high-level (L3)" to check if this is happening.
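A literal search is enough for that; assuming the default log location, something along these lines:

grep -F 'Adding high-level (L3)' /var/log/cassandra/system.log*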

Re: Cassandra isn't compacting old files

Posted by Jeff Jirsa <jj...@gmail.com>.
Yea, it means they're effectively invalid files, and would not be loaded at
startup.



On Mon, Jul 31, 2017 at 9:07 PM, Sotirios Delimanolis <
sotodel_89@yahoo.com.invalid> wrote:

> I don't want to go down the TTL path because this behaviour is also
> occurring for tables without a TTL. I don't have hard numbers about the
> amount of writes, but there's definitely been enough to trigger compaction
> in the ~year since.
>
> We've never changed the topology of this cluster. Ranges have always been
> the same.
>
> I can't remember about repairs, but running sstablemetadata shows
>
> Repaired at: 0
>
> across all files. The Cassandra process has been restarted multiple times
> in the last year.
>
> I've noticed there are only -Data.db and -Index.db files in some rare
> cases. The compression info, filter, summary, and statistics files are
> missing. Does that hint at anything?
>
>
>
>
> On Monday, July 31, 2017, 3:39:11 PM PDT, Jeff Jirsa <jj...@apache.org>
> wrote:
>
>
>
>
> On 2017-07-31 15:00 (-0700), kurt greaves <ku...@instaclustr.com> wrote:
> > How long is your ttl and how much data do you write per day (ie, what is
> > the difference in disk usage over a day)? Did you always TTL?
> > I'd say it's likely there is live data in those older sstables but you're
> > not generating enough data to push new data to the highest level before
> it
> > expires.
>
> >
>
> This is a pretty good option. Other options:
>
> 1) You changed topology on Nov 28, and the ranges covered by those
> sstables are no longer intersecting with the ranges on the node, so they're
> not being selected as LCS compaction candidates (and if you run nodetool
> cleanup, they probably get deleted)
>
> 2) You ran incremental repairs once, and stopped on the 28th, and now
> those sstables have a repairedAt set, so they won't be compacted with other
> (unrepaired) sstables
>
> 3) There's some horrible bug where the sstables got lost from the running
> daemon, and if you restart it'll magically get sucked in and start working
> again (this is really unlikely, and it would be a very bad bug).
>
>
>

Re: Cassandra isn't compacting old files

Posted by Sotirios Delimanolis <so...@yahoo.com.INVALID>.
I don't want to go down the TTL path because this behaviour is also occurring for tables without a TTL. I don't have hard numbers about the amount of writes, but there's definitely been enough to trigger compaction in the ~year since.
We've never changed the topology of this cluster. Ranges have always been the same. 
I can't remember about repairs, but running sstablemetadata shows
Repaired at: 0
across all files. The Cassandra process has been restarted multiple times in the last year.
I've noticed there are only -Data.db and -Index.db files in some rare cases. The compression info, filter, summary, and statistics files are missing. Does that hint at anything?
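One quick way to spot those incomplete sets is to list every component of a given generation and compare against a known-good SSTable; the data path is a placeholder, and lb-50882 is just one generation from the listing earlier:

# a complete 2.2 'lb' SSTable has several sibling components next to -Data.db
# (-Index.db, -Filter.db, -CompressionInfo.db, -Statistics.db, -Summary.db, ...)
ls -l /var/lib/cassandra/data/my_keyspace/my_table-*/lb-50882-big-*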




On Monday, July 31, 2017, 3:39:11 PM PDT, Jeff Jirsa <jj...@apache.org> wrote:



On 2017-07-31 15:00 (-0700), kurt greaves <ku...@instaclustr.com> wrote: 
> How long is your ttl and how much data do you write per day (ie, what is
> the difference in disk usage over a day)? Did you always TTL?
> I'd say it's likely there is live data in those older sstables but you're
> not generating enough data to push new data to the highest level before it
> expires.
> 

This is a pretty good option. Other options:

1) You changed topology on Nov 28, and the ranges covered by those sstables are no longer intersecting with the ranges on the node, so they're not being selected as LCS compaction candidates (and if you run nodetool cleanup, they probably get deleted)

2) You ran incremental repairs once, and stopped on the 28th, and now those sstables have a repairedAt set, so they won't be compacted with other (unrepaired) sstables

3) There's some horrible bug where the sstables got lost from the running daemon, and if you restart it'll magically get sucked in and start working again (this is really unlikely, and it would be a very bad bug).





Re: Cassandra isn't compacting old files

Posted by Jeff Jirsa <jj...@apache.org>.

On 2017-07-31 15:00 (-0700), kurt greaves <ku...@instaclustr.com> wrote: 
> How long is your ttl and how much data do you write per day (ie, what is
> the difference in disk usage over a day)? Did you always TTL?
> I'd say it's likely there is live data in those older sstables but you're
> not generating enough data to push new data to the highest level before it
> expires.
> 

This is a pretty good option. Other options:

1) You changed topology on Nov 28, and the ranges covered by those sstables are no longer intersecting with the ranges on the node, so they're not being selected as LCS compaction candidates (and if you run nodetool cleanup, they probably get deleted)

2) You ran incremental repairs once, and stopped on the 28th, and now those sstables have a repairedAt set, so they won't be compacted with other (unrepaired) sstables

3) There's some horrible bug where the sstables got lost from the running daemon, and if you restart it'll magically get sucked in and start working again (this is really unlikely, and it would be a very bad bug).





Re: Cassandra isn't compacting old files

Posted by kurt greaves <ku...@instaclustr.com>.
How long is your ttl and how much data do you write per day (ie, what is
the difference in disk usage over a day)? Did you always TTL?
I'd say it's likely there is live data in those older sstables but you're
not generating enough data to push new data to the highest level before it
expires.