Posted to user@cassandra.apache.org by Paul Chandler <pa...@redshots.com> on 2019/06/04 13:25:50 UTC

Re: TWCS sstables not dropping even though all data is expired

Mike,

It has taken me some time, but I have now written this up in more detail on my blog: http://www.redshots.com/cassandra-twcs-must-have-ttls/

However I couldn’t get the tombstone compaction subproperties to work as expected.

If I use the following properties:

gc_grace_seconds = 60
  AND default_time_to_live = 300
  AND compaction = {'compaction_window_size': '1',
                    'compaction_window_unit': 'MINUTES',
                    'tombstone_compaction_interval': '60',
                    'tombstone_threshold': '0.01',
                    'unchecked_tombstone_compaction': 'true',
                    'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'}
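
For reference, here is a minimal sketch of a complete table definition with these settings applied. The keyspace, table and column names are assumptions for illustration only, chosen to match the shape of the sstabledump output below, not the exact schema.

```
-- Hypothetical keyspace/table used only to illustrate the settings above.
CREATE TABLE IF NOT EXISTS demo_ks.twcs_test (
    id   int,          -- partition key (the dump below shows a partition key of "1")
    seq  int,          -- clustering column (values 3..7 in the dump below)
    when timestamp,    -- regular column (the "when" cell in the dump below)
    PRIMARY KEY (id, seq)
) WITH gc_grace_seconds = 60
  AND default_time_to_live = 300
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
                    'compaction_window_size': '1',
                    'compaction_window_unit': 'MINUTES',
                    'tombstone_compaction_interval': '60',
                    'tombstone_threshold': '0.01',
                    'unchecked_tombstone_compaction': 'true'};
```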


When a row without a TTL is blocking an sstable from being deleted, I would expect the later sstables to be deleted with these settings.

What actually happens is that the sstables are compacted every 60 seconds, but the new sstable is exactly the same as the previous one, even though the rows have expired and we are way past the gc_grace_seconds.

I have included an sstabledump below.

Does anyone know what I am doing wrong?


Thanks 

Paul


sstabledump md-463-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 33,
        "clustering" : [ 3 ],
        "liveness_info" : { "tstamp" : "2019-06-03T14:56:41.120579Z", "ttl" : 300, "expires_at" : "2019-06-03T15:01:41Z", "expired" : true },
        "cells" : [
          { "name" : "when", "deletion_info" : { "local_delete_time" : "2019-06-03T14:56:41Z" }
          }
        ]
      },
      {
        "type" : "row",
        "position" : 33,
        "clustering" : [ 4 ],
        "liveness_info" : { "tstamp" : "2019-06-03T14:56:45.499467Z", "ttl" : 300, "expires_at" : "2019-06-03T15:01:45Z", "expired" : true },
        "cells" : [
          { "name" : "when", "deletion_info" : { "local_delete_time" : "2019-06-03T14:56:45Z" }
          }
        ]
      },
      {
        "type" : "row",
        "position" : 51,
        "clustering" : [ 5 ],
        "liveness_info" : { "tstamp" : "2019-06-03T14:56:50.009615Z", "ttl" : 300, "expires_at" : "2019-06-03T15:01:50Z", "expired" : true },
        "cells" : [
          { "name" : "when", "deletion_info" : { "local_delete_time" : "2019-06-03T14:56:50Z" }
          }
        ]
      },
      {
        "type" : "row",
        "position" : 69,
        "clustering" : [ 6 ],
        "liveness_info" : { "tstamp" : "2019-06-03T14:56:54.926536Z", "ttl" : 300, "expires_at" : "2019-06-03T15:01:54Z", "expired" : true },
        "cells" : [
          { "name" : "when", "deletion_info" : { "local_delete_time" : "2019-06-03T14:56:54Z" }
          }
        ]
      },
      {
        "type" : "row",
        "position" : 87,
        "clustering" : [ 7 ],
        "liveness_info" : { "tstamp" : "2019-06-03T14:57:00.600615Z", "ttl" : 300, "expires_at" : "2019-06-03T15:02:00Z", "expired" : true },
        "cells" : [
          { "name" : "when", "deletion_info" : { "local_delete_time" : "2019-06-03T14:57:00Z" }
          }
        ]
      }
    ]
  }
]


> On 3 May 2019, at 19:59, Mike Torra <mt...@salesforce.com.INVALID> wrote:
> 
> Thx for the help Paul - there are definitely some details here I still don't fully understand, but this helped me resolve the problem and know what to look for in the future :)
> 
> On Fri, May 3, 2019 at 12:44 PM Paul Chandler <paul@redshots.com> wrote:
> Hi Mike,
> 
> For TWCS an sstable can only be deleted when all the data in that sstable has expired, but you had a record without a TTL in it, so that sstable could never be deleted.
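> 
> As a purely illustrative sketch (assumed keyspace, table and column names, and assuming the table relies on per-write TTLs rather than a default_time_to_live), a single INSERT that omits USING TTL is enough to create a row that never expires and pins its sstable:
> 
> ```
> -- Assumed table: demo_ks.events with PRIMARY KEY (user_id, ts) and no default_time_to_live.
> INSERT INTO demo_ks.events (user_id, ts, activity_data)
> VALUES ('normal_user', toTimestamp(now()), 'payload') USING TTL 86400;  -- expires after a day
> 
> INSERT INTO demo_ks.events (user_id, ts, activity_data)
> VALUES ('rogue_user', toTimestamp(now()), 'payload');                   -- TTL forgotten: never expires
> ```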
> 
> That bit is straightforward. The next bit I remember reading somewhere, but I can't find the reference at the moment to confirm my thinking.
> 
> An sstable can only be deleted if it is the earliest sstable. I think this is because deleting later sstables may expose old versions of the data stored in the stuck sstable that had been superseded. For example, if a later sstable contained a tombstone for the non-TTLed record causing the problem in this instance, then deleting that sstable would cause the deleted data to reappear. (Someone please correct me if I have this wrong.)
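> 
> To make that concrete, here is a sketch of the scenario (assumed names and values, purely illustrative): the original write lands in one time window, the later delete puts a tombstone in a newer window, and removing the newer sstable on its own would bring the old row back:
> 
> ```
> -- Write without a TTL; this ends up in the sstable for time window 1.
> INSERT INTO demo_ks.events (user_id, ts, activity_data)
> VALUES ('rogue_user', '2019-01-22 15:27:45+0000', 'payload');
> 
> -- Delete issued later; the tombstone ends up in the sstable for a newer time window.
> DELETE FROM demo_ks.events
> WHERE user_id = 'rogue_user' AND ts = '2019-01-22 15:27:45+0000';
> 
> -- If the newer sstable were dropped before the window-1 sstable,
> -- the tombstone would be lost and the "deleted" row would become readable again.
> ```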
> 
> Because sstables in different time buckets are never compacted together, this problem only went away when you ran the major compaction.
> 
> This would happen on all replicas of the data, hence the reason you see this problem on 3 nodes.
> 
> Thanks 
> 
> Paul
> www.redshots.com
> 
>> On 3 May 2019, at 15:35, Mike Torra <mt...@salesforce.com.INVALID> wrote:
>> 
>> This does indeed seem to be a problem of overlapping sstables, but I don't understand why the data (and number of sstables) just continues to grow indefinitely. I also don't understand why this problem is only appearing on some nodes. Is it just a coincidence that the one rogue test row without a ttl is at the 'root' sstable causing the problem (ie, from the output of `sstableexpiredblockers`)?
>> 
>> Running a full compaction via `nodetool compact` reclaims the disk space, but I'd like to figure out why this happened and prevent it. Understanding why this problem would be isolated the way it is (ie only one CF even though I have a few others that share a very similar schema, and only some nodes) seems like it will help me prevent it.
>> 
>> 
>> On Thu, May 2, 2019 at 1:00 PM Paul Chandler <paul@redshots.com> wrote:
>> Hi Mike,
>> 
>> It sounds like that record may have been deleted; if that is the case, it would still be shown in this sstable, but the tombstone from the delete would be in a later sstable. You can use nodetool getsstables to work out which sstables contain the data.
>> 
>> I recommend reading The Last Pickle post on this: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html; the sections towards the bottom of that post may well explain why the sstable is not being deleted.
>> 
>> Thanks 
>> 
>> Paul
>> www.redshots.com
>> 
>>> On 2 May 2019, at 16:08, Mike Torra <mtorra@salesforce.com.INVALID> wrote:
>>> 
>>> I'm pretty stumped by this, so here is some more detail if it helps.
>>> 
>>> Here is what the suspicious partition looks like in the `sstabledump` output (some pii etc redacted):
>>> ```
>>> {
>>>     "partition" : {
>>>       "key" : [ "some_user_id_value", "user_id", "demo-test" ],
>>>       "position" : 210
>>>     },
>>>     "rows" : [
>>>       {
>>>         "type" : "row",
>>>         "position" : 1132,
>>>         "clustering" : [ "2019-01-22 15:27:45.000Z" ],
>>>         "liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
>>>         "cells" : [
>>>           { "some": "data" }
>>>         ]
>>>       }
>>>     ]
>>>   }
>>> ```
>>> 
>>> And here is what every other partition looks like:
>>> ```
>>> {
>>>     "partition" : {
>>>       "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
>>>       "position" : 1133
>>>     },
>>>     "rows" : [
>>>       {
>>>         "type" : "row",
>>>         "position" : 1234,
>>>         "clustering" : [ "2019-01-22 17:59:35.547Z" ],
>>>         "liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
>>>         "cells" : [
>>>           { "name" : "activity_data", "deletion_info" : { "local_delete_time" : "2019-01-22T17:59:35Z" }
>>>           }
>>>         ]
>>>       }
>>>     ]
>>>   }
>>> ```
>>> 
>>> As expected, almost all of the data except this one suspicious partition has a ttl and is already expired. But if a partition isn't expired and I see it in the sstable, why wouldn't I see it when executing a CQL query against the CF? Why would this sstable be preventing so many other sstables from getting cleaned up?
>>> 
>>> On Tue, Apr 30, 2019 at 12:34 PM Mike Torra <mtorra@salesforce.com> wrote:
>>> Hello -
>>> 
>>> I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few months ago I started noticing disk usage on some nodes increasing consistently. At first I solved the problem by destroying the nodes and rebuilding them, but the problem keeps returning.
>>> 
>>> I did some more investigation recently, and this is what I found:
>>> - I narrowed the problem down to a CF that uses TWCS, by simply looking at disk space usage
>>> - in each region, 3 nodes have this problem of growing disk space (matches replication factor)
>>> - on each node, I tracked down the problem to a particular SSTable using `sstableexpiredblockers`
>>> - in the SSTable, using `sstabledump`, I found a row that does not have a ttl like the other rows, and appears to be from someone else on the team testing something and forgetting to include a ttl
>>> - all other rows show "expired: true" except this one, hence my suspicion
>>> - when I query for that particular partition key, I get no results
>>> - I tried deleting the row anyways, but that didn't seem to change anything
>>> - I also tried `nodetool scrub`, but that didn't help either
>>> 
>>> Would this rogue row without a ttl explain the problem? If so, why? If not, does anyone have any other ideas? Why does the row show in `sstabledump` but not when I query for it?
>>> 
>>> I appreciate any help or suggestions!
>>> 
>>> - Mike
>> 
>