Posted to user@cassandra.apache.org by Bryan Talbot <bt...@aeriagames.com> on 2013/01/16 20:39:15 UTC

LCS not removing rows with all TTL expired columns

On cassandra 1.1.5 with a write heavy workload, we're having problems
getting rows to be compacted away (removed) even though all columns have
expired TTL.  We've tried size tiered and now leveled and are seeing the
same symptom: the data stays around essentially forever.

Currently we write all columns with a TTL of 72 hours (259200 seconds) and
expect to add 10 GB of data to this CF per day per node.  Each node
currently has 73 GB for the affected CF and shows no indications that old
rows will be removed on their own.

Why aren't rows being removed?  Below is some data from a sample row which
should have been removed several days ago but is still around even though
it has been involved in numerous compactions since being expired.

$> ./bin/nodetool -h localhost getsstables metrics request_summary
459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$> ls -alF
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$> ./bin/sstable2json
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 "%x"')
{
"34353966623436302d356163652d313165322d396239322d313164363762363136336234":
[["app_name","50f21d3d",1357785277207001,"d"],
["client_ip","50f21d3d",1357785277207001,"d"],
["client_req_id","50f21d3d",1357785277207001,"d"],
["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
["mysql_duration_us","50f21d3d",1357785277207001,"d"],
["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
["req_duration_us","50f21d3d",1357785277207001,"d"],
["req_finish_time_us","50f21d3d",1357785277207001,"d"],
["req_method","50f21d3d",1357785277207001,"d"],
["req_service","50f21d3d",1357785277207001,"d"],
["req_start_time_us","50f21d3d",1357785277207001,"d"],
["success","50f21d3d",1357785277207001,"d"]]
}
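
The long hex key in that JSON output is just the row key -- the ASCII UUID
string -- hex-encoded by the hexdump invocation above. Decoding it back (a
quick check, assuming xxd is available) confirms it's the same row:

$> echo 34353966623436302d356163652d313165322d396239322d313164363762363136336234 | xxd -r -p
459fb460-5ace-11e2-9b92-11d67b6163b4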


Decoding the column timestamps shows that the columns were written at
"Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13 Jan
2013 02:34:37 GMT".  The date of the SSTable shows that it was generated on
Jan 16, which is 3 days after all columns TTL-ed out.
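
For reference, those dates come straight from the values in the JSON above:
the 16-digit number is the column timestamp in microseconds since the epoch,
and the hex value is the column's local deletion time in seconds. A quick way
to decode them (assuming GNU date; on OS X use date -u -r <secs> instead):

$> date -u -d @1357785277          # 1357785277207001 truncated to seconds
Thu Jan 10 02:34:37 UTC 2013
$> date -u -d @$((16#50f21d3d))    # hex deletion time from sstable2json
Sun Jan 13 02:34:37 UTC 2013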


The schema shows that gc_grace is set to 0 since this data is write-once,
read-seldom and is never updated or deleted.

create column family request_summary
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and bloom_filter_fp_chance = 1.0
  and compression_options = {'chunk_length_kb' : '64',
'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};


Thanks in advance for help in understanding why rows such as this are not
removed!

-Bryan

Re: LCS not removing rows with all TTL expired columns

Posted by Derek Williams <de...@fyrie.net>.
Thanks for letting us know. I also have some tables with a lot of
activity and very short TTLs, and while I haven't experienced this problem,
it's good to know just in case.


On Tue, Jan 22, 2013 at 7:35 PM, Bryan Talbot <bt...@aeriagames.com> wrote:

> It turns out that having gc_grace=0 isn't required to produce the problem.
>  My colleague did a lot of digging into the compaction code and we think
> he's found the issue.  It's detailed in
> https://issues.apache.org/jira/browse/CASSANDRA-5182
>
> Basically, tombstones for a row will not be removed from an SSTable during
> compaction if the row appears in other SSTables; however, the compaction
> code checks the bloom filters to make this determination.  Since this data
> is rarely read we had bloom_filter_fp_chance set to 1.0, which makes rows
> seem to appear in every SSTable as far as compaction is concerned.
>
> This caused our data to essentially never be removed when using either
> STCS or LCS and will probably affect anyone else running 1.1 with a high
> bloom filter fp chance.
>
> Setting our fp chance to 0.1, running upgradesstables, and running the
> application as before seems to have stabilized the load as desired, at the
> expense of additional JVM memory.
>
> -Bryan
>
>
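> A minimal sketch of that fix on 1.1 (the keyspace is metrics, as in the
> paths above; the cassandra-cli piping and nodetool invocation are assumed
> to work as shown elsewhere in this thread):
>
> $> echo "use metrics; update column family request_summary with bloom_filter_fp_chance = 0.1;" | ./bin/cassandra-cli -h localhost
> $> ./bin/nodetool -h localhost upgradesstables metrics request_summary
>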
> On Thu, Jan 17, 2013 at 6:50 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
>
>> Bleh, I rushed out the email before some meetings and I messed something
>> up.  Working on reproducing now with better notes this time.
>>
>> -Bryan
>>
>>
>>
>> On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams <de...@fyrie.net> wrote:
>>
>>> When you ran this test, is that the exact schema you used? I'm not
>>> seeing where you are setting gc_grace to 0 (although I could just be blind,
>>> it happens).
>>>
>>>
>>> On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
>>>
>>>> I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7,
>>>> 1.1.8, a trivial schema, and a simple script that just inserts rows.  If
>>>> the TTL is small enough that all LCS data fits in generation 0, then the
>>>> rows seem to be removed as TTLs expire, as desired.  However, if the
>>>> insertion rate is high enough or the TTL long enough, then the data keeps
>>>> accumulating for far longer than expected.
>>>>
>>>> Using a 120 second TTL and a single threaded php insertion script, my MBP
>>>> with SSD retained almost all of the data.  120 seconds of inserts should
>>>> accumulate only 5-10 MB of data.  I would expect TTL'd rows to be removed
>>>> eventually and the cassandra load to level off at some reasonable value
>>>> near 10 MB.  After running for 2 hours and with a cassandra load of ~550
>>>> MB I stopped the test.
>>>>
>>>> The schema is
>>>>
>>>> create keyspace test
>>>>   with placement_strategy = 'SimpleStrategy'
>>>>   and strategy_options = {replication_factor : 1}
>>>>   and durable_writes = true;
>>>>
>>>> use test;
>>>>
>>>> create column family test
>>>>   with column_type = 'Standard'
>>>>   and comparator = 'UTF8Type'
>>>>   and default_validation_class = 'UTF8Type'
>>>>   and key_validation_class = 'TimeUUIDType'
>>>>   and compaction_strategy =
>>>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>>>   and caching = 'NONE'
>>>>   and bloom_filter_fp_chance = 1.0
>>>>   and column_metadata = [
>>>>     {column_name : 'a',
>>>>     validation_class : LongType}];
>>>>
>>>>
>>>> and the insert script is
>>>>
>>>> <?php
>>>>
>>>> require_once('phpcassa/1.0.a.5/autoload.php');
>>>>
>>>> use phpcassa\Connection\ConnectionPool;
>>>> use phpcassa\ColumnFamily;
>>>> use phpcassa\SystemManager;
>>>> use phpcassa\UUID;
>>>>
>>>> // Connect to test keyspace and column family
>>>> $sys = new SystemManager('127.0.0.1');
>>>>
>>>> // Start a connection pool, create our ColumnFamily instance
>>>> $pool = new ConnectionPool('test', array('127.0.0.1'));
>>>> $testCf = new ColumnFamily($pool, 'test');
>>>>
>>>> // Insert records
>>>> while( 1 ) {
>>>>   $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
>>>> }
>>>>
>>>> // Close our connections (never reached; the loop above runs forever)
>>>> $pool->close();
>>>> $sys->close();
>>>>
>>>> ?>
>>>>
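>>>> One way to watch the accumulation while the script runs (a sketch:
>>>> "insert.php" is a hypothetical name for the script above, and the du
>>>> path assumes the data directory layout from earlier in the thread):
>>>>
>>>> $> php insert.php &
>>>> $> watch -n 60 du -sh /virtual/cassandra/data/data/test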
>>>>
>>>> -Bryan
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot <bt...@aeriagames.com> wrote:
>>>>
>>>>> We are using LCS and the particular row I've referenced has been
>>>>> involved in several compactions after all columns have TTL expired.  The
>>>>> most recent one was again this morning and the row is still there -- TTL
>>>>> expired for several days now with gc_grace=0 and several compactions later
>>>>> ...
>>>>>
>>>>>
>>>>> $> ./bin/nodetool -h localhost getsstables metrics request_summary
>>>>> 459fb460-5ace-11e2-9b92-11d67b6163b4
>>>>>
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>>
>>>>> $> ls -alF
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>> -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>>
>>>>>
>>>>> $> ./bin/sstable2json
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>> -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 "%x"')
>>>>>  {
>>>>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>>>> [["app_name","50f21d3d",1357785277207001,"d"],
>>>>> ["client_ip","50f21d3d",1357785277207001,"d"],
>>>>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>>>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>>>> ["req_method","50f21d3d",1357785277207001,"d"],
>>>>> ["req_service","50f21d3d",1357785277207001,"d"],
>>>>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>>>> ["success","50f21d3d",1357785277207001,"d"]]
>>>>> }
>>>>>
>>>>>
>>>>> My experience with TTL columns so far has been pretty similar to
>>>>> Viktor's in that the only way to keep the row count under control is to
>>>>> force major compactions.  In real world use, STCS and LCS both leave TTL
>>>>> expired rows around forever as far as I can tell.  When testing with
>>>>> minimal data, removal of TTL expired rows seems to work as expected, but
>>>>> in this case there seems to be some divergence between real-life
>>>>> workloads and test samples.
>>>>>
>>>>> -Bryan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 17, 2013 at 1:47 AM, Viktor Jevdokimov <
>>>>> Viktor.Jevdokimov@adform.com> wrote:
>>>>>
>>>>>> @Bryan,
>>>>>>
>>>>>> To keep data size as low as possible with TTL columns we still use
>>>>>> STCS and nightly major compactions.
>>>>>>
>>>>>> Experience with LCS was not successful in our case; data size stays
>>>>>> too high, along with the amount of compaction.
>>>>>>
>>>>>> IMO, before 1.2, LCS was good for CFs without TTLs or a high delete
>>>>>> rate. I have not tested 1.2 LCS behavior; we're still on 1.0.x.
>>>>>>
>>>>>> Best regards / Pagarbiai
>>>>>> Viktor Jevdokimov
>>>>>> Senior Developer
>>>>>>
>>>>>> From: aaron morton [mailto:aaron@thelastpickle.com]
>>>>>> Sent: Thursday, January 17, 2013 06:24
>>>>>> To: user@cassandra.apache.org
>>>>>> Subject: Re: LCS not removing rows with all TTL expired columns
>>>>>>
>>>>>>
>>>>>> Minor compaction (with Size Tiered) will only purge tombstones if all
>>>>>> fragments of a row are contained in the SSTables being compacted. So if
>>>>>> you have a long lived row that is present in many size tiers, the
>>>>>> columns will not be purged.
>>>>>>
>>>>>>   (thus compacted) 3 days after all columns for that row had expired
>>>>>>
>>>>>> Tombstones have to get on disk, even if you set gc_grace_seconds to 0.
>>>>>> If not, they do not get a chance to delete previous versions of the
>>>>>> column which already exist on disk. So when the compaction ran, your
>>>>>> ExpiringColumn was turned into a DeletedColumn and placed on disk.
>>>>>>
>>>>>> I would expect the next round of compaction to remove these columns.
>>>>>>
>>>>>> There is a new feature in 1.2 that may help you here. It will do a
>>>>>> special compaction of individual sstables when they have a certain
>>>>>> proportion of dead columns:
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-3442
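>>>>>>
>>>>>> (That feature surfaces in 1.2 as the 'tombstone_threshold' compaction
>>>>>> sub-option; a sketch of setting it via CQL3, assuming cqlsh can reach
>>>>>> the node and the CF is visible to CQL:
>>>>>>
>>>>>> $> echo "ALTER TABLE metrics.request_summary WITH compaction = {'class': 'LeveledCompactionStrategy', 'tombstone_threshold': 0.2};" | ./bin/cqlsh
>>>>>> )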
>>>>>>
>>>>>> Also interested to know if LCS helps.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> New Zealand
>>>>>>
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On 17/01/2013, at 2:55 PM, Bryan Talbot <bt...@aeriagames.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> According to the timestamps (see original post) the SSTable was
>>>>>> written (thus compacted) 3 days after all columns for that row had
>>>>>> expired and 6 days after the row was created; yet all columns are still
>>>>>> showing up in the SSTable.  Note that a "get" for that key returns no
>>>>>> rows, so that's working correctly, but the data is lugged around far
>>>>>> longer than it should be -- maybe forever.
>>>>>>
>>>>>> -Bryan
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ai...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> To get a column removed, two requirements must be met:
>>>>>>
>>>>>> 1. the column must be expired
>>>>>> 2. after that, the CF must get compacted
>>>>>>
>>>>>> I guess your expired columns are propagated to higher-tier SSTables,
>>>>>> which get compacted rarely.
>>>>>> So you have to wait until the higher tiers get compacted.
>>>>>>
>>>>>> Andrey
>>>>>>


-- 
Derek Williams



Re: LCS not removing rows with all TTL expired columns

Posted by Derek Williams <de...@fyrie.net>.
When you ran this test, is that the exact schema you used? I'm not seeing
where you are setting gc_grace to 0 (although I could just be blind, it
happens).


On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot <bt...@aeriagames.com>wrote:

> I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7,
> 1.1.8, a trivial schema, and a simple script that just inserts rows.  If
> the TTL is small enough so that all LCS data fits in generation 0 then the
> rows seem to be removed with TTL expires as desired.  However, if the
> insertion rate is high enough or the TTL long enough then the data keep
> accumulating for far longer than expected.
>
> Using 120 second TTL and a single threaded php insertion script my MBP
> with SSD retained almost all of the data.  120 seconds should accumulate
> 5-10 MB of data.  I would expect that TTL rows to be removed eventually and
> for the cassandra load to level off at some reasonable value near 10 MB.
>  After running for 2 hours and with a cassandra load of ~550 MB I stopped
> the test.
>
> The schema is
>
> create keyspace test
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = {replication_factor : 1}
>   and durable_writes = true;
>
> use test;
>
> create column family test
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'UTF8Type'
>   and key_validation_class = 'TimeUUIDType'
>   and compaction_strategy =
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>   and caching = 'NONE'
>   and bloom_filter_fp_chance = 1.0
>   and column_metadata = [
>     {column_name : 'a',
>     validation_class : LongType}];
>
>
> and the insert script is
>
> <?php
>
> require_once('phpcassa/1.0.a.5/autoload.php');
>
> use phpcassa\Connection\ConnectionPool;
> use phpcassa\ColumnFamily;
> use phpcassa\SystemManager;
> use phpcassa\UUID;
>
> // Connect to test keyspace and column family
> $sys = new SystemManager('127.0.0.1');
>
> // Start a connection pool, create our ColumnFamily instance
> $pool = new ConnectionPool('test', array('127.0.0.1'));
> $testCf = new ColumnFamily($pool, 'test');
>
> // Insert records
> while( 1 ) {
>   $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
> }
>
> // Close our connections
> $pool->close();
> $sys->close();
>
> ?>
>
>
> -Bryan
>
>
>
>
> On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot <bt...@aeriagames.com>wrote:
>
>> We are using LCS and the particular row I've referenced has been involved
>> in several compactions after all columns have TTL expired.  The most recent
>> one was again this morning and the row is still there -- TTL expired for
>> several days now with gc_grace=0 and several compactions later ...
>>
>>
>> $> ./bin/nodetool -h localhost getsstables metrics request_summary
>> 459fb460-5ace-11e2-9b92-11d67b6163b4
>>
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>
>> $> ls -alF
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>> -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>
>>
>> $> ./bin/sstable2json
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>> -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 "%x"')
>>  {
>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>> [["app_name","50f21d3d",1357785277207001,"d"],
>> ["client_ip","50f21d3d",1357785277207001,"d"],
>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>> ["req_method","50f21d3d",1357785277207001,"d"],
>> ["req_service","50f21d3d",1357785277207001,"d"],
>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>> ["success","50f21d3d",1357785277207001,"d"]]
>> }
>>
>>
>> My experience with TTL columns so far has been pretty similar to Viktor's
>> in that the only way to keep them row count under control is to force major
>> compactions.  In real world use, STCS and LCS both leave TTL expired rows
>> around forever as far as I can tell.  When testing with minimal data,
>> removal of TTL expired rows seem to work as expected but in this case there
>> seems to be some divergence from real life work and test samples.
>>
>> -Bryan


-- 
Derek Williams

Re: LCS not removing rows with all TTL expired columns

Posted by Bryan Talbot <bt...@aeriagames.com>.
I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7, 1.1.8,
a trivial schema, and a simple script that just inserts rows.  If the TTL
is small enough that all LCS data fits in generation 0, then the rows seem
to be removed as their TTLs expire, as desired.  However, if the insertion
rate is high enough or the TTL long enough, then the data keeps accumulating
for far longer than expected.

Using a 120 second TTL and a single-threaded PHP insertion script, my MBP
with SSD retained almost all of the data.  120 seconds' worth of inserts
should only amount to 5-10 MB of data, so I would expect the TTL-expired
rows to be removed eventually and the cassandra load to level off at some
reasonable value near 10 MB.  After running for 2 hours with a cassandra
load of ~550 MB, I stopped the test.
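
For reference, the load figure above is the "Load" reported by nodetool; a
crude way to watch the load and the on-disk size of the test CF during a run
is something like this (the data directory path is from my environment and
will likely differ):

$> while true; do ./bin/nodetool -h localhost info | grep Load; \
     du -sh /virtual/cassandra/data/data/test/test; sleep 60; done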

The schema is

create keyspace test
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 1}
  and durable_writes = true;

use test;

create column family test
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'TimeUUIDType'
  and compaction_strategy =
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and bloom_filter_fp_chance = 1.0
  and column_metadata = [
    {column_name : 'a',
    validation_class : LongType}];


and the insert script is

<?php

require_once('phpcassa/1.0.a.5/autoload.php');

use phpcassa\Connection\ConnectionPool;
use phpcassa\ColumnFamily;
use phpcassa\SystemManager;
use phpcassa\UUID;

// Connect to test keyspace and column family
$sys = new SystemManager('127.0.0.1');

// Start a connection pool, create our ColumnFamily instance
$pool = new ConnectionPool('test', array('127.0.0.1'));
$testCf = new ColumnFamily($pool, 'test');

// Insert records
while( 1 ) {
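  // insert(key, columns, timestamp, ttl): a null timestamp lets phpcassa
  // use the current time; 120 is the TTL in seconds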
  $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
}

// Close our connections
$pool->close();
$sys->close();

?>
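
To inspect what's left on disk after a run, the same tools from the original
post work on the test CF (the SSTable generation number below is just an
example; use whatever ls shows):

$> ls -alF /virtual/cassandra/data/data/test/test/*-Data.db
$> ./bin/sstable2json /virtual/cassandra/data/data/test/test/test-test-he-1-Data.db | head

In my runs the dump shows the same "d" (deleted) columns hanging around long
after their TTLs have passed.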


-Bryan





Re: LCS not removing rows with all TTL expired columns

Posted by Bryan Talbot <bt...@aeriagames.com>.
We are using LCS, and the particular row I've referenced has been involved
in several compactions after all of its columns' TTLs expired.  The most
recent one was again this morning and the row is still there -- TTL expired
for several days now, with gc_grace=0, and several compactions later ...


$> ./bin/nodetool -h localhost getsstables metrics request_summary
459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db

$> ls -alF
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
-rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db


$> ./bin/sstable2json
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
-k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 "%x"')
{
"34353966623436302d356163652d313165322d396239322d313164363762363136336234":
[["app_name","50f21d3d",1357785277207001,"d"],
["client_ip","50f21d3d",1357785277207001,"d"],
["client_req_id","50f21d3d",1357785277207001,"d"],
["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
["mysql_duration_us","50f21d3d",1357785277207001,"d"],
["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
["req_duration_us","50f21d3d",1357785277207001,"d"],
["req_finish_time_us","50f21d3d",1357785277207001,"d"],
["req_method","50f21d3d",1357785277207001,"d"],
["req_service","50f21d3d",1357785277207001,"d"],
["req_start_time_us","50f21d3d",1357785277207001,"d"],
["success","50f21d3d",1357785277207001,"d"]]
}


My experience with TTL columns so far has been pretty similar to Viktor's
in that the only way to keep the row count under control is to force major
compactions.  In real-world use, STCS and LCS both leave TTL-expired rows
around forever as far as I can tell.  When testing with minimal data,
removal of TTL-expired rows seems to work as expected, but in this case there
seems to be some divergence between real-life workloads and test samples.

-Bryan





RE: LCS not removing rows with all TTL expired columns

Posted by Viktor Jevdokimov <Vi...@adform.com>.
@Bryan,

To keep data size as low as possible with TTL columns we still use STCS and nightly major compactions.

Our experience with LCS was not successful: the data size stayed too high,
along with the amount of compaction activity.

IMO, before 1.2, LCS was good for CFs without TTL or a high delete rate. I
have not tested 1.2 LCS behavior; we're still on 1.0.x.
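
For reference, the nightly major compaction is just a cron entry along these
lines (the path and schedule are from our setup; the keyspace/CF names here
are taken from Bryan's example):

0 3 * * * /opt/cassandra/bin/nodetool -h localhost compact metrics request_summary

nodetool compact <keyspace> <cf> forces a major compaction of that CF, which
under STCS is what actually drops the expired data for us.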


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com<ma...@adform.com>
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>




Re: LCS not removing rows with all TTL expired columns

Posted by aaron morton <aa...@thelastpickle.com>.
Minor compaction (with Size Tiered) will only purge tombstones if all fragments of a row are contained in the SSTables being compacted. So if you have a long-lived row that is present in many size tiers, the columns will not be purged.
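
You can check for this directly with the command from earlier in the thread:
if getsstables returns more than one file for a key, fragments of that row
live in multiple SSTables, and a minor compaction over only some of them
cannot purge it.

$> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4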

>  (thus compacted) 3 days after all columns for that row had expired
Tombstones have to get on disk, even if you set gc_grace_seconds to 0. If they are not written, they never get a chance to delete previous versions of the column which already exist on disk. So when the compaction ran, your ExpiringColumn was turned into a DeletedColumn and placed on disk.

I would expect the next round of compaction to remove these columns. 

There is a new feature in 1.2 that may help you here. It will do a special compaction of individual sstables when they have a certain proportion of dead columns https://issues.apache.org/jira/browse/CASSANDRA-3442 
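
If you do try 1.2, that behaviour should be tunable per CF through the
compaction strategy options the ticket adds; something like this from
cassandra-cli ought to lower the trigger point (I have not verified the
exact option defaults):

update column family request_summary
  with compaction_strategy_options = {'tombstone_threshold' : '0.05'};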

Also interested to know if LCS helps. 

Cheers
 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



Re: LCS not removing rows with all TTL expired columns

Posted by Bryan Talbot <bt...@aeriagames.com>.
According to the timestamps (see original post) the SSTable was written
(thus compacted) 3 days after all columns for that row had expired and 6
days after the row was created; yet all columns are still showing up in the
SSTable.  Note that a "get" for that key returns no rows, so reads are
working correctly, but the data is lugged around far longer than it should
be -- maybe forever.
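
(For reference, the decode is just the first 10 digits of the 16-digit
microsecond timestamp treated as epoch seconds; with GNU date:

$> date -u -d @1357785277
Thu Jan 10 02:34:37 UTC 2013
$> date -u -d @$((1357785277 + 259200))
Sun Jan 13 02:34:37 UTC 2013

where the second command adds the 72 hour TTL.)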


-Bryan



Re: LCS not removing rows with all TTL expired columns

Posted by Andrey Ilinykh <ai...@gmail.com>.
To get a column removed you have to meet two requirements:
1. the column must be expired
2. after that, the CF must get compacted

I guess your expired columns are propagated to higher-tier SSTables, which
get compacted rarely. So you have to wait until those higher tiers get
compacted.
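
nodetool can show whether anything is actually pending or running for the CF:

$> ./bin/nodetool -h localhost compactionstats

If nothing is ever scheduled for the highest tiers, the expired data just
sits there until enough new SSTables accumulate to trigger the next
compaction.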

Andrey


