Posted to commits@cassandra.apache.org by "Romain GERARD (JIRA)" <ji...@apache.org> on 2017/08/17 09:42:03 UTC

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

    [ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130152#comment-16130152 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:41 AM:
--------------------------------------------------------------------

Hi,

I am back with a new proposal: https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences:
 + Used [~krummas]'s approach for introducing the ignoreOverlaps option
 + I split the function that performs the overlap checks because, in the previous patch, I was wrongly checking for overlaps in memtables even when the option was activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
    This seems a sane default to me: even if we drop fully expired sstables, we will still check for those worth dropping, and we want to ignore the overlap check in that case too.
 + Added a simple test case. I will look into adding more (feel free to suggest some)
 + Rebased on trunk

All tests pass, and I will deploy this patch internally to confirm that it works as expected.
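
For readers who have not followed the earlier patches, the effect of the option can be sketched roughly as below. This is a simplified illustration and not the code from the commit linked above: the class ExpiredSSTableSketch, the SSTable fields and the fullyExpired method are hypothetical names chosen for the example, and memtables are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpiredSSTableSketch {

    /** Hypothetical, simplified stand-in for sstable metadata. */
    static final class SSTable {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the sstable
        final long maxTimestamp;        // newest write timestamp in the sstable
        final int maxLocalDeletionTime; // time (in seconds) at which the last cell expires

        SSTable(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /**
     * Returns the candidates that can be dropped as a whole.
     * A candidate qualifies when all of its data is past gcBefore and, unless
     * ignoreOverlaps is set, it is older than every overlapping sstable that
     * still holds live data.
     */
    static List<SSTable> fullyExpired(List<SSTable> candidates,
                                      List<SSTable> overlapping,
                                      int gcBefore,
                                      boolean ignoreOverlaps) {
        // Oldest timestamp among overlapping sstables that are not yet expired.
        long minLiveTimestamp = Long.MAX_VALUE;
        if (!ignoreOverlaps) {
            for (SSTable s : overlapping) {
                if (s.maxLocalDeletionTime >= gcBefore) {
                    minLiveTimestamp = Math.min(minLiveTimestamp, s.minTimestamp);
                }
            }
        }

        List<SSTable> expired = new ArrayList<>();
        for (SSTable s : candidates) {
            boolean allDataExpired = s.maxLocalDeletionTime < gcBefore;
            boolean notBlocked = ignoreOverlaps || s.maxTimestamp < minLiveTimestamp;
            if (allDataExpired && notBlocked) {
                expired.add(s);
            }
        }
        return expired;
    }
}
```

With the option off, a single overlapping sstable that still contains live data (for example one written by a read-repair) keeps an older, fully expired sstable from being dropped. With the option on, only the expiration check remains.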



> Allow TWCS to ignore overlaps when dropping fully expired sstables
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-13418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Corentin Chary
>              Labels: twcs
>         Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If you really want read-repairs you're going to have sstables blocking the expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a very low value and that will purge the blockers of old data that should already have expired, thus removing the overlaps and allowing the other SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have time series, you might not care if all your data doesn't exactly expire at the right time, or if data re-appears for some time, as long as it gets deleted as soon as it can. And in this situation I believe it would be really beneficial to allow users to simply ignore overlapping SSTables when looking for fully expired ones.
> To the question of why you would need read-repairs:
> - Full repairs basically take longer than the TTL of the data on my dataset, so they aren't really effective.
> - Even with a 10% chance of doing a repair, we found that this would be enough to greatly reduce the entropy of the most used data (and if you have time series, you're likely to have a dashboard doing the same important queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.


