Posted to user@cassandra.apache.org by Nick Hatfield <ni...@metricly.com> on 2019/04/03 03:12:47 UTC

RE: TWCS Compactions & Tombstones

Well, I've got something going on; I just don't know what to make of it. I went through and removed the custom TWCS jar so that C* would default to the built-in version. This worked out great…

compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
              'compaction_window_size': '1',
              'compaction_window_unit': 'DAYS',
              'max_threshold': '32',
              'min_threshold': '4',
              'timestamp_resolution': 'MILLISECONDS',
              'tombstone_compaction_interval': '86400',
              'tombstone_threshold': '0.2',
              'unchecked_tombstone_compaction': 'true'}
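
For anyone following along, switching back to the stock strategy is just an ALTER TABLE with the options above (the keyspace and table names here are placeholders, not my real ones):

```cql
ALTER TABLE my_keyspace.my_metrics
WITH compaction = {
  'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
  'compaction_window_size': '1',
  'compaction_window_unit': 'DAYS',
  'max_threshold': '32',
  'min_threshold': '4',
  'timestamp_resolution': 'MILLISECONDS',
  'tombstone_compaction_interval': '86400',
  'tombstone_threshold': '0.2',
  'unchecked_tombstone_compaction': 'true'};
```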

I ran through an exercise with my concurrent compactors and compaction throughput until I found a nice sweet spot: not too overloaded, and still able to finish compactions without falling behind. After the first 24 hours of compactions, I could see a noticeable change in the live disk space used by the keyspace.
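
For reference, both knobs can be adjusted live with nodetool, so the exercise is just a loop of tweak-and-watch (the values below are examples, not recommendations, and this obviously needs a live node):

```shell
# Cap compaction I/O in MB/s (0 disables throttling entirely)
nodetool setcompactionthroughput 64
nodetool getcompactionthroughput

# Adjust the number of concurrent compactors at runtime
# (available via nodetool on recent 3.x builds, if I'm not mistaken)
nodetool setconcurrentcompactors 4
nodetool getconcurrentcompactors
```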

[inline image: graph of live disk space used by the keyspace over time]

You can really see some MASSIVE dips in the live disk space used by the keyspace as these compactions attempt to chew through the overlapping timestamped sstables. What is baffling, though, is that it all comes right back. When I check the sstables, I still have the same batch of overlaps on disk, as if they never went through the minor compaction / tombstone GC at all.

Max: 11/08/2018 Min: 11/07/2018 Estimated droppable tombstones: 0.8828213760998842        20G Apr 2 22:26 mc-265039-big-Data.db
Max: 11/09/2018 Min: 09/13/2018 Estimated droppable tombstones: 0.8794850487036547        22G Apr 2 02:38 mc-263817-big-Data.db
Max: 11/10/2018 Min: 11/09/2018 Estimated droppable tombstones: 0.8881616441307754        20G Apr 3 02:39 mc-265317-big-Data.db
Max: 11/11/2018 Min: 11/10/2018 Estimated droppable tombstones: 0.8818055463164647        20G Apr 3 02:37 mc-265316-big-Data.db
Max: 11/12/2018 Min: 11/11/2018 Estimated droppable tombstones: 0.8886666531745852        21G Apr 2 02:17 mc-263787-big-Data.db
Max: 11/13/2018 Min: 09/17/2018 Estimated droppable tombstones: 0.8833019267294612        22G Apr 2 02:41 mc-263822-big-Data.db
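
(In case anyone wants to reproduce a listing like the one above: it comes from looping sstablemetadata, which ships in Cassandra's tools/bin, over the data files and converting the min/max timestamps to dates. This is roughly the loop; GNU date syntax, and the data path is an assumption for your layout.)

```shell
cd /var/lib/cassandra/data/my_keyspace/my_table-*/
for f in *-big-Data.db; do
  meta=$(sstablemetadata "$f")
  # Timestamps are epoch microseconds; first 10 chars = epoch seconds
  echo "Max: $(date -d @$(echo "$meta" | grep 'Maximum time' | cut -d' ' -f3 | cut -c1-10) '+%m/%d/%Y')" \
       "Min: $(date -d @$(echo "$meta" | grep 'Minimum time' | cut -d' ' -f3 | cut -c1-10) '+%m/%d/%Y')" \
       "$(echo "$meta" | grep droppable)" \
       "$(ls -lh "$f" | awk '{print $5, $6, $7, $8, $9}')"
done
```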

Any other ideas that I can try checking/looking into? I'm using version 3.11.3. What is the best way to verify the data in each sstable, to know whether it has truly TTL'd, so I can start manual removal?
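
(One way I've been sanity-checking candidates, in case it helps anyone else: a quick filter over summary lines like the ones pasted above, flagging anything whose newest write is older than a cutoff date I trust. The cutoff and the embedded sample lines below are just illustrations; in practice you'd feed in your real summary. Cassandra also ships sstableexpiredblockers <keyspace> <table> in tools/bin, which reports which newer sstables are blocking older ones from being dropped.)

```shell
# Hypothetical filter: flag sstables whose Max (newest cell) predates a cutoff.
cutoff=$(date -d "2019-01-01" +%s)
candidates=""
while read -r line; do
  max_day=$(echo "$line" | awk '{print $2}')   # e.g. 11/08/2018
  file=$(echo "$line"   | awk '{print $NF}')   # e.g. mc-265039-big-Data.db
  if [ "$(date -d "$max_day" +%s)" -lt "$cutoff" ]; then
    candidates="$candidates $file"
    echo "fully past cutoff: $file"
  fi
done <<'EOF'
Max: 11/08/2018 Min: 11/07/2018 Estimated droppable tombstones: 0.88 mc-265039-big-Data.db
Max: 11/09/2018 Min: 09/13/2018 Estimated droppable tombstones: 0.87 mc-263817-big-Data.db
EOF
```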

Thanks again for the assistance!

From: Nick Hatfield [mailto:nick.hatfield@metricly.com]
Sent: Wednesday, March 27, 2019 2:05 PM
To: user@cassandra.apache.org
Subject: RE: TWCS Compactions & Tombstones

Awesome, thanks again!

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Wednesday, March 27, 2019 1:36 PM
To: cassandra <us...@cassandra.apache.org>
Subject: Re: TWCS Compactions & Tombstones

You would need to swap your class from the com.jeffjirsa variant (probably from 2.1 / 2.2) to the official TWCS class.

Once that happens I suspect it'll happen quite quickly, but I'm not sure.

On Wed, Mar 27, 2019 at 7:30 AM Nick Hatfield <ni...@metricly.com> wrote:
Awesome, thank you Jeff. Sorry I had not seen this yet. So we have this enabled; I guess it will just take time to finally chew through it all?

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Tuesday, March 26, 2019 9:41 PM
To: user@cassandra.apache.org
Subject: Re: TWCS Compactions & Tombstones


Or upgrade to a version with https://issues.apache.org/jira/browse/CASSANDRA-13418 and enable that feature.
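
(For the archives: if I'm not mistaken that ticket landed in 3.11.4, and enabling the feature takes both a table option and a JVM flag. Table name below is a placeholder.)

```cql
-- Table side: add the aggressive-expiration flag to the TWCS options
ALTER TABLE my_keyspace.my_metrics
WITH compaction = {
  'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
  'compaction_window_size': '1',
  'compaction_window_unit': 'DAYS',
  'unsafe_aggressive_sstable_expiration': 'true'};
```

Plus, on each node, the guard flag in jvm.options: -Dcassandra.allow_unsafe_aggressive_sstable_expiration=true. As the "unsafe" name suggests, it lets TWCS drop fully expired sstables while ignoring overlaps, which can resurrect data if anything wasn't written with a TTL.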

--
Jeff Jirsa


On Mar 26, 2019, at 6:23 PM, Rahul Singh <ra...@gmail.com> wrote:
What's your timewindow? Roughly how much data is in each window?

If you examine the sstable data and see that it is truly old data with little chance that it has any new data, you can just remove the SSTables. You can do a rolling restart -- take down a node, remove mc-254400-*, and then start it up.
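
A sketch of that rolling removal, per node (the data path and service name are assumptions for your install; this needs a live node, so treat it as pseudocode, and repeat node by node):

```shell
nodetool drain                 # flush memtables and stop accepting writes
sudo systemctl stop cassandra  # service name is an assumption
# Remove every component of the chosen generation, not just Data.db
rm /var/lib/cassandra/data/my_keyspace/my_table-*/mc-254400-*
sudo systemctl start cassandra
```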


rahul.xavier.singh@gmail.com

http://cassandra.link



On Tue, Mar 26, 2019 at 8:01 AM Nick Hatfield <ni...@metricly.com> wrote:
How does one properly get rid of sstables that have fallen victim to overlapping timestamps? I realized that we had TWCS set on our CF, which also had read_repair_chance = 0.1, and after correcting this to 0.0 I can clearly see the effects over time on the new sstables. However, I still have old sstables that date back to some time last year, and I need to remove them:

Max: 09/05/2018 Min: 09/04/2018 Estimated droppable tombstones: 0.8832057909932046    13G Mar 26 11:34 mc-254400-big-Data.db


What is the best way to do this? This is on a production system so any help would be greatly appreciated.
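
(For reference, the read_repair correction I mentioned above was just a table-level ALTER; keyspace/table names are placeholders, and it's probably worth zeroing the dclocal variant too, since either can pull data into the wrong TWCS window:)

```cql
ALTER TABLE my_keyspace.my_metrics
WITH read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.0;
```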

Thanks,