Posted to user@cassandra.apache.org by Paulo Motta <pa...@gmail.com> on 2015/10/15 18:01:31 UTC

Re: unchecked_tombstone_compaction - query

Hello Deepak,

The dev@cassandra list is reserved for development announcements and
discussions, so I will reply on user@cassandra, as someone else might have
a similar question.

Basically, there is a pre-check that determines which sstables are eligible
for single-sstable tombstone compaction, and an actual check that determines
whether a key is present only in that single sstable before performing the
actual tombstone removal (otherwise it doesn't do anything).

The pre-check is cheap, and only considers an sstable eligible for tombstone
compaction if its key range does not overlap with any other sstable's key
range. The actual check might need to read from disk (= more seeks on
spindles) to confirm whether the sstables with overlapping ranges actually
contain the tombstoned key, in order to decide whether it is safe to drop
the tombstone.
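
As a rough illustration of the two checks described above, here is a toy
Python model. It is only a sketch under simplifying assumptions, not
Cassandra's actual code: the SSTable class, integer keys, and the function
names are all hypothetical, and it ignores details such as gc_grace_seconds
and timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Toy stand-in for an sstable: a key range plus the keys it holds."""
    name: str
    first_key: int
    last_key: int
    keys: set = field(default_factory=set)


def ranges_overlap(a: SSTable, b: SSTable) -> bool:
    """True if the [first_key, last_key] ranges of a and b intersect."""
    return a.first_key <= b.last_key and b.first_key <= a.last_key


def passes_pre_check(candidate: SSTable, others: list) -> bool:
    """Cheap pre-check: eligible only if no other key range overlaps."""
    return not any(ranges_overlap(candidate, o) for o in others)


def safe_to_drop(key: int, candidate: SSTable, others: list) -> bool:
    """Actual check: the tombstone for `key` may be dropped only if no
    overlapping sstable contains that key. In a real system this lookup
    is the part that may need to read from disk."""
    return not any(
        ranges_overlap(candidate, o) and key in o.keys for o in others
    )


a = SSTable("a", 0, 100, {10, 50})
b = SSTable("b", 40, 200, {50, 150})

print(passes_pre_check(a, [b]))   # ranges overlap, so not eligible
print(safe_to_drop(10, a, [b]))   # key 10 lives only in "a"
print(safe_to_drop(50, a, [b]))   # key 50 also lives in "b"
```

In this toy setup the pre-check rejects sstable "a" outright because its
range overlaps "b", even though the tombstone for key 10 could be dropped
safely; only the per-key check can tell the two cases apart.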

In the case of size-tiered compaction, it's common for many sstables to
have overlapping ranges, so tombstone compaction is almost never triggered
and you need to wait until regular compactions organically remove
tombstones. The unchecked_tombstone_compaction option removes the pre-check
for overlapping ranges, but still performs the check that determines whether
it's safe to drop a tombstone. So it's possible that your I/O will increase
if you enable this property, since more data will need to be read to perform
the actual checks, but otherwise it's safe to use this setting. A good way
to check whether the setting is useful is to watch your droppable tombstone
ratio metrics after enabling it.
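
For reference, the option is set as a compaction sub-property on the table.
The small Python helper below just builds the CQL statement as a string; it
is a sketch, and the helper name, keyspace, and table names are hypothetical
placeholders. Note that ALTER TABLE replaces the whole compaction map, so
any other compaction sub-options you rely on would need to be restated.

```python
def enable_unchecked_tombstone_compaction(
    keyspace: str,
    table: str,
    strategy: str = "SizeTieredCompactionStrategy",
) -> str:
    """Build a CQL ALTER TABLE statement that turns the option on.

    Only string formatting happens here; nothing is sent to a cluster.
    """
    return (
        f"ALTER TABLE {keyspace}.{table} WITH compaction = "
        f"{{'class': '{strategy}', "
        f"'unchecked_tombstone_compaction': 'true'}};"
    )


# "my_ks" and "my_table" are placeholder names.
print(enable_unchecked_tombstone_compaction("my_ks", "my_table"))
```

The resulting statement can be run from cqlsh or any driver.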

Cheers,

Paulo

2015-10-14 23:27 GMT-07:00 Deepak Nagaraj <n....@gmail.com>:

> Hi Paulo, C* devs,
>
> I have a question on "unchecked_tombstone_compaction" option.  I
> understand that setting this to true prevents a heuristic check on keys
> that span multiple sstables.
>
> But I also read that the heuristic was introduced because not having it
> can cause resurrections (i.e. sstable1 may have data, sstable2 may have
> tombstone, and when we delete the tombstone, deleted data suddenly shows
> up).
>
> So - isn't setting unchecked_tombstone_compaction to "true" a dangerous
> setting?  Won't it cause resurrections?  What is the use case for this
> knob, and when do I know I can set it to true safely?
>
> I've read the source code, Jira 6563, and relevant e-mail threads many
> times but I still don't have a clear understanding.
>
> Thanks in advance,
> -deepak
>
>

Re: unchecked_tombstone_compaction - query

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Oct 15, 2015 at 9:01 AM, Paulo Motta <pa...@gmail.com>
wrote:

> (OP says:)
>> So - isn't setting unchecked_tombstone_compaction to "true" a dangerous
>> setting?  Won't it cause resurrections?  What is the use case for this
>> knob, and when do I know I can set it to true safely?
>
To expand slightly on Paulo's great answer:

The only time to really consider using this feature is when you have a
reasonable suspicion that, because of your write patterns, you will do
less net work if you simply skip the pre-check. Like many other
performance-centric features whose use case seems difficult to grasp, it was
likely added because of a single significant user who was in exactly that
case.

=Rob