Posted to user@cassandra.apache.org by William Oberman <ob...@civicscience.com> on 2014/04/11 16:05:46 UTC

clearing tombstones?

I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
repair, or time (as in just wait)?

I had a CF that was more or less storing session information.  After some
time, we decided that one piece of this information was pointless to track
(and was 90%+ of the columns, and in 99% of those cases was ALL columns for
a row).   I wrote a process to remove all of those columns (which again in
a vast majority of cases had the effect of removing the whole row).

This CF had ~1 billion rows, so I expect to be left with ~100m rows.  After
I did this mass delete, everything was the same size on disk (which I
expected, knowing how tombstoning works).  It wasn't 100% clear to me what
to poke to cause compactions to clear the tombstones.  First I tried
nodetool cleanup on a candidate node.  But, afterwards the disk usage was
the same.  Then I tried nodetool repair on that same node.  But again, disk
usage is still the same.  The CF has no snapshots.

So, am I misunderstanding something?  Is there another operation to try?
 Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
I have to run one or the other over all nodes to clear tombstones?

Cassandra 1.2.15 if it matters,

Thanks!

will

Re: clearing tombstones?

Posted by Robert Coli <rc...@eventbrite.com>.
(I probably should have read downthread before writing my reply. Briefly: +1
to most of the thread's commentary regarding major compaction, but don't
listen to the FUD about it; unless you have a really large amount of data,
you'll probably be fine.)

On Fri, Apr 11, 2014 at 7:05 AM, William Oberman
<ob...@civicscience.com> wrote:

> I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
> repair, or time (as in just wait)?
>

The only operation guaranteed to collect 100% of tombstones is major
compaction. gc_grace_seconds duration is also involved, so be sure to
understand its value.
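To make that concrete, here is a minimal sketch (the keyspace/CF names are
made up, and the cluster commands are left as comments so nothing here
touches a live node):

```shell
# A tombstone only becomes purgeable once it is older than gc_grace_seconds;
# the default is 10 days:
gc_grace=$((10 * 24 * 60 * 60))
echo "default gc_grace_seconds = $gc_grace"

# Major compaction on a hypothetical keyspace "my_ks", CF "sessions"
# (the only operation guaranteed to consider every sstable at once):
#   nodetool -h localhost compact my_ks sessions
```

Note that even after a major compaction, tombstones younger than gc_grace
are kept.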


> I had a CF that was more or less storing session information.  After some
> time, we decided that one piece of this information was pointless to track
> (and was 90%+ of the columns, and in 99% of those cases was ALL columns for
> a row).   I wrote a process to remove all of those columns (which again in
> a vast majority of cases had the effect of removing the whole row).
>

https://issues.apache.org/jira/browse/CASSANDRA-1581

That ticket describes a tool which "filters" sstables to remove rows. In a
future case like this one, you might want to consider that approach.


> It wasn't 100% clear to me what to poke to cause compactions to clear the
> tombstones.
>

In order to delete a tombstone, all fragments of the row must be in a
sstable involved in the current compaction.

Some discussion here : https://issues.apache.org/jira/browse/CASSANDRA-1074
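As a toy model of that rule (illustrative shell only, not Cassandra
internals): a tombstone is droppable in a given compaction only if it has
aged past gc_grace and no sstable outside that compaction still holds
fragments of the same row.

```shell
# age_s:      seconds since the tombstone was written
# gc_grace_s: the CF's gc_grace_seconds
# outside:    count of sstables NOT in this compaction that hold row fragments
can_drop_tombstone() {
  local age_s=$1 gc_grace_s=$2 outside=$3
  if [ "$age_s" -gt "$gc_grace_s" ] && [ "$outside" -eq 0 ]; then
    echo yes
  else
    echo no
  fi
}
can_drop_tombstone 900000 864000 0   # old enough, all fragments compacting -> yes
can_drop_tombstone 900000 864000 2   # fragments live in 2 other sstables -> no
```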


>  First I tried nodetool cleanup on a candidate node.  But, afterwards the
> disk usage was the same.
>

Cleanup rewrites sstables 1:1, removing data that falls in ranges the node
cleaning up no longer owns. It is meant for use when ranges move, in order
to "clean up" the data from the ranges being given up.


>  Then I tried nodetool repair on that same node.  But again, disk usage is
> still the same.  The CF has no snapshots.
>

Repair is unrelated to the purging of tombstones.


> So, am I misunderstanding something?  Is there another operation to try?
>  Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
> I have to run one or the other over all nodes to clear tombstones?
>

If you are using size tiered compaction, run a major compaction ("nodetool
compact"). If you aren't, I believe that there is nothing you can do.

=Rob

Re: clearing tombstones?

Posted by William Oberman <ob...@civicscience.com>.
I'm still somewhat in the middle of the process, but it's far enough along
to report back.

1.) I changed GCGraceSeconds of the CF to 0 using cassandra-cli
2.) I ran nodetool compact on a single node of the nine (I'll call it
"1").  It took 5-7 hours, and reduced the CF from ~450 to ~75GB (*).
3.) I ran nodetool compact on nodes 2, 3, ... while watching write/read
latency averages in OpsCenter.  I got all of the way to 9 without any ill
effect.
4.) Nodes 2-9 all completed with similar results.

(*) So, I left out one detail that changed the math (I said above I
expected to clear down to at most 50GB).  I found a small bug in my delete
code mid-last week.  Basically, it deleted all of the rows I wanted, but
due to a race condition, there was a chance I'd delete rows in the middle
of doing new inserts.  Luckily, even in this case, it wasn't "end of the
world", but I stopped the cleanup anyways and added a time check (as all of
the rows I wanted to delete were older than 30 days).  I *thought* I'd
restarted the cleanup threads on a smaller dataset due to all of the
deletes, but instead I saw millions & millions of empty rows (the
tombstones).  Thus the start of this "clear the tombstones" subtask to the
original goal, and the reason I didn't see a 90%+ reduction in size.

In any case, now I'm running the cleanup process again, which will be
followed by ANOTHER round of compactions, and then I'll finally turn
GCGraceSeconds back on.
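Condensed, the whole procedure above looks roughly like this (host and
keyspace/CF names are hypothetical, the real cluster commands are shown as
comments so the snippet stays inert, and the exact cassandra-cli attribute
name may differ slightly by version):

```shell
# 1.) Drop gc_grace to 0 so the major compaction can purge everything:
#   cassandra-cli -h node1 -e "use my_ks; update column family sessions with gc_grace = 0;"

# 2.) Roll a major compaction across the cluster, one node at a time,
#     watching read/write latency between nodes:
for host in node1 node2 node3 node4 node5 node6 node7 node8 node9; do
  echo "would run: nodetool -h $host compact my_ks sessions"
done

# 3.) Restore gc_grace to its previous value (10 days = 864000 seconds):
#   cassandra-cli -h node1 -e "use my_ks; update column family sessions with gc_grace = 864000;"
```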

On the read/write production side, you'd never know anything happened.
 Good job on the distributed system! :-)

Thanks again,

will


On Fri, Apr 11, 2014 at 1:02 PM, Mark Reddy <ma...@boxever.com> wrote:

> Thats great Will, if you could update the thread with the actions you
> decide to take and the results that would be great.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman <oberman@civicscience.com
> > wrote:
>
>> I've learned a *lot* from this thread.  My thanks to all of the
>> contributors!
>>
>> Paulo: Good luck with LCS.  I wish I could help there, but all of my CF's
>> are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)
>>
>> will
>>
>>
>>
>> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib <mi...@adgear.com>wrote:
>>
>>>
>>> Levelled Compaction is a wholly different beast when it comes to
>>> tombstones.
>>>
>>> The tombstones are inserted, like any other write really, at the lower
>>> levels in the leveldb hierarchy.
>>>
>>> They are only removed after they have had the chance to "naturally"
>>> migrate upwards in the leveldb hierarchy to the highest level in your data
>>> store.  How long that takes depends on:
>>>  1. The amount of data in your store and the number of levels your LCS
>>> strategy has
>>> 2. The amount of new writes entering the bottom funnel of your leveldb,
>>> forcing upwards compaction and combining
>>>
>>> To give you an idea, I had a similar scenario and ran a (slow,
>>> throttled) delete job on my cluster around December-January.  Here's a
>>> graph of the disk space usage on one node.  Notice the still-declining
>>> usage long after the cleanup job has finished (sometime in January).  I
>>> tend to think of tombstones in LCS as little bombs that get to explode much
>>> later in time:
>>>
>>> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
>>>
>>>
>>>
>>> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
>>> paulo.motta@chaordicsystems.com> wrote:
>>>
>>> I have a similar problem here, I deleted about 30% of a very large CF
>>> using LCS (about 80GB per node), but still my data hasn't shrunk, even if
>>> I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
>>> scrub force a minor compaction?
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>>
>>> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy <ma...@boxever.com>wrote:
>>>
>>>> Yes, running nodetool compact (major compaction) creates one large
>>>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>>>> this the compaction strategy you are using?) leading to multiple 'small'
>>>> SSTables alongside the single large SSTable, which results in increased
>>>> read latency. You will incur the operational overhead of having to manage
>>>> compactions if you wish to compact these smaller SSTables. For all these
>>>> reasons it is generally advised to stay away from running compactions
>>>> manually.
>>>>
>>>> Assuming that this is a production environment and you want to keep
>>>> everything running as smoothly as possible I would reduce the gc_grace on
>>>> the CF, allow automatic minor compactions to kick in and then increase the
>>>> gc_grace once again after the tombstones have been removed.
>>>>
>>>>
>>>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>>>> oberman@civicscience.com> wrote:
>>>>
>>>>> So, if I was impatient and just "wanted to make this happen now", I
>>>>> could:
>>>>>
>>>>> 1.) Change GCGraceSeconds of the CF to 0
>>>>> 2.) run nodetool compact (*)
>>>>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>>>>
>>>>> Since I have ~900M tombstones, even if I miss a few due to impatience,
>>>>> I don't care *that* much as I could re-run my clean up tool against the now
>>>>> much smaller CF.
>>>>>
>>>>> (*) A long long time ago I seem to recall reading advice about "don't
>>>>> ever run nodetool compact", but I can't remember why.  Is there any bad
>>>>> long term consequence?  Short term there are several:
>>>>> -a heavy operation
>>>>> -temporary 2x disk space
>>>>> -one big SSTable afterwards
>>>>> But moving forward, everything is ok right?
>>>>>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>>>>> etc...  The only flaw I can think of is it will take forever until the
>>>>> SSTable minor compactions build up enough to consider including the big
>>>>> SSTable in a compaction, making it likely I'll have to self manage
>>>>> compactions.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <ma...@boxever.com>wrote:
>>>>>
>>>>>> Correct, a tombstone will only be removed after gc_grace period has
>>>>>> elapsed. The default value is set to 10 days which allows a great deal of
>>>>>> time for consistency to be achieved prior to deletion. If you are
>>>>>> operationally confident that you can achieve consistency via anti-entropy
>>>>>> repairs within a shorter period you can always reduce that 10 day interval.
>>>>>>
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>>>>>> oberman@civicscience.com> wrote:
>>>>>>
>>>>>>> I'm seeing a lot of articles about a dependency between removing
>>>>>>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>>>>>>> and this CF has GCGraceSeconds of 10 days).
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <
>>>>>>> tbarbugli@gmail.com> wrote:
>>>>>>>
>>>>>>>> compaction should take care of it; for me it never worked so I run
>>>>>>>> nodetool compaction on every node; that does it.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Paulo Motta*
>>>
>>> Chaordic | *Platform*
>>> *www.chaordic.com.br <http://www.chaordic.com.br/>*
>>> +55 48 3232.3200
>>>
>>>
>>>
>>
>>
>>
>

Re: clearing tombstones?

Posted by "Laing, Michael" <mi...@nytimes.com>.
At the cost of really quite a lot of compaction, you can temporarily switch
to SizeTiered, and when that is completely done (check each node), switch
back to Leveled.

it's like doing the laundry twice :)

I've done this on CFs that were about 5GB but I don't see why it wouldn't
work on larger ones.
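In CQL3 terms the flip looks something like this (keyspace/table names are
invented; run the second ALTER only after "nodetool compactionstats"
reports no pending compactions on every node):

```shell
# Shown as comments so the snippet is inert:
#   ALTER TABLE my_ks.big_cf
#     WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
#   -- wait for the STCS recompaction to finish cluster-wide...
#   ALTER TABLE my_ks.big_cf
#     WITH compaction = {'class': 'LeveledCompactionStrategy'};
# The three phases of the "laundry twice" flip:
for phase in to-stcs wait-for-recompaction back-to-lcs; do
  echo "$phase"
done
```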

ml


On Fri, Apr 11, 2014 at 1:33 PM, Paulo Ricardo Motta Gomes <
paulo.motta@chaordicsystems.com> wrote:

> This thread is really informative, thanks for the good feedback.
>
> My question is : Is there a way to force tombstones to be cleared with LCS?
> Does scrub help in any case? Or the only solution would be to create a new
> CF and migrate all the data if you intend to do a large CF cleanup?
>
> Cheers,

Re: clearing tombstones?

Posted by "Laing, Michael" <mi...@nytimes.com>.
I've never noticed that setting tombstone_threshold has any effect...
at least in 2.0.6.

What gets written to the log?


On Fri, Apr 11, 2014 at 3:31 PM, DuyHai Doan <do...@gmail.com> wrote:

> I was wondering, to remove the tombstones from Sstables created by LCS,
> why don't we just set the tombstone_threshold table property to a very
> small value (say 0.01)..?
>
> As the doc said (
> www.datastax.com/documentation/cql/3.0/cql/cql_reference/compactSubprop.html)
> this will force compaction on the sstable itself for the purpose of
> cleaning tombstones, no merging with other sstables is done.
>
> In addition this property applies to both compaction strategies :-)
>
> Isn't it a little bit lighter than changing strategy and hoping for the best?
>
> Regards
>
> Duy Hai DOAN

Re: clearing tombstones?

Posted by DuyHai Doan <do...@gmail.com>.
I was wondering, to remove the tombstones from Sstables created by LCS, why
don't we just set the tombstone_threshold table property to a very small
value (say 0.01)..?

As the doc said (
www.datastax.com/documentation/cql/3.0/cql/cql_reference/compactSubprop.html)
this will force compaction on the sstable itself for the purpose of
cleaning tombstones, no merging with other sstables is done.

In addition this property applies to both compaction strategies :-)

Isn't it a little bit lighter than changing strategy and hoping for the best?
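For reference, setting the subproperty would look roughly like this (table
name invented; per the DataStax docs the default tombstone_threshold is
0.2, i.e. a single sstable is compacted by itself once an estimated 20% of
it is tombstones):

```shell
# Inert sketch; the ALTER is CQL3, shown as a comment:
#   ALTER TABLE my_ks.big_cf
#     WITH compaction = {'class': 'LeveledCompactionStrategy',
#                        'tombstone_threshold': 0.01};
default_threshold=0.2
proposed_threshold=0.01
echo "threshold: default=$default_threshold proposed=$proposed_threshold"
```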

Regards

Duy Hai DOAN

Re: clearing tombstones?

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Apr 11, 2014 at 10:33 AM, Paulo Ricardo Motta Gomes <
paulo.motta@chaordicsystems.com> wrote:

> My question is : Is there a way to force tombstones to be cleared with LCS?
> Does scrub help in any case?
>

1) Switch to size tiered compaction, compact, and switch back. Not only
"with LCS", but...

2)  scrub does a 1:1 rewrite of sstables, watching for corruption. I
believe it does throw away tombstones if it is able to, but that is not the
purpose of it.

=Rob

Re: clearing tombstones?

Posted by Paulo Ricardo Motta Gomes <pa...@chaordicsystems.com>.
This thread is really informative, thanks for the good feedback.

My question is : Is there a way to force tombstones to be cleared with LCS?
Does scrub help in any case? Or the only solution would be to create a new
CF and migrate all the data if you intend to do a large CF cleanup?

Cheers,




-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200

Re: clearing tombstones?

Posted by Mark Reddy <ma...@boxever.com>.
That's great, Will. If you could update the thread with the actions you
decide to take and the results, that would be great.


Mark


On Fri, Apr 11, 2014 at 5:53 PM, William Oberman
<ob...@civicscience.com> wrote:

> I've learned a *lot* from this thread.  My thanks to all of the
> contributors!
>
> Paulo: Good luck with LCS.  I wish I could help there, but all of my CF's
> are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)
>
> will
>

Re: clearing tombstones?

Posted by William Oberman <ob...@civicscience.com>.
I've learned a *lot* from this thread.  My thanks to all of the
contributors!

Paulo: Good luck with LCS.  I wish I could help there, but all of my CF's
are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)

will


On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib <mi...@adgear.com> wrote:

>
> Levelled Compaction is a wholly different beast when it comes to
> tombstones.
>
> The tombstones are inserted, like any other write really, at the lower
> levels in the leveldb hierarchy.
>
> They are only removed after they have had the chance to "naturally"
> migrate upwards in the leveldb hierarchy to the highest level in your data
> store.  How long that takes depends on:
>  1. The amount of data in your store and the number of levels your LCS
> strategy has
> 2. The amount of new writes entering the bottom funnel of your leveldb,
> forcing upwards compaction and combining
>
> To give you an idea, I had a similar scenario and ran a (slow, throttled)
> delete job on my cluster around December-January.  Here's a graph of the
> disk space usage on one node.  Notice the still-declining usage long after
> the cleanup job has finished (sometime in January).  I tend to think of
> tombstones in LCS as little bombs that get to explode much later in time:
>
> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg

Re: clearing tombstones?

Posted by Mina Naguib <mi...@adgear.com>.
Levelled Compaction is a wholly different beast when it comes to tombstones.

The tombstones are inserted, like any other write really, at the lower levels in the leveldb hierarchy.

They are only removed after they have had the chance to "naturally" migrate upwards in the leveldb hierarchy to the highest level in your data store.  How long that takes depends on:
	1. The amount of data in your store and the number of levels your LCS strategy has
	2. The amount of new writes entering the bottom funnel of your leveldb, forcing upwards compaction and combining

To give you an idea, I had a similar scenario and ran a (slow, throttled) delete job on my cluster around December-January.  Here's a graph of the disk space usage on one node.  Notice the still-declining usage long after the cleanup job has finished (sometime in January).  I tend to think of tombstones in LCS as little bombs that get to explode much later in time:

http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
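The migration cost described above can be put in rough numbers with a back-of-the-envelope model (plain Python, not Cassandra code; the 160 MB per-sstable run size and 10x fanout are assumptions matching LCS defaults, and the function name is hypothetical):

```python
def writes_before_top_level(levels: int, run_mb: int = 160, fanout: int = 10) -> int:
    """Worst-case MB of new writes that must flow in before data sitting in
    L1 is rewritten all the way up: each level must roughly fill (level k
    holds about run_mb * fanout**k) before it spills into the next one."""
    return sum(run_mb * fanout ** k for k in range(1, levels))

# A hypothetical 4-level table: on the order of ~170 GB of fresh writes
# before an old tombstone reaches the top level and becomes collectable.
print(writes_before_top_level(4))  # 177600 (MB)
```

This is why deleted space can keep draining away for weeks after the delete job finishes, as the graph shows.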



On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <pa...@chaordicsystems.com> wrote:

> I have a similar problem here: I deleted about 30% of a very large CF using LCS (about 80GB per node), but my data still hasn't shrunk, even though I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool scrub force a minor compaction?
> 
> Cheers,
> 
> Paulo


Re: clearing tombstones?

Posted by Paulo Ricardo Motta Gomes <pa...@chaordicsystems.com>.
I have a similar problem here: I deleted about 30% of a very large CF using
LCS (about 80GB per node), but my data still hasn't shrunk, even though I
used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
scrub force a minor compaction?

Cheers,

Paulo


On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy <ma...@boxever.com> wrote:

> Yes, running nodetool compact (major compaction) creates one large
> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
> this the compaction strategy you are using?) leading to multiple 'small'
> SSTables alongside the single large SSTable, which results in increased
> read latency. You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables. For all these
> reasons it is generally advised to stay away from running compactions
> manually.
>
> Assuming that this is a production environment and you want to keep
> everything running as smoothly as possible I would reduce the gc_grace on
> the CF, allow automatic minor compactions to kick in and then increase the
> gc_grace once again after the tombstones have been removed.


-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200

Re: clearing tombstones?

Posted by Mark Reddy <ma...@boxever.com>.
To clarify, you would want to manage compactions only if you were concerned
about read latency. If you update rows, those rows may become spread across
an increasing number of SSTables leading to increased read latency.

Thanks for providing some insight into your use case, as it does differ from
the norm. If you consider 50GB a small CF and your data ingestion is
sufficient to produce more SSTables of similar size soon, then yes, you
could run a major compaction with little operational overhead, and the
compaction strategy's heuristics would level out after some time.


On Fri, Apr 11, 2014 at 4:52 PM, Laing, Michael
<mi...@nytimes.com> wrote:

> I have played with this quite a bit and recommend you set gc_grace_seconds
> to 0 and use 'nodetool compact [keyspace] [cfname]' on your table.
>
> A caveat I have is that we use C* 2.0.6 - but the space we expect to
> recover is in fact recovered.
>
> Actually, since we never delete explicitly (just ttl) we always have
> gc_grace_seconds set to 0.
>
> Another important caveat is to be careful with repair: having set gc to 0
> and compacted on a node, if you then repair it, data may come streaming in
> from the other nodes. We don't run into this, as our gc is always 0, but
> others may be able to comment.
>
> ml

Re: clearing tombstones?

Posted by "Laing, Michael" <mi...@nytimes.com>.
I have played with this quite a bit and recommend you set gc_grace_seconds
to 0 and use 'nodetool compact [keyspace] [cfname]' on your table.
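For concreteness, that procedure looks roughly like this (a sketch only; `my_keyspace`/`my_table` are placeholder names, and the ALTER shown is cqlsh/CQL3 syntax, so adapt it if you are driving cassandra-cli on 1.2):

```shell
# Allow compaction to drop tombstones immediately:
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 0;"

# Major-compact just that table (run on each node in turn):
nodetool compact my_keyspace my_table
```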

A caveat I have is that we use C* 2.0.6 - but the space we expect to
recover is in fact recovered.

Actually, since we never delete explicitly (we just use TTLs), we always
have gc_grace_seconds set to 0.

Another important caveat is to be careful with repair: having set gc to 0
and compacted on a node, if you then repair it, data may come streaming in
from the other nodes. We don't run into this, as our gc is always 0, but
others may be able to comment.

ml


On Fri, Apr 11, 2014 at 11:26 AM, William Oberman
<ob...@civicscience.com>wrote:

> Yes, I'm using SizeTiered.
>
> I totally understand the "mess up the heuristics" issue.  But, I don't
> understand "You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables".  My
> understanding is the small tables will still compact.  The problem is that
> until I have 3 other (by default) tables of the same size as the "big
> table", it won't be compacted.
>
> In my case, this might not be terrible though, right?  To get into the
> trees, I have 9 nodes with RF=3 and this CF is ~500GB/node.  I deleted like
> 90-95% of the data, so I expect the data to be 25-50GB after the tombstones
> are cleared, but call it 50GB.  That means I won't compact this 50GB file
> until I gather another 150GB (50,50,50,50->200).   But, that's not
> *horrible*.  Now, if I only deleted 10% of the data, waiting to compact
> 450GB until I had another 1.3TB would be rough...
>
> I think your advice is great for people looking for "normal" answers in
> the forum, but I don't think my use case is very normal :-)
>
> will
>

Re: clearing tombstones?

Posted by William Oberman <ob...@civicscience.com>.
Yes, I'm using SizeTiered.

I totally understand the "mess up the heuristics" issue.  But, I don't
understand "You will incur the operational overhead of having to manage
compactions if you wish to compact these smaller SSTables".  My
understanding is the small tables will still compact.  The problem is that
until I have 3 other (by default) tables of the same size as the "big
table", it won't be compacted.

In my case, this might not be terrible though, right?  To get into the
trees, I have 9 nodes with RF=3 and this CF is ~500GB/node.  I deleted like
90-95% of the data, so I expect the data to be 25-50GB after the tombstones
are cleared, but call it 50GB.  That means I won't compact this 50GB file
until I gather another 150GB (50,50,50,50->200).   But, that's not
*horrible*.  Now, if I only deleted 10% of the data, waiting to compact
450GB until I had another 1.3TB would be rough...
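That bucketing behavior can be sketched in a few lines (a simplification of SizeTiered's real logic, assuming the default min_threshold of 4 and the usual 0.5x-1.5x bucket bounds):

```python
def size_tiered_buckets(sizes, low=0.5, high=1.5):
    """Group SSTable sizes (GB) into buckets of similar size,
    mimicking SizeTieredCompactionStrategy's bucketing."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            # No bucket of comparable size yet; start a new one.
            buckets.append([size])
    return buckets

# The lone 50GB table sits in a bucket by itself until similar-sized
# peers accumulate; only then is it a compaction candidate:
print(size_tiered_buckets([50, 5, 5, 5, 5]))   # [[5, 5, 5, 5], [50]]
print(size_tiered_buckets([50, 50, 50, 50]))   # [[50, 50, 50, 50]]
```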

I think your advice is great for people looking for "normal" answers in the
forum, but I don't think my use case is very normal :-)

will

On Fri, Apr 11, 2014 at 11:12 AM, Mark Reddy <ma...@boxever.com> wrote:

> Yes, running nodetool compact (major compaction) creates one large
> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
> this the compaction strategy you are using?) leading to multiple 'small'
> SSTables alongside the single large SSTable, which results in increased
> read latency. You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables. For all these
> reasons it is generally advised to stay away from running compactions
> manually.
>
> Assuming that this is a production environment and you want to keep
> everything running as smoothly as possible I would reduce the gc_grace on
> the CF, allow automatic minor compactions to kick in and then increase the
> gc_grace once again after the tombstones have been removed.
>
>

Re: clearing tombstones?

Posted by Mark Reddy <ma...@boxever.com>.
Yes, running nodetool compact (major compaction) creates one large SSTable.
This will mess up the heuristics of the SizeTiered strategy (is this the
compaction strategy you are using?) leading to multiple 'small' SSTables
alongside the single large SSTable, which results in increased read
latency. You will incur the operational overhead of having to manage
compactions if you wish to compact these smaller SSTables. For all these
reasons it is generally advised to stay away from running compactions
manually.

Assuming that this is a production environment and you want to keep
everything running as smoothly as possible I would reduce the gc_grace on
the CF, allow automatic minor compactions to kick in and then increase the
gc_grace once again after the tombstones have been removed.
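The rule being applied here can be sketched as follows (a simplification; real compaction also has to consider overlapping SSTables that might still shadow the deleted data):

```python
def tombstone_purgeable(deleted_at, gc_grace_seconds, now):
    """A compaction may physically drop a tombstone only once
    gc_grace_seconds have elapsed since the deletion."""
    return now >= deleted_at + gc_grace_seconds

TEN_DAYS = 10 * 24 * 3600
delete_time = 1_000_000
# With the default grace, a compaction run 5 days later keeps the tombstone:
print(tombstone_purgeable(delete_time, TEN_DAYS, delete_time + 5 * 24 * 3600))  # False
# With gc_grace_seconds reduced to 0 it is immediately droppable:
print(tombstone_purgeable(delete_time, 0, delete_time + 1))  # True
```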


On Fri, Apr 11, 2014 at 3:44 PM, William Oberman
<ob...@civicscience.com>wrote:

> So, if I was impatient and just "wanted to make this happen now", I could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to impatience, I
> don't care *that* much as I could re-run my clean up tool against the now
> much smaller CF.
>
> (*) A long long time ago I seem to recall reading advice about "don't ever
> run nodetool compact", but I can't remember why.  Is there any bad long
> term consequence?  Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is ok right?  CommitLog/MemTable->SStables,
> minor compactions that merge SSTables, etc...  The only flaw I can think of
> is it will take forever until the SSTable minor compactions build up enough
> to consider including the big SSTable in a compaction, making it likely
> I'll have to self manage compactions.
>
>
>

Re: clearing tombstones?

Posted by William Oberman <ob...@civicscience.com>.
Answered my own question.  Good writeup here of the pros/cons of compact:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_about_config_compact_c.html

And I was thinking of bad information that used to float around this forum
about major compactions (specifically, their impact on minor compactions).
 I'm hesitant to write the offending sentence again :-)


On Fri, Apr 11, 2014 at 10:44 AM, William Oberman
<ob...@civicscience.com>wrote:

> So, if I was impatient and just "wanted to make this happen now", I could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to impatience, I
> don't care *that* much as I could re-run my clean up tool against the now
> much smaller CF.
>
> (*) A long long time ago I seem to recall reading advice about "don't ever
> run nodetool compact", but I can't remember why.  Is there any bad long
> term consequence?  Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is ok right?  CommitLog/MemTable->SStables,
> minor compactions that merge SSTables, etc...  The only flaw I can think of
> is it will take forever until the SSTable minor compactions build up enough
> to consider including the big SSTable in a compaction, making it likely
> I'll have to self manage compactions.
>
>
>

Re: clearing tombstones?

Posted by William Oberman <ob...@civicscience.com>.
Not an expert, just a user of Cassandra. For me, "before" was a CF with a
set of SSTable files (I forget the official naming scheme, so I'll make up my own):
A0
A1
...
AN

"During":
A0
A1
...
AN
B0

Where B0 is the union of the Ai. Due to tombstones, mutations, etc., the
peak usage is "at most" 2x the original, but also probably close to 2x
(unless your data is almost all tombstones, like mine was).

"After"
B0

since Cassandra can then clean up the Ai. I'm not sure exactly when that happens.

Not sure what state you are in above. Sounds like between "during" and
"after".
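A rough model of the disk footprint through those three phases (the live fractions below are just assumptions for illustration, echoing the 90%+ deletion figures earlier in this thread):

```python
def major_compaction_footprint(sstable_sizes_gb, live_fraction):
    """Disk usage before, during, and after a major compaction.
    Peak usage occurs while the old tables (Ai) and the merged
    table (B0) coexist; the Ai are deleted once B0 is complete."""
    before = sum(sstable_sizes_gb)
    merged = before * live_fraction  # size of B0 after tombstone purge
    during = before + merged         # Ai and B0 on disk simultaneously
    after = merged                   # only B0 remains
    return before, during, after

# Mostly-tombstone data: the "during" peak is far below 2x "before".
print(major_compaction_footprint([100, 100, 100], 0.10))
```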

Will

On Thursday, May 8, 2014, Ruchir Jha <ru...@gmail.com> wrote:

> I tried to do this, however the doubling in disk space is not "temporary"
> as you state in your note. What am I missing?

-- 
Will Oberman
Civic Science, Inc.
6101 Penn Avenue, Fifth Floor
Pittsburgh, PA 15206
(M) 412-480-7835
(E) oberman@civicscience.com

Re: clearing tombstones?

Posted by Ruchir Jha <ru...@gmail.com>.
I tried to do this, however the doubling in disk space is not "temporary"
as you state in your note. What am I missing?


On Fri, Apr 11, 2014 at 10:44 AM, William Oberman
<ob...@civicscience.com>wrote:

> So, if I was impatient and just "wanted to make this happen now", I could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to impatience, I
> don't care *that* much as I could re-run my clean up tool against the now
> much smaller CF.
>
> (*) A long long time ago I seem to recall reading advice about "don't ever
> run nodetool compact", but I can't remember why.  Is there any bad long
> term consequence?  Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is ok right?  CommitLog/MemTable->SStables,
> minor compactions that merge SSTables, etc...  The only flaw I can think of
> is it will take forever until the SSTable minor compactions build up enough
> to consider including the big SSTable in a compaction, making it likely
> I'll have to self manage compactions.
>
>
>

Re: clearing tombstones?

Posted by William Oberman <ob...@civicscience.com>.
So, if I was impatient and just "wanted to make this happen now", I could:

1.) Change GCGraceSeconds of the CF to 0
2.) run nodetool compact (*)
3.) Change GCGraceSeconds of the CF back to 10 days
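As a shell sketch, with hypothetical keyspace/CF names (864000 seconds being the 10-day default restored in step 3; the CQL3 ALTER assumes a cqlsh-accessible table):

```shell
# Step 1: let compaction drop tombstones immediately.
cqlsh -e "ALTER TABLE my_keyspace.my_cf WITH gc_grace_seconds = 0;"

# Step 2: force a major compaction of that CF (run on each node).
nodetool compact my_keyspace my_cf

# Step 3: restore the 10-day grace period.
cqlsh -e "ALTER TABLE my_keyspace.my_cf WITH gc_grace_seconds = 864000;"
```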

Since I have ~900M tombstones, even if I miss a few due to impatience, I
don't care *that* much as I could re-run my clean up tool against the now
much smaller CF.

(*) A long long time ago I seem to recall reading advice about "don't ever
run nodetool compact", but I can't remember why.  Is there any bad long
term consequence?  Short term there are several:
-a heavy operation
-temporary 2x disk space
-one big SSTable afterwards
But moving forward, everything is ok right?  CommitLog/MemTable->SSTables,
minor compactions that merge SSTables, etc...  The only flaw I can think of
is it will take forever until the SSTable minor compactions build up enough
to consider including the big SSTable in a compaction, making it likely
I'll have to self manage compactions.



On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <ma...@boxever.com> wrote:

> Correct, a tombstone will only be removed after gc_grace period has
> elapsed. The default value is set to 10 days which allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day interval.
>
>
> Mark
>
>

Re: clearing tombstones?

Posted by tommaso barbugli <tb...@gmail.com>.
In my experience, even after the gc_grace period tombstones remain stored
on disk (at least using Cassandra 2.0.5); only a full compaction clears
them. Perhaps that is because my application never reads the tombstoned rows?
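The behaviour described above (tombstones lingering past gc_grace until a compaction actually rewrites the SSTables that hold them) can be spot-checked with the `sstablemetadata` tool, which reports an estimated droppable-tombstone ratio per SSTable. A minimal parsing sketch, assuming output containing an "Estimated droppable tombstones:" line; the sample text is illustrative, not captured from a real run:

```python
# Parse the droppable-tombstone estimate from `sstablemetadata` output.
# The sample below is illustrative text, not from a real SSTable.
def droppable_ratio(output):
    """Return the estimated fraction of this SSTable that is droppable tombstones."""
    for line in output.splitlines():
        if line.startswith("Estimated droppable tombstones:"):
            return float(line.split(":", 1)[1])
    raise ValueError("no droppable-tombstone line found")

sample = "Estimated droppable tombstones: 0.85\nSSTable Level: 0"
print(droppable_ratio(sample))  # -> 0.85
```

A high ratio on a large SSTable suggests a compaction touching that file would reclaim significant space.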


2014-04-11 16:31 GMT+02:00 Mark Reddy <ma...@boxever.com>:

> Correct, a tombstone will only be removed after gc_grace period has
> elapsed. The default value is set to 10 days which allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day interval.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <oberman@civicscience.com
> > wrote:
>
>> I'm seeing a lot of articles about a dependency between removing
>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>> and this CF has GCGraceSeconds of 10 days).
>>
>>
>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <tb...@gmail.com>wrote:
>>
>>> compaction should take care of it; for me it never worked so I run
>>> nodetool compaction on every node; that does it.
>>>
>>>
>>> 2014-04-11 16:05 GMT+02:00 William Oberman <ob...@civicscience.com>:
>>>
>>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>>> nodetool repair, or time (as in just wait)?
>>>>
>>>> I had a CF that was more or less storing session information.  After
>>>> some time, we decided that one piece of this information was pointless to
>>>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>>>> columns for a row).   I wrote a process to remove all of those columns
>>>> (which again in a vast majority of cases had the effect of removing the
>>>> whole row).
>>>>
>>>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>>>  After I did this mass delete, everything was the same size on disk (which
>>>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>>>> what to poke to cause compactions to clear the tombstones.  First I tried
>>>> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
>>>> the same.  Then I tried nodetool repair on that same node.  But again, disk
>>>> usage is still the same.  The CF has no snapshots.
>>>>
>>>> So, am I misunderstanding something?  Is there another operation to
>>>> try?  Do I have to "just wait"?  I've only done cleanup/repair on one node.
>>>>  Do I have to run one or the other over all nodes to clear tombstones?
>>>>
>>>> Cassandra 1.2.15 if it matters,
>>>>
>>>> Thanks!
>>>>
>>>> will
>>>>
>>>
>>>
>>
>>
>>
>

Re: clearing tombstones?

Posted by Mark Reddy <ma...@boxever.com>.
Correct, a tombstone will only be removed after the gc_grace period has
elapsed. The default value is 10 days, which allows a great deal of time for
consistency to be achieved prior to deletion. If you are operationally
confident that you can achieve consistency via anti-entropy repairs within a
shorter period, you can always reduce that 10 day interval.


Mark
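The rule above can be sketched as a small predicate: a compaction running at time `now` may drop a tombstone only once gc_grace_seconds have elapsed since the deletion. Names and values here are illustrative; 10 days is the stock default of 864,000 seconds:

```python
# Tombstone purge eligibility, per the gc_grace rule described above.
# All times are epoch seconds; the constant matches Cassandra's 10-day default.
GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 864000

def purgeable(deletion_time, now, gc_grace=GC_GRACE_SECONDS):
    """True if a compaction running at `now` may drop this tombstone."""
    return now >= deletion_time + gc_grace

print(GC_GRACE_SECONDS)       # -> 864000
print(purgeable(0, 863_999))  # -> False: still inside the grace window
print(purgeable(0, 864_000))  # -> True: grace elapsed, compaction may drop it
```

Note that passing this check only makes the tombstone *eligible*; it is still physically removed only when a compaction rewrites the SSTable containing it.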


On Fri, Apr 11, 2014 at 3:16 PM, William Oberman
<ob...@civicscience.com>wrote:

> I'm seeing a lot of articles about a dependency between removing
> tombstones and GCGraceSeconds, which might be my problem (I just checked,
> and this CF has GCGraceSeconds of 10 days).
>
>
> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <tb...@gmail.com>wrote:
>
>> compaction should take care of it; for me it never worked so I run
>> nodetool compaction on every node; that does it.
>>
>>
>> 2014-04-11 16:05 GMT+02:00 William Oberman <ob...@civicscience.com>:
>>
>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>> nodetool repair, or time (as in just wait)?
>>>
>>> I had a CF that was more or less storing session information.  After
>>> some time, we decided that one piece of this information was pointless to
>>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>>> columns for a row).   I wrote a process to remove all of those columns
>>> (which again in a vast majority of cases had the effect of removing the
>>> whole row).
>>>
>>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>>  After I did this mass delete, everything was the same size on disk (which
>>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>>> what to poke to cause compactions to clear the tombstones.  First I tried
>>> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
>>> the same.  Then I tried nodetool repair on that same node.  But again, disk
>>> usage is still the same.  The CF has no snapshots.
>>>
>>> So, am I misunderstanding something?  Is there another operation to try?
>>>  Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
>>> I have to run one or the other over all nodes to clear tombstones?
>>>
>>> Cassandra 1.2.15 if it matters,
>>>
>>> Thanks!
>>>
>>> will
>>>
>>
>>
>
>
>

Re: clearing tombstones?

Posted by William Oberman <ob...@civicscience.com>.
I'm seeing a lot of articles about a dependency between removing tombstones
and GCGraceSeconds, which might be my problem (I just checked, and this CF
has GCGraceSeconds of 10 days).


On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <tb...@gmail.com>wrote:

> compaction should take care of it; for me it never worked so I run
> nodetool compaction on every node; that does it.
>
>
> 2014-04-11 16:05 GMT+02:00 William Oberman <ob...@civicscience.com>:
>
> I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
>> repair, or time (as in just wait)?
>>
>> I had a CF that was more or less storing session information.  After some
>> time, we decided that one piece of this information was pointless to track
>> (and was 90%+ of the columns, and in 99% of those cases was ALL columns for
>> a row).   I wrote a process to remove all of those columns (which again in
>> a vast majority of cases had the effect of removing the whole row).
>>
>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>  After I did this mass delete, everything was the same size on disk (which
>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>> what to poke to cause compactions to clear the tombstones.  First I tried
>> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
>> the same.  Then I tried nodetool repair on that same node.  But again, disk
>> usage is still the same.  The CF has no snapshots.
>>
>> So, am I misunderstanding something?  Is there another operation to try?
>>  Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
>> I have to run one or the other over all nodes to clear tombstones?
>>
>> Cassandra 1.2.15 if it matters,
>>
>> Thanks!
>>
>> will
>>
>
>

Re: clearing tombstones?

Posted by tommaso barbugli <tb...@gmail.com>.
compaction should take care of it; for me it never worked, so I run nodetool
compact on every node; that does it.
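A sketch of that per-node loop, assuming `nodetool` is on the PATH and using placeholder host, keyspace, and table names (none of these values come from this thread):

```python
# Build the `nodetool compact` invocation for every node in the cluster.
# Hosts, keyspace, and table below are placeholders for illustration.
def compact_commands(hosts, keyspace, table):
    return [["nodetool", "-h", h, "compact", keyspace, table] for h in hosts]

if __name__ == "__main__":
    for cmd in compact_commands(["10.0.0.1", "10.0.0.2"], "app", "sessions"):
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # only on a real cluster
```

Be aware that a major compaction merges everything into one large SSTable, which under size-tiered compaction may then go a long time before being compacted again, so it is usually treated as a last resort.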


2014-04-11 16:05 GMT+02:00 William Oberman <ob...@civicscience.com>:

> I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
> repair, or time (as in just wait)?
>
> I had a CF that was more or less storing session information.  After some
> time, we decided that one piece of this information was pointless to track
> (and was 90%+ of the columns, and in 99% of those cases was ALL columns for
> a row).   I wrote a process to remove all of those columns (which again in
> a vast majority of cases had the effect of removing the whole row).
>
> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>  After I did this mass delete, everything was the same size on disk (which
> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
> what to poke to cause compactions to clear the tombstones.  First I tried
> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
> the same.  Then I tried nodetool repair on that same node.  But again, disk
> usage is still the same.  The CF has no snapshots.
>
> So, am I misunderstanding something?  Is there another operation to try?
>  Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
> I have to run one or the other over all nodes to clear tombstones?
>
> Cassandra 1.2.15 if it matters,
>
> Thanks!
>
> will
>