Posted to user@cassandra.apache.org by Rene Kochen <re...@schange.com> on 2012/09/14 20:32:55 UTC

minor compaction and delete expired column-tombstones

Hi all,

Does minor compaction delete expired column-tombstones when the row is
also present in another table which is not subject to the minor
compaction?

Example:

Say there are 5 SSTables:

- Customers_0 (10 MB)
- Customers_1 (10 MB)
- Customers_2 (10 MB)
- Customers_3 (10 MB)
- Customers_4 (30 MB)

A minor compaction is triggered which will compact the similarly sized
SSTables 0 to 3. These SSTables contain a customer record with key "C1"
that has an expired column tombstone. Customer "C1" is also present in
SSTable 4. Will the minor compaction purge the tombstone, or will the
tombstone still be present in the newly created SSTable?
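
The selection described above can be sketched as a toy model of size-tiered
bucketing. This is only an illustration, not Cassandra's code: the function
names are hypothetical, and the 0.5/1.5 bucket bounds and the minimum of 4
SSTables are assumed values chosen to match the example.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable names into buckets of similar size (toy model)."""
    buckets = []  # each entry is [average_size, [names]]
    for name, size in sorted(sizes.items(), key=lambda item: item[1]):
        for bucket in buckets:
            average = bucket[0]
            if bucket_low * average <= size <= bucket_high * average:
                bucket[1].append(name)
                # Recompute the bucket's average size after adding the SSTable
                total = average * (len(bucket[1]) - 1) + size
                bucket[0] = total / len(bucket[1])
                break
        else:
            # No existing bucket is close enough in size: start a new one
            buckets.append([size, [name]])
    return [names for _, names in buckets]


def minor_compaction_candidates(sizes, min_threshold=4):
    """Return the buckets that hold enough SSTables to be compacted together."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


sizes_mb = {
    "Customers_0": 10,
    "Customers_1": 10,
    "Customers_2": 10,
    "Customers_3": 10,
    "Customers_4": 30,
}
print(minor_compaction_candidates(sizes_mb))
# [['Customers_0', 'Customers_1', 'Customers_2', 'Customers_3']]
```

Under these assumptions Customers_4 ends up alone in its own bucket, so it
is left out of the minor compaction.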

Thanks,

Rene

Re: minor compaction and delete expired column-tombstones

Posted by Sylvain Lebresne <sy...@datastax.com>.
> Is there any JIRA or enhancement to perhaps be able to detect when certain
> column tombstones can be deleted in minor compactions? The new introduction
> of SSTable min-max timestamps might help? or perhaps there are new ones
> coming up that I'm not aware of ....

https://issues.apache.org/jira/browse/CASSANDRA-4671

--
Sylvain

Re: minor compaction and delete expired column-tombstones

Posted by Josep Blanquer <bl...@rightscale.com>.
We ran into exactly the same problem recently. Some specific keys in a
couple of CFs accumulate a fair amount of column churn over time.

Before Cassandra 1.x we scheduled full compactions often to purge them.
However, when we moved to 1.x we adopted the recommended practice of
avoiding full compactions. The problem took a while to manifest itself, but
over several weeks (a few months) without full compactions the load on
those services slowly increased. Even though we have everything monitored,
it was not trivial to find out that the accumulation of tombstones on
'some' keys, for 'some' CFs in the cluster, was what was really causing the
long latencies and CPU spikes (high CPU is a typical signature of having a
fair amount of tombstones in the SSTables).

Is there any JIRA or enhancement to perhaps be able to detect when certain
column tombstones can be deleted in minor compactions? The new introduction
of SSTable min-max timestamps might help? or perhaps there are new ones
coming up that I'm not aware of ....

I'm saying this because there is absolutely no way (that I know of) to find
out or monitor when Cassandra encounters many column tombstones while
serving reads. That alone could help detect these cases, so one can change
the data model and/or realize that full compactions are needed. For example,
a new metric at the CF level that tracks the % of tombstones read per row
(ideally a histogram based on row size), or perhaps something written to the
logs (a la the MySQL slow query log) when a wide row is read and a certain %
of tombstone columns is encountered. This alone would be a huge help in at
least detecting the latent problem.
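
To make the idea concrete, here is a minimal sketch of such a check. It is
not an existing Cassandra feature or API: the function names, the logger
name, and both thresholds are hypothetical, and a row is modelled simply as
a list of booleans where True marks a tombstone column.

```python
import logging

logger = logging.getLogger("tombstone_monitor")  # hypothetical logger name


def tombstone_ratio(columns):
    """Fraction of the columns read that were tombstones (0.0 for an empty row)."""
    if not columns:
        return 0.0
    return sum(1 for is_tombstone in columns if is_tombstone) / len(columns)


def check_row_read(row_key, columns, min_columns=1000, threshold=0.5):
    """Log a warning when a wide row is read and mostly tombstones come back.

    min_columns and threshold are illustrative values, not recommendations.
    Returns True when the read was flagged.
    """
    ratio = tombstone_ratio(columns)
    flagged = len(columns) >= min_columns and ratio >= threshold
    if flagged:
        logger.warning(
            "row %r: %d columns read, %.0f%% tombstones",
            row_key, len(columns), ratio * 100,
        )
    return flagged


# A wide row where 900 of 1000 columns are tombstones gets flagged
check_row_read("Index", [True] * 900 + [False] * 100)
```

Feeding the per-read ratios into a histogram per CF, rather than only
logging, would give the metric form of the same idea.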

To fully debug and understand the issue, we had to build some tools that
scanned SSTables and produced some of those stats. In a large cluster that
is painful to do.

Anyway, I just wanted to chime in on the thread to provide our input on the
matter.

Cheers,

Josep M.

Re: minor compaction and delete expired column-tombstones

Posted by Rene Kochen <re...@emea.schange.com>.
OK, thanks!

So a column tombstone will only be removed if all row fragments are
present in the tables being compacted.

I have a row called "Index" which contains columns like "page0",
"page1", "page2", etc. Every few minutes, new columns are created
and old ones deleted. The problem is that I now have an "Index" row in
several SSTables, but the column tombstones are never purged, so
reading the "Index" row (and all its column tombstones) takes longer
and longer.

If I do a major compaction, all tombstones are purged and reading the
"Index" row takes one millisecond again (and the garbage-collection
issues caused by the tombstones go away).

Is it inadvisable to use rows with frequent column creates and deletes
(because of how minor compactions work)?

Thanks!

Rene

Re: minor compaction and delete expired column-tombstones

Posted by aaron morton <aa...@thelastpickle.com>.
> Does minor compaction delete expired column-tombstones when the row is
> also present in another table which is not subject to the minor
> compaction?
No.
Compaction is per Column Family.

Tombstones are purged by a minor compaction only if all fragments of the row are contained in the SSTables being compacted.
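
The rule above can be sketched as a toy model. This is not Cassandra's
implementation: the function name and the data layout are hypothetical.
Each SSTable is modelled as a dict mapping a row key to that row's
tombstones (column name -> deletion time in seconds), live columns are
ignored, and gc_grace_seconds is assumed to be 864000 (10 days).

```python
def purgeable_tombstones(row_key, compacting, others, now, gc_grace_seconds):
    """Return the tombstoned columns of row_key that a compaction may drop."""
    # If any SSTable outside the compaction holds a fragment of the row,
    # every tombstone for that row has to be kept.
    if any(row_key in sstable for sstable in others):
        return set()
    purgeable = set()
    for sstable in compacting:
        for column, deleted_at in sstable.get(row_key, {}).items():
            # Only tombstones older than gc_grace_seconds count as expired
            if deleted_at + gc_grace_seconds <= now:
                purgeable.add(column)
    return purgeable


GC_GRACE = 864000  # assumed value: 10 days
tables_0_to_3 = [{"C1": {"email": 1000}}, {}, {}, {}]
table_4 = {"C1": {}}  # holds a fragment of row C1, with no tombstones
now = 1000 + GC_GRACE + 1

# Minor compaction of tables 0 to 3 only: table 4 still holds row C1
print(purgeable_tombstones("C1", tables_0_to_3, [table_4], now, GC_GRACE))
# set()

# Compacting all five tables together: the expired tombstone can go
print(purgeable_tombstones("C1", tables_0_to_3 + [table_4], [], now, GC_GRACE))
# {'email'}
```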

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
