Posted to user@cassandra.apache.org by Anuj Wadehra <an...@yahoo.co.in> on 2015/05/31 20:37:13 UTC

Minor Compactions Not Triggered

Hi,
I am using Cassandra 2.0.3 and we use STCS for all CFs. We have recently faced an issue where sstable count of certain CFs went into THOUSANDS. We realized that every week, when "repair -pr" ran on each node, it created 50+ tiny sstables of around 1kb. These tables were never compacted during minor compactions and thus sstable count kept on increasing with each repair.
Our Root Cause Analysis is as under: We were writing to 5 CFs simultaneously, and one CF had 3 secondary indexes. Our memtable_flush_writers was set to the default of 1, which created a bottleneck for writes across all CFs and led to DROPPED mutations. Thus, many vnodes were inconsistent ("damaged") at the time of every weekly repair. While repairing the inconsistent data, repair created 50+ tiny sstables for each "repair -pr" run on the nodes.

Why didn't the tiny sstables created during repair get compacted? 2.0.3 has a known issue (https://issues.apache.org/jira/browse/CASSANDRA-6483) where, even if you don't specify cold_reads_to_omit or set it to zero, cold sstables are still not compacted. We think this prevented these tiny sstables from participating in minor compactions.
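For clarity, the broken filtering can be sketched as follows. This is a minimal Python illustration of the behavior described in CASSANDRA-6483, not Cassandra's actual code; the sstable names and read rates are made up:

```python
def filter_cold_sstables(sstables, cold_reads_to_omit, fixed=True):
    """Sketch of STCS cold-sstable filtering around CASSANDRA-6483.

    sstables: list of (name, reads_per_sec) tuples.
    With the 2.0.4 fix, cold_reads_to_omit == 0 disables filtering
    entirely; without it, never-read sstables are silently omitted.
    """
    if fixed and cold_reads_to_omit == 0.0:
        return list(sstables)  # the early return missing in 2.0.3
    total_reads = sum(r for _, r in sstables)
    # Walk from coldest to hottest, omitting sstables while the reads
    # they account for stay within the cold_reads_to_omit budget.
    kept, omitted_reads = [], 0.0
    for name, reads in sorted(sstables, key=lambda s: s[1]):
        if omitted_reads + reads <= cold_reads_to_omit * total_reads:
            omitted_reads += reads  # "cold": dropped from candidates
        else:
            kept.append((name, reads))
    return kept

# Tiny repair-generated sstables that nobody ever reads:
tables = [("repair-1", 0.0), ("repair-2", 0.0), ("hot-1", 50.0)]
print(filter_cold_sstables(tables, 0.0, fixed=False))  # only hot-1 survives
print(filter_cold_sstables(tables, 0.0, fixed=True))   # all three survive
```

With the pre-2.0.4 behavior, an sstable with zero reads always fits inside the "cold" budget, even when that budget is zero, so the repair-generated tables never become compaction candidates.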
Moreover, we had long GC pauses leading to nodes being marked down. 

Our Fix:
1. Increased memtable_flush_writers to 3, so that mutations are not dropped and data stays consistent. With most of the data consistent, a routine repair no longer creates tiny sstables to repair vnode ranges.
2. Executed a major compaction on all CFs where the sstable count was in the thousands, to bring the situation under control.
3. Applied some compaction throttling, reduced total_memtable_space_in_mb, and tuned the JVM to prevent long GC pauses.
Queries:
1. We have observed that after increasing memtable_flush_writers, throttling compaction, and tuning the JVM, the tiny sstables that were not getting compacted during repairs started participating in compactions, and the sstable count for a few CFs reduced considerably after repair (even though not all tiny sstables were compacted). As per our RCA, we understood that the tiny sstables created with every repair were not getting compacted due to COLDNESS. What led to the compaction of these tiny sstables now? How did our changes affect minor compactions? Is there any gap in our Root Cause Analysis?
2. We thought that the CQL compaction subproperty tombstone_threshold would help us after major compactions. This property should ensure that even if we have one huge sstable, once the tombstone threshold of 20% is reached, the sstable is compacted and tombstones are dropped after gc_grace_period (even if there are no similar-sized sstables, as STCS normally requires). But in our initial testing, the single huge sstable is not getting compacted even though we dropped all rows in it and gc_grace_period has passed. Why is tombstone_threshold behaving like that?
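To illustrate what we expected: STCS can compact a single sstable on its own when the estimated droppable tombstone ratio crosses tombstone_threshold and the sstable is old enough. The Python below is only a sketch of that check with made-up field names, not Cassandra's actual API:

```python
def worth_dropping_tombstones(sstable, now,
                              tombstone_threshold=0.2,
                              tombstone_compaction_interval=86400):
    """Sketch of STCS's single-sstable tombstone compaction check
    (illustrative field names, not Cassandra's real code).

    An sstable qualifies on its own when (a) it was written long enough
    ago, and (b) enough of its data consists of tombstones that are
    already past gc_grace and hence droppable."""
    if now < sstable["created_at"] + tombstone_compaction_interval:
        return False  # avoid re-compacting the same sstable repeatedly
    return sstable["droppable_tombstone_ratio"] >= tombstone_threshold

# A fully deleted sstable written two days ago, with gc_grace long past:
big = {"created_at": 0, "droppable_tombstone_ratio": 1.0}
print(worth_dropping_tombstones(big, now=2 * 86400))  # -> True
```

By this sketch our fully deleted sstable should easily qualify, which is why its staying uncompacted looks like a bug rather than expected behavior.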


Thanks
Anuj Wadehra








1. High sstable count.
2. Damaged vnodes and the coldness issue.
3. Our solution: we tuned the db to stop dropped mutations (and thus the damaged vnodes that created small sstables during repair) and ran a major compaction.

How to deal with the side effects of major compaction?
1. tombstone_threshold compaction is not triggered, and STCS anyway never guarantees that reads go through one sstable.
2. If coldness was the issue, then why did the db sync (repair) lead to compaction of cold sstables now? Is data read during repair counted as a read, so that the data is made hot and compacted?

Re: Minor Compactions Not Triggered

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Hi Robert,
I haven't yet asked that question in the IRC channel. We are generating regular sstables in the CF, so that's not a concern. The important part is that we need to understand the tombstone_threshold property, because we think it's very important once we have done a major compaction. How will the huge sstable get rid of tombstones when, under STCS, there are no similar-sized sstables for a long time?

I think I understand the behavior correctly now, but a code issue in Cassandra 2.0.3 prevented this property from working properly in our testing.
The check below is missing in 2.0.3 and present in 2.0.4 (https://issues.apache.org/jira/browse/CASSANDRA-6483). Thus, even if cold_reads_to_omit is 0, cold sstables are filtered out and never get compacted. The tombstone_threshold property is applied after this filtering, so it has no effect on the single huge cold sstable generated by our major compaction. I will test this property again while making some frequent reads and check how it behaves.

 static List<SSTableReader> filterColdSSTables(List<SSTableReader> sstables, double coldReadsToOmit)
    {
        if (coldReadsToOmit == 0.0)
            return sstables;
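Putting the two pieces together, the ordering that bites us can be sketched like this (hypothetical Python with made-up field names; the real candidate selection in Cassandra is more involved):

```python
def pick_background_candidates(sstables, tombstone_threshold=0.2):
    """Sketch of why tombstone_threshold never fires for us in 2.0.3:
    cold filtering runs FIRST, and with no reads every sstable is cold,
    so the tombstone fallback only ever sees the surviving sstables."""
    # Step 1 (buggy in 2.0.3): drop cold sstables even when
    # cold_reads_to_omit is 0 -- a never-read sstable is always cold.
    hot = [s for s in sstables if s["reads_per_sec"] > 0]
    # Step 2: size-tiered bucketing over the hot sstables (elided).
    # Step 3: tombstone fallback -- also only over the hot sstables.
    return [s for s in hot
            if s["droppable_tombstone_ratio"] >= tombstone_threshold]

# The huge sstable from our major compaction: 100% droppable, never read.
big_cold = {"reads_per_sec": 0.0, "droppable_tombstone_ratio": 1.0}
print(pick_background_candidates([big_cold]))  # -> [] despite the tombstones
```

This matches what we observed: once reads (or the 2.0.4 fix) keep the sstable out of the cold filter, the tombstone check can finally see it.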

Thanks
Anuj Wadehra


 


     On Tuesday, 2 June 2015 11:37 PM, Robert Coli <rc...@eventbrite.com> wrote:
   

 On Mon, Jun 1, 2015 at 11:25 AM, Anuj Wadehra <an...@yahoo.co.in> wrote:


| As per the algorithm shared in CASSANDRA-6654, I understand that the tombstone_threshold property only comes into the picture if you have expiring columns, and it won't have any effect if you have manually deleted rows in a CF. Is my understanding correct?
According to you, what would be the expected behavior of the following steps?
I inserted x rows
I deleted x rows
Ran major compaction to make sure that one big sstable contains all tombstones
Waited for gc_grace_period to see whether that big sstable formed after major compaction is compacted on its own without finding any other sstable |



That's a good question, and I don't actually know the answer. If you aren't generating new SSTables in the CF via writes and flushes, I would doubt any background process notices it's expired and re-compacts it. 
Have you considered asking this question in the #cassandra IRC channel on freenode?
=Rob 


  

Re: Minor Compactions Not Triggered

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Jun 1, 2015 at 11:25 AM, Anuj Wadehra <an...@yahoo.co.in>
wrote:

> As per the algorithm shared in CASSANDRA-6654, I understand that the
> tombstone_threshold property only comes into the picture if you have
> expiring columns, and it won't have any effect if you have manually
> deleted rows in a CF. Is my understanding correct?
>
> According to you, what would be the expected behavior of the following steps?
>
> I inserted x rows
> I deleted x rows
> Ran major compaction to make sure that one big sstable contains all
> tombstones
> Waited for gc grace period to see whether that big sstable formed after
> major compaction is compacted on its own without finding any other sstable
>

That's a good question, and I don't actually know the answer. If you aren't
generating new SSTables in the CF via writes and flushes, I would doubt any
background process notices it's expired and re-compacts it.

Have you considered asking this question in the #cassandra IRC channel on
freenode?

=Rob

Re: Minor Compactions Not Triggered

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Thanks Robert!


As per the algorithm shared in CASSANDRA-6654, I understand that the tombstone_threshold property only comes into the picture if you have expiring columns, and it won't have any effect if you have manually deleted rows in a CF. Is my understanding correct?


According to you, what would be the expected behavior of the following steps?


I inserted x rows

I deleted x rows

Ran major compaction to make sure that one big sstable contains all tombstones

Waited for gc grace period to see whether that big sstable formed after major compaction is compacted on its own without finding any other sstable


Thanks

Anuj



Sent from Yahoo Mail on Android

From:"Robert Coli" <rc...@eventbrite.com>
Date:Mon, 1 Jun, 2015 at 10:56 pm
Subject:Re: Minor Compactions Not Triggered

On Sun, May 31, 2015 at 11:37 AM, Anuj Wadehra <an...@yahoo.co.in> wrote:

2. We thought that the CQL compaction subproperty tombstone_threshold would help us after major compactions. This property should ensure that even if we have one huge sstable, once the tombstone threshold of 20% is reached, the sstable is compacted and tombstones are dropped after gc_grace_period (even if there are no similar-sized sstables, as STCS normally requires). But in our initial testing, the single huge sstable is not getting compacted even though we dropped all rows in it and gc_grace_period has passed. Why is tombstone_threshold behaving like that?


https://issues.apache.org/jira/browse/CASSANDRA-6654 ?


=Rob


Re: Minor Compactions Not Triggered

Posted by Robert Coli <rc...@eventbrite.com>.
On Sun, May 31, 2015 at 11:37 AM, Anuj Wadehra <an...@yahoo.co.in>
wrote:

> 2. We thought that the CQL compaction subproperty *tombstone_threshold*
> would help us after major compactions. This property should ensure that
> even if we have one huge sstable, once the tombstone threshold of 20% is
> reached, the sstable is compacted and tombstones are dropped after
> gc_grace_period (even if there are no similar-sized sstables, as STCS
> normally requires). But in our initial testing, the single huge sstable
> is not getting compacted even though we dropped all rows in it and
> gc_grace_period has passed.  *Why is tombstone_threshold behaving like
> that?*
>

https://issues.apache.org/jira/browse/CASSANDRA-6654 ?

=Rob