Posted to user@cassandra.apache.org by Fredrik Stigbäck <fr...@sitevision.se> on 2012/05/01 11:19:49 UTC

Re: Question regarding major compaction.

Thank you Aaron.
That explanation cleared things up.

2012/4/30 aaron morton <aa...@thelastpickle.com>:
> Depends on your definition of significantly, there are a few things to
> consider.
>
> * Reading from SSTables for a request is a serial operation. Reading from 2
> SSTables will take twice as long as 1.
>
> * If the data in the One Big File™ has been overwritten, reading it is a
> waste of time. And it will continue to be read until the row is compacted
> away.
>
> * You will need min_compaction_threshold (CF setting) SSTables of that
> size before automatic compaction will pick up the big file.
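The last point can be illustrated with a toy model of size-tiered bucketing. This is only a sketch, not Cassandra's actual code: the function names are hypothetical, the 0.5x to 1.5x "similar size" window and the threshold of 4 are assumed values, and sizes are in MB.

```python
def bucket_sstables(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of similar size (simplified model)."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # Assumed rule: a file joins a bucket if it is within
            # [low * avg, high * avg] of that bucket's average size.
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Return the buckets holding at least min_threshold SSTables."""
    return [b for b in bucket_sstables(sizes) if len(b) >= min_threshold]


# One ~100 GB file left by a major compaction plus four small flushed files.
sstables = [100_000, 50, 55, 60, 52]
print(compaction_candidates(sstables))
```

In this model only the four small files form a bucket that reaches the threshold; the big file sits alone in its own bucket until enough files of comparable size exist.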
>
> On the other hand: Some people do report getting value from nightly major
> compactions. They also manage their cluster to reduce the impact of
> performing the compactions.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/04/2012, at 9:37 PM, Fredrik wrote:
>
> Exactly, but why would reads be significantly slower over time when
> including just one more, although sometimes large, SSTable in the read?
>
> Ji Cheng wrote 2012-04-26 11:11:
>
> I'm also quite interested in this question. Here's my understanding of this
> problem.
>
> 1. If your workload is append-only, doing a major compaction shouldn't
> affect the read performance too much, because each row appears in one
> sstable anyway.
>
> 2. If your workload is mostly updating existing rows, then more and more
> columns will become obsolete in that big sstable created by major compaction.
> And that super big sstable won't be compacted until you either have another
> 3 similar-sized sstables or start another major compaction. But I am not
> sure whether this will be a major problem, because you only end up
> reading one more sstable. Using size-tiered compaction against a mostly-update
> workload may itself result in reading multiple sstables for a single row
> key.
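The difference between the two workloads can be sketched with a toy read path. This is an illustrative model only, not how Cassandra is implemented: each SSTable is a plain dict, `read_row` is a hypothetical helper, and bloom filters, caches and tombstones are ignored.

```python
def read_row(sstables, row_key):
    """Merge one row across SSTables; the newest timestamp wins per column."""
    merged = {}
    sstables_read = 0
    for sstable in sstables:  # SSTables are visited one after another
        row = sstable.get(row_key)
        if row is None:
            continue  # this SSTable holds nothing for the row
        sstables_read += 1
        for column, (timestamp, value) in row.items():
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    values = {column: value for column, (_, value) in merged.items()}
    return values, sstables_read


# The big SSTable from a major compaction, and a newer one with updates.
big = {"user1": {"name": (1, "old"), "email": (1, "old@example.com")}}
newer = {"user1": {"name": (2, "new"), "email": (2, "new@example.com")}}

values, touched = read_row([big, newer], "user1")
print(values, touched)
```

Here every column in the big SSTable has been overwritten, yet the read still touches both files. In the append-only case the row would live in a single SSTable and only that one would be read.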
>
> Please correct me if I am wrong.
>
> Cheng
>
>
> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik <fr...@sitevision.se>
> wrote:
>>
>> In the tuning documentation regarding Cassandra, it's recommended not to
>> run major compactions.
>> I understand what a major compaction is all about but I'd like an in-depth
>> explanation as to why reads "will continually degrade until the next major
>> compaction is manually invoked".
>>
>> From the doc:
>> "So while read performance will be good immediately following a major
>> compaction, it will continually degrade until the next major compaction is
>> manually invoked. For this reason, major compaction is NOT recommended by
>> DataStax."
>>
>> Regards
>> /Fredrik
>
>
>
>



-- 
Fredrik Larsson Stigbäck
SiteVision AB Vasagatan 10, 107 10 Örebro
019-17 30 30