Posted to user@cassandra.apache.org by Rene Kochen <Re...@emea.schange.com> on 2011/11/04 15:28:27 UTC

Question about minor compaction

Assume the following default settings: min_compaction_threshold = 4, max_compaction_threshold = 32.

When I start a bulk insert in Cassandra, I see minor compactions work: all similar-sized files are compacted when there are four of them. However, when the files get larger, Cassandra delays minor compactions. For example, four files are compacted into a 1GB file, and there were already three other 1GB files. However, Cassandra does not immediately compact the four 1GB files. What makes Cassandra decide to delay this minor compaction?

Thanks!

Re: Question about minor compaction

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Fri, Nov 4, 2011 at 6:01 PM, Rene Kochen
<Re...@emea.schange.com> wrote:
> Thanks for this very clear explanation.
>
> Could it be that Cassandra does not begin a minor compaction if there is memory pressure?

No (and in 0.7 Cassandra really does nothing specific under memory
pressure except being constantly interrupted by the GC. But even in
1.0, compaction is not one of the things affected by the memory
pressure valves in place). It may skip starting a compaction if it
thinks it won't have enough disk space to complete it, but you'll
see a message in the log if that happens.
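A minimal sketch of what such a disk-space pre-check could look like, in Python. The function name and the worst-case rule (the merged sstable may be as large as all inputs combined, because the inputs can only be deleted after the new file is written) are illustrative assumptions, not Cassandra's actual code.

```python
from typing import List


def has_space_for_compaction(input_sizes: List[int], free_bytes: int) -> bool:
    """Hypothetical pre-check before starting a compaction.

    Assumes the worst case: no overwritten or deleted data is purged, so the
    merged output is as large as all inputs combined, and the inputs are only
    removed once the new sstable is complete.
    """
    expected_output = sum(input_sizes)
    return expected_output <= free_bytes


GB = 1024 ** 3
# Under this assumption, four 1 GB sstables need about 4 GB of free space.
print(has_space_for_compaction([GB] * 4, free_bytes=3 * GB))  # False
print(has_space_for_compaction([GB] * 4, free_bytes=5 * GB))  # True
```

On a live node, free_bytes could be taken from shutil.disk_usage(path).free for the data directory.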

--
Sylvain


RE: Question about minor compaction

Posted by Rene Kochen <Re...@emea.schange.com>.
Thanks for this very clear explanation.

Could it be that Cassandra does not begin a minor compaction if there is memory pressure?

-----Original Message-----
From: Sylvain Lebresne [mailto:sylvain@datastax.com] 
Sent: Friday, 4 November 2011 17:48
To: user@cassandra.apache.org
Subject: Re: Question about minor compaction

On Fri, Nov 4, 2011 at 5:22 PM, Rene Kochen
<Re...@emea.schange.com> wrote:
> Thanks for your quick response.
>
> I indeed see that similar-sized files are compacted. However, for four similar 1GB files, this is not what I see.
>
> The documentation states:
>
> "These parameters set thresholds for the number of similar-sized SSTables that can accumulate before a minor compaction is triggered. With the default values, a minor compaction may begin any time after four SSTables are created on disk for a column family, and must begin before 32 SSTables accumulate."
>
> So a more general question:
>
> In which situations does Cassandra not start a minor compaction immediately (when there are four similar-sized files) but instead wait (up to 32)?

Cassandra checks whether a minor compaction can be started after
each flush and after each compaction, so fairly regularly. It
should therefore usually compact files as soon as it can. That being
said, compaction in 0.7.9 is single-threaded, so first it has to wait
for any other running compaction to finish. Then it needs 4 files in
the same bucket (i.e. of similar size), but it is possible that the
sizes are such that one of the sstables is just a little too small or
too big to be in the same bucket as the other three (in which case
you'd have to wait for another sstable to come and fill that bucket).

--
Sylvain

>
> Thanks!
>
> -----Original Message-----
> From: Radim Kolar [mailto:hsn@sendmail.cz]
> Sent: Friday, 4 November 2011 16:48
> To: user@cassandra.apache.org
> Subject: Re: Question about minor compaction
>
> On 4.11.2011 16:16, Rene Kochen wrote:
>> I'm using Cassandra 0.7.9.
>>
>> OK, so in this version Cassandra delays compaction. But when (in my original example) are the four 1GB files compacted?
> They are compacted when the next file of similar size (about 1 GB) is created.
>

Re: Question about minor compaction

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Fri, Nov 4, 2011 at 5:22 PM, Rene Kochen
<Re...@emea.schange.com> wrote:
> Thanks for your quick response.
>
> I indeed see that similar-sized files are compacted. However, for four similar 1GB files, this is not what I see.
>
> The documentation states:
>
> "These parameters set thresholds for the number of similar-sized SSTables that can accumulate before a minor compaction is triggered. With the default values, a minor compaction may begin any time after four SSTables are created on disk for a column family, and must begin before 32 SSTables accumulate."
>
> So a more general question:
>
> In which situations does Cassandra not start a minor compaction immediately (when there are four similar-sized files) but instead wait (up to 32)?

Cassandra checks whether a minor compaction can be started after
each flush and after each compaction, so fairly regularly. It
should therefore usually compact files as soon as it can. That being
said, compaction in 0.7.9 is single-threaded, so first it has to wait
for any other running compaction to finish. Then it needs 4 files in
the same bucket (i.e. of similar size), but it is possible that the
sizes are such that one of the sstables is just a little too small or
too big to be in the same bucket as the other three (in which case
you'd have to wait for another sstable to come and fill that bucket).
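The bucketing described above can be sketched in Python. The similarity rule used here (a file joins a bucket when its size lies between half and one and a half times the bucket's current average, and files below a small floor are always grouped together) is an assumption modeled on the size-tiered approach of that era; get_buckets and MIN_SSTABLE_SIZE are hypothetical names, not Cassandra's API.

```python
from typing import List

MB = 1024 ** 2
GB = 1024 ** 3
# Hypothetical floor: files below this size are lumped into one bucket.
MIN_SSTABLE_SIZE = 50 * MB


def get_buckets(sizes: List[int], min_size: int = MIN_SSTABLE_SIZE) -> List[List[int]]:
    """Group sstable sizes (in bytes) into buckets of 'similar' size.

    Assumed rule: a file joins the first bucket whose current average
    satisfies avg/2 < size < 3*avg/2, or where both the file and the
    bucket average are below min_size. Otherwise it starts a new bucket.
    """
    buckets: List[List[int]] = []
    averages: List[float] = []
    for size in sizes:
        for i, avg in enumerate(averages):
            similar = avg / 2 < size < 3 * avg / 2
            both_small = size < min_size and avg < min_size
            if similar or both_small:
                buckets[i].append(size)
                # Update the running average used for later comparisons.
                averages[i] = sum(buckets[i]) / len(buckets[i])
                break
        else:  # no existing bucket matched this file
            buckets.append([size])
            averages.append(float(size))
    return buckets


# Three 1 GB files plus one just too big (1.6 GB > 1.5 * 1 GB):
print([len(b) for b in get_buckets([GB, GB, GB, 1600 * MB])])      # [3, 1]
# One more ~1 GB file fills the first bucket up to four members:
print([len(b) for b in get_buckets([GB, GB, GB, 1600 * MB, GB])])  # [4, 1]
```

Under this assumed rule, four files on disk are not enough on their own: a single file slightly outside the size window keeps the first bucket at three members until another similar-sized file arrives.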

--
Sylvain


RE: Question about minor compaction

Posted by Rene Kochen <Re...@emea.schange.com>.
Thanks for your quick response.

I indeed see that similar-sized files are compacted. However, for four similar 1GB files, this is not what I see.

The documentation states:

"These parameters set thresholds for the number of similar-sized SSTables that can accumulate before a minor compaction is triggered. With the default values, a minor compaction may begin any time after four SSTables are created on disk for a column family, and must begin before 32 SSTables accumulate."
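One way to read these two settings, sketched in Python: min_compaction_threshold decides when a bucket of similar-sized sstables becomes eligible, and max_compaction_threshold caps how many of its files go into a single minor compaction. The function name and the smallest-first ordering are assumptions for illustration, not taken from the documentation.

```python
from typing import List, Optional


def pick_compaction(bucket: List[int], min_threshold: int = 4,
                    max_threshold: int = 32) -> Optional[List[int]]:
    """Return the sstable sizes to compact from one bucket, or None.

    Assumption: a bucket is eligible once it holds at least min_threshold
    files; at most max_threshold of them, smallest first, are merged in
    one minor compaction.
    """
    if len(bucket) < min_threshold:
        return None  # not enough similar-sized files yet
    return sorted(bucket)[:max_threshold]


print(pick_compaction([3, 1, 2]))              # None
print(pick_compaction([4, 3, 2, 1]))           # [1, 2, 3, 4]
print(len(pick_compaction(list(range(40)))))   # 32
```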

So a more general question:

In which situations does Cassandra not start a minor compaction immediately (when there are four similar-sized files) but instead wait (up to 32)?

Thanks!

-----Original Message-----
From: Radim Kolar [mailto:hsn@sendmail.cz] 
Sent: Friday, 4 November 2011 16:48
To: user@cassandra.apache.org
Subject: Re: Question about minor compaction

On 4.11.2011 16:16, Rene Kochen wrote:
> I'm using Cassandra 0.7.9.
>
> OK, so in this version Cassandra delays compaction. But when (in my original example) are the four 1GB files compacted?
They are compacted when the next file of similar size (about 1 GB) is created.

Re: Question about minor compaction

Posted by Radim Kolar <hs...@sendmail.cz>.
On 4.11.2011 16:16, Rene Kochen wrote:
> I'm using Cassandra 0.7.9.
>
> OK, so in this version Cassandra delays compaction. But when (in my original example) are the four 1GB files compacted?
They are compacted when the next file of similar size (about 1 GB) is created.

RE: Question about minor compaction

Posted by Rene Kochen <Re...@emea.schange.com>.
I'm using Cassandra 0.7.9.

OK, so in this version Cassandra delays compaction. But when (in my original example) are the four 1GB files compacted?

Thanks!

-----Original Message-----
From: Radim Kolar [mailto:hsn@sendmail.cz] 
Sent: Friday, 4 November 2011 15:55
To: user@cassandra.apache.org
Subject: Re: Question about minor compaction

> What makes Cassandra decide to delay this minor compaction?

What version are you using? There were patches for the 1.x branch that
make it behave as you expect. Cassandra 0.8 delayed compactions.

Re: Question about minor compaction

Posted by Radim Kolar <hs...@sendmail.cz>.
> What makes Cassandra decide to delay this minor compaction?

What version are you using? There were patches for the 1.x branch that
make it behave as you expect. Cassandra 0.8 delayed compactions.