Posted to user@cassandra.apache.org by A J <s5...@gmail.com> on 2011/02/25 18:22:49 UTC

2x storage

I read in some Cassandra notes that each node should be allocated
twice the storage capacity you wish it to contain. I think the reason
was that during compaction, another copy of the SSTables has to be made
before the original ones are discarded.

Can someone confirm whether that is actually true? Aren't only a few
SSTables involved in a compaction? Why should it be twice the full
storage? If I also keep some buffer, it means I can really use only
40% or so of the space.
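The arithmetic behind the 40% figure can be sketched as follows. It assumes the naive "2x" rule (the disk must hold the data plus a full second copy during compaction) and a hypothetical 20% safety buffer; `usable_fraction` is an illustrative helper, not anything in Cassandra.

```python
def usable_fraction(compaction_multiplier: float, safety_buffer: float) -> float:
    """Fraction of a disk that can hold live data (illustrative helper).

    compaction_multiplier: disk space needed per byte of data in the
        worst-case compaction (2.0 under the naive "2x" rule of thumb).
    safety_buffer: fraction of the disk kept free on top of that.
    """
    if compaction_multiplier < 1.0:
        raise ValueError("the disk must at least hold the data itself")
    if not 0.0 <= safety_buffer < 1.0:
        raise ValueError("safety_buffer must be in [0, 1)")
    # Whatever is left after the buffer is split between the data and
    # the extra copy that compaction may need.
    return (1.0 - safety_buffer) / compaction_multiplier


# Naive 2x rule with a hypothetical 20% buffer.
print(f"{usable_fraction(2.0, 0.2):.0%}")  # 40%
```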


Many thanks.

Re: 2x storage

Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 25, 2011 at 4:55 PM, Terje Marthinussen
<tm...@gmail.com> wrote:
> Cassandra never compacts more than one column family at a time?

Nope, compaction is currently single-threaded.

https://issues.apache.org/jira/browse/CASSANDRA-2191

=Rob
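Because compaction is single-threaded, the worst-case headroom is set by the single largest column family rather than by the sum of all of them. A minimal sketch, assuming a compaction's output can be as large as its input (nothing purged); the sizes are made up, and the `concurrent` parameter is hypothetical, only showing how the requirement would grow if several CFs could compact at once.

```python
from typing import Sequence


def compaction_headroom(cf_sizes_gb: Sequence[float], concurrent: int = 1) -> float:
    """Worst-case free space (GB) so that `concurrent` column families can
    each be compacted in full at the same time.

    Assumes compaction output is as large as its input. With single-threaded
    compaction, concurrent=1, so only the largest CF matters.
    """
    if concurrent < 1:
        raise ValueError("concurrent must be >= 1")
    largest_first = sorted(cf_sizes_gb, reverse=True)
    return sum(largest_first[:concurrent])


sizes = [120.0, 40.0, 25.0, 15.0]  # hypothetical CF sizes in GB
print(compaction_headroom(sizes))                # 120.0
print(compaction_headroom(sizes, concurrent=2))  # 160.0
```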

Re: 2x storage

Posted by Terje Marthinussen <tm...@gmail.com>.
Cassandra never compacts more than one column family at a time?

Regards,
Terje

On 26 Feb 2011, at 02:40, Robert Coli <rc...@digg.com> wrote:

> On Fri, Feb 25, 2011 at 9:22 AM, A J <s5...@gmail.com> wrote:
>> I read in some Cassandra notes that each node should be allocated
>> twice the storage capacity you wish it to contain. I think the reason
>> was that during compaction, another copy of the SSTables has to be made
>> before the original ones are discarded.
> 
> This rule of thumb only exactly applies when you have a single CF. It
> is better stated as "your node needs to have enough room to
> successfully compact your largest columnfamily."
> 
> =Rob

Re: 2x storage

Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 25, 2011 at 2:41 PM, A J <s5...@gmail.com> wrote:
> Can the minor compactions across nodes be staggered so that I can
> control how many nodes are compacting at any given point ?

Not without some crazy scheme where you control the compaction
thresholds dynamically via some external mechanism. You probably don't
actually want to do that. You generally want a system that can
tolerate minor compaction.

=Rob

Re: 2x storage

Posted by A J <s5...@gmail.com>.
Another related question:
Can the minor compactions across nodes be staggered so that I can
control how many nodes are compacting at any given point ?


Re: 2x storage

Posted by A J <s5...@gmail.com>.
Thanks. What happens when my compaction fails for space reasons?
Is no compaction possible until I add more space?
I would assume writes are not impacted, though the latency of reads
would increase, right?

Also, though writes are not seek-intensive, compactions are, no?

On Fri, Feb 25, 2011 at 1:44 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> Ok, we are both correct here:
>
> Generally, a minor compaction takes less space than a major, but
> occasionally it does not.

Re: 2x storage

Posted by Tyler Hobbs <ty...@datastax.com>.
OK, we are both correct here:

Generally, a minor compaction takes less space than a major one, but
occasionally it does not.

-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library

Re: 2x storage

Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 25, 2011 at 10:14 AM, A J <s5...@gmail.com> wrote:
> OK. Is it also driven by the type of compaction? Does a minor compaction
> require less working space than a major compaction?

Yes, unless that minor compaction happens to involve all SSTables due
to compaction thresholds, at which point it is effectively a major compaction.

=Rob
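How a "minor" compaction can end up touching every SSTable is easier to see with a simplified model of size-tiered selection. This sketch is only roughly modelled on Cassandra's approach (group SSTables of similar size, compact a group once it holds at least a minimum number of files); the 0.5/1.5 bucket bounds and the 4/32 thresholds are assumed defaults, and the real code has extra cases (e.g. for very small SSTables) that vary by version. The SSTable sizes are made up.

```python
from typing import List


def size_tiered_buckets(sizes: List[float],
                        bucket_low: float = 0.5,
                        bucket_high: float = 1.5) -> List[List[float]]:
    """Group SSTable sizes into buckets of 'similar' size (simplified).

    A size joins the first bucket whose current average avg satisfies
    bucket_low * avg <= size <= bucket_high * avg; otherwise it starts
    a new bucket.
    """
    buckets: List[List[float]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:  # no existing bucket matched
            buckets.append([size])
    return buckets


def minor_compaction_candidates(sizes: List[float],
                                min_threshold: int = 4,
                                max_threshold: int = 32) -> List[List[float]]:
    """Buckets with at least min_threshold SSTables are eligible; at most
    max_threshold SSTables from a bucket are compacted together."""
    return [bucket[:max_threshold]
            for bucket in size_tiered_buckets(sizes)
            if len(bucket) >= min_threshold]


def worst_case_working_space(sizes: List[float]) -> float:
    """Extra space for the biggest eligible minor compaction, assuming its
    output is as large as its inputs."""
    candidates = minor_compaction_candidates(sizes)
    return max((sum(bucket) for bucket in candidates), default=0.0)


mixed = [100.0, 10.0, 10.0, 11.0, 12.0]
uniform = [25.0, 25.0, 25.0, 25.0]
print(worst_case_working_space(mixed), sum(mixed))      # 43.0 143.0
print(worst_case_working_space(uniform), sum(uniform))  # 100.0 100.0
```

With mixed sizes, the minor compaction only needs room for the four small files; when all SSTables land in one bucket, it needs as much room as the whole column family, just like a major compaction.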

Re: 2x storage

Posted by Tyler Hobbs <ty...@datastax.com>.
On Fri, Feb 25, 2011 at 12:14 PM, A J <s5...@gmail.com> wrote:

> OK. Is it also driven by the type of compaction? Does a minor compaction
> require less working space than a major compaction?
>

No, every so often a minor compaction ends up compacting all SSTables, so
it's effectively the same as a major compaction.


Re: 2x storage

Posted by A J <s5...@gmail.com>.
OK. Is it also driven by the type of compaction? Does a minor compaction
require less working space than a major compaction?

On Fri, Feb 25, 2011 at 12:40 PM, Robert Coli <rc...@digg.com> wrote:
> On Fri, Feb 25, 2011 at 9:22 AM, A J <s5...@gmail.com> wrote:
>> I read in some Cassandra notes that each node should be allocated
>> twice the storage capacity you wish it to contain. I think the reason
>> was that during compaction, another copy of the SSTables has to be made
>> before the original ones are discarded.
>
> This rule of thumb only exactly applies when you have a single CF. It
> is better stated as "your node needs to have enough room to
> successfully compact your largest columnfamily."
>
> =Rob
>

Re: 2x storage

Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 25, 2011 at 9:22 AM, A J <s5...@gmail.com> wrote:
> I read in some Cassandra notes that each node should be allocated
> twice the storage capacity you wish it to contain. I think the reason
> was that during compaction, another copy of the SSTables has to be made
> before the original ones are discarded.

This rule of thumb only exactly applies when you have a single CF. It
is better stated as "your node needs to have enough room to
successfully compact your largest columnfamily."

=Rob
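The difference between the "2x" rule and the "largest column family" restatement can be sketched in a few lines. This assumes the worst case that compacting a CF rewrites it in full with nothing purged, and that only one CF compacts at a time (compaction being single-threaded, as noted in this thread); the CF names and sizes are hypothetical.

```python
from typing import Dict


def min_disk_naive(cf_sizes_gb: Dict[str, float]) -> float:
    """The '2x' rule of thumb: twice the total data on the node."""
    return 2 * sum(cf_sizes_gb.values())


def min_disk_largest_cf(cf_sizes_gb: Dict[str, float]) -> float:
    """Total data plus room to rewrite the largest CF in full
    (worst case: compaction output as large as its input)."""
    return sum(cf_sizes_gb.values()) + max(cf_sizes_gb.values(), default=0.0)


cfs = {"users": 120.0, "events": 40.0, "sessions": 40.0}  # hypothetical sizes in GB
print(min_disk_naive(cfs))       # 400.0
print(min_disk_largest_cf(cfs))  # 320.0

# With a single CF the two rules agree, which is when "2x" applies exactly.
one = {"only": 100.0}
print(min_disk_naive(one), min_disk_largest_cf(one))  # 200.0 200.0
```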