Posted to user@cassandra.apache.org by A J <s5...@gmail.com> on 2011/02/25 18:22:49 UTC
2x storage
I read in some Cassandra notes that each node should be allocated
twice the storage capacity you wish it to contain. I think the reason
was that during compaction another copy of the SSTables has to be made
before the original ones are discarded.
Can someone confirm whether that is actually true? During compaction,
aren't just a few SSTables involved? Why should it be twice the
full storage? If I also keep some buffer, it really means that I can
use only 40% or so of the space.
Many thanks.
Re: 2x storage
Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 25, 2011 at 4:55 PM, Terje Marthinussen
<tm...@gmail.com> wrote:
> Cassandra never compacts more than one column family at a time?
Nope, compaction is single threaded currently.
https://issues.apache.org/jira/browse/CASSANDRA-2191
=Rob
Re: 2x storage
Posted by Terje Marthinussen <tm...@gmail.com>.
Cassandra never compacts more than one column family at a time?
Regards,
Terje
On 26 Feb 2011, at 02:40, Robert Coli <rc...@digg.com> wrote:
> On Fri, Feb 25, 2011 at 9:22 AM, A J <s5...@gmail.com> wrote:
>> I read in some cassandra notes that each node should be allocated
>> twice the storage capacity you wish it to contain. I think the reason
>> was during compaction another copy of SSTables have to be made before
>> the original ones are discarded.
>
> This rule of thumb only exactly applies when you have a single CF. It
> is better stated as "your node needs to have enough room to
> successfully compact your largest columnfamily."
>
> =Rob
Re: 2x storage
Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 25, 2011 at 2:41 PM, A J <s5...@gmail.com> wrote:
> Can the minor compactions across nodes be staggered so that I can
> control how many nodes are compacting at any given point ?
Not without some crazy scheme where you control the compaction
thresholds dynamically via some external mechanism. You probably don't
actually want to do that. You generally want a system which can
tolerate minor compaction.
=Rob
Re: 2x storage
Posted by A J <s5...@gmail.com>.
Another related question:
Can the minor compactions across nodes be staggered so that I can
control how many nodes are compacting at any given point ?
On Fri, Feb 25, 2011 at 2:01 PM, A J <s5...@gmail.com> wrote:
> Thanks. What happens when my compaction fails for space reasons ?
> Is no compaction possible till I add more space ?
> I would assume writes are not impacted though the latency of reads
> would increase, right ?
>
> Also though writes are not seek-intensive, compactions are seek-intensive, no ?
>
> On Fri, Feb 25, 2011 at 1:44 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>> Ok, we are both correct here:
>>
>> Generally, a minor compaction takes less space than a major, but
>> occasionally it does not.
>>
>> --
>> Tyler Hobbs
>> Software Engineer, DataStax
>> Maintainer of the pycassa Cassandra Python client library
>>
>>
>
Re: 2x storage
Posted by A J <s5...@gmail.com>.
Thanks. What happens when my compaction fails for space reasons?
Is no compaction possible until I add more space?
I would assume writes are not impacted, though the latency of reads
would increase, right?
Also, although writes are not seek-intensive, compactions are seek-intensive, no?
On Fri, Feb 25, 2011 at 1:44 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> Ok, we are both correct here:
>
> Generally, a minor compaction takes less space than a major, but
> occasionally it does not.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
>
>
Re: 2x storage
Posted by Tyler Hobbs <ty...@datastax.com>.
Ok, we are both correct here:
Generally, a minor compaction takes less space than a major, but
occasionally it does not.
--
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library
Re: 2x storage
Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 25, 2011 at 10:14 AM, A J <s5...@gmail.com> wrote:
> OK. Is it also driven by type of compaction ? Does a minor compaction
> require less working space than major compaction ?
Yes, unless that minor compaction happens to involve all SSTables due
to compaction thresholds, at which point it is a major compaction.
=Rob
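
To make the answer above concrete, here is a minimal sketch of the working space a single compaction may need. It assumes the temporary space is bounded by the combined size of the SSTables being merged (the output can be smaller when overwritten or deleted data is dropped). The function name and the SSTable sizes are hypothetical and are not part of Cassandra.

```python
def compaction_working_space(sstable_sizes_gb, selected):
    """Upper bound (GB) on temporary space: combined size of the input SSTables.

    Assumption: the merged output is written in full before the inputs are
    deleted, and it is never larger than the inputs taken together.
    """
    return sum(sstable_sizes_gb[i] for i in selected)


# Hypothetical SSTable sizes for one column family, in GB.
sstables = [40, 20, 5, 5, 5, 5]
all_tables = range(len(sstables))

# A typical minor compaction merges only a few similarly sized SSTables.
minor = compaction_working_space(sstables, selected=[2, 3, 4, 5])

# A major compaction merges every SSTable of the column family.
major = compaction_working_space(sstables, selected=all_tables)

# A minor compaction whose thresholds happen to select every SSTable
# needs exactly as much space as a major one.
minor_involving_all = compaction_working_space(sstables, selected=all_tables)

print(f"minor: {minor} GB, major: {major} GB, "
      f"minor involving all SSTables: {minor_involving_all} GB")
```

The point of the sketch is only that the type of compaction matters less than which SSTables it ends up selecting.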
Re: 2x storage
Posted by Tyler Hobbs <ty...@datastax.com>.
On Fri, Feb 25, 2011 at 12:14 PM, A J <s5...@gmail.com> wrote:
> OK. Is it also driven by type of compaction ? Does a minor compaction
> require less working space than major compaction ?
>
No, every so often a minor compaction ends up compacting all SSTables, so
it's effectively the same as a major compaction.
--
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library
Re: 2x storage
Posted by A J <s5...@gmail.com>.
OK. Is it also driven by the type of compaction? Does a minor compaction
require less working space than a major compaction?
On Fri, Feb 25, 2011 at 12:40 PM, Robert Coli <rc...@digg.com> wrote:
> On Fri, Feb 25, 2011 at 9:22 AM, A J <s5...@gmail.com> wrote:
>> I read in some cassandra notes that each node should be allocated
>> twice the storage capacity you wish it to contain. I think the reason
>> was during compaction another copy of SSTables have to be made before
>> the original ones are discarded.
>
> This rule of thumb only exactly applies when you have a single CF. It
> is better stated as "your node needs to have enough room to
> successfully compact your largest columnfamily."
>
> =Rob
>
Re: 2x storage
Posted by Robert Coli <rc...@digg.com>.
On Fri, Feb 25, 2011 at 9:22 AM, A J <s5...@gmail.com> wrote:
> I read in some cassandra notes that each node should be allocated
> twice the storage capacity you wish it to contain. I think the reason
> was during compaction another copy of SSTables have to be made before
> the original ones are discarded.
This rule of thumb only exactly applies when you have a single CF. It
is better stated as "your node needs to have enough room to
successfully compact your largest columnfamily."
=Rob
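
As a rough illustration of the rule of thumb above, the sketch below estimates the free space a node should keep for compaction. It assumes compaction works on one column family at a time and, in the worst case, writes a full new copy of that column family before the old SSTables are deleted. The function name and the column family sizes are hypothetical.

```python
def worst_case_compaction_headroom(cf_sizes_gb):
    """Return the free space (GB) needed to compact the largest column family.

    Assumption: one column family is compacted at a time, and the worst case
    rewrites all of its SSTables before the originals are removed.
    """
    return max(cf_sizes_gb.values(), default=0)


# Hypothetical on-disk sizes per column family, in GB.
single_cf = {"events": 100}
several_cfs = {"events": 60, "users": 30, "sessions": 10}

for label, sizes in (("single CF", single_cf), ("several CFs", several_cfs)):
    data = sum(sizes.values())
    headroom = worst_case_compaction_headroom(sizes)
    print(f"{label}: data={data} GB, headroom={headroom} GB, "
          f"disk needed={data + headroom} GB")
```

With a single column family the headroom equals the data size, which is where the 2x figure comes from. With several column families the headroom is only the size of the largest one.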