Posted to user@cassandra.apache.org by Sean Bridges <se...@gmail.com> on 2010/05/26 18:50:17 UTC

using more than 50% of disk space

We're investigating Cassandra, and we are looking for a way to get Cassandra
to use more than 50% of its data disks.  Is this possible?

For major compactions, it looks like we can use more than 50% of the disk if
we use multiple similarly sized column families.  If we had 10 column
families of the same size, we could use 90% of the disk, since a major
compaction would only need as much free space as the largest column family
(in reality we would use less).  Is that right?
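
Here's the back-of-the-envelope arithmetic we're using (made-up numbers,
and assuming only one column family is ever compacted at a time):

# Rough headroom arithmetic: with n equally sized column families, a major
# compaction of one of them needs (worst case) free space equal to that
# family's size, so the usable fraction of the disk is about n / (n + 1).
def usable_fraction(n_column_families):
    return n_column_families / (n_column_families + 1.0)

print(usable_fraction(1))   # ~0.50 -- the usual "keep 50% free" rule
print(usable_fraction(10))  # ~0.91 -- ten similarly sized column families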

For bootstrapping new nodes, it looks like adding a new node will require
that an existing node does anti-compaction.  This anti-compaction could take
nearly 50% of the disk.  Is there a way around this?

Is there anything else that would prevent us from using more than 50% of the
data disks?

Thanks,

Sean

Re: using more than 50% of disk space

Posted by gabriele renzi <rf...@gmail.com>.
On Thu, May 27, 2010 at 9:23 PM, Sean Bridges <se...@gmail.com> wrote:
> But doesn't having multiple similarly sized column families mean in-node
> compaction does not require 50% of disk?  Looking at compaction manager,
> only 1 thread is doing a compaction, so we only need enough free disk space
> to compact the largest column family.

Yes, AFAIK compaction only happens on one CF at a time.
Also, the total amount may not reach twice the size (if, for example,
some data disappears in the compaction).
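
To put rough numbers on it (a toy estimate; the "dead fraction" is just
an illustrative knob, not something Cassandra reports):

# Toy estimate of peak disk usage while one column family is compacted:
# the old SSTables and the new one coexist until the compaction finishes,
# but rows that were deleted or overwritten don't make it into the output.
def peak_during_compaction(cf_size_gb, dead_fraction):
    new_sstable = cf_size_gb * (1.0 - dead_fraction)  # surviving data
    return cf_size_gb + new_sstable                    # old + new, briefly

print(peak_during_compaction(100.0, 0.0))  # 200.0 GB -- worst case, nothing removed
print(peak_during_compaction(100.0, 0.3))  # 170.0 GB -- 30% deleted/overwritten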

Yet, you still have to be careful about other factors, for example if
you have a snapshot of the data on the same machine (by default going
into $data/$cf/snapshot/123456/).

In 0.6 this is done via a hard link, which means it won't count as
extra used space at first, but when the old data is deleted from $data/$cf
the space will not be freed and you will end up with ~twice the amount of data.
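
You can see the same hard-link behaviour with a few lines of Python
(nothing Cassandra specific, just the filesystem):

import os

# Write a "live" data file and snapshot it with a hard link, like 0.6 does.
with open("data.db", "wb") as f:
    f.write(b"\0" * 1024 * 1024)           # 1 MB of data
os.link("data.db", "snapshot-data.db")     # hard link: no extra space used yet

print(os.stat("data.db").st_nlink)         # 2 -- both names share the same blocks

# Removing the "live" file does not free the space: the snapshot link still
# pins the blocks, so the data directory keeps holding ~1 MB on disk.
os.remove("data.db")
print(os.stat("snapshot-data.db").st_size) # still 1048576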

I am not completely confident that there are no other "edge" cases I
have not considered, so the "try to stay under 50% disk usage"
principle is ok with me :)

Re: using more than 50% of disk space

Posted by Sean Bridges <se...@gmail.com>.
But doesn't having multiple similarly sized column families mean in-node
compaction does not require 50% of disk?  Looking at compaction manager,
only 1 thread is doing a compaction, so we only need enough free disk space
to compact the largest column family.

Sean

On Thu, May 27, 2010 at 12:00 AM, gabriele renzi <rf...@gmail.com> wrote:

> On Wed, May 26, 2010 at 8:00 PM, Sean Bridges <se...@gmail.com>
> wrote:
> > So after CASSANDRA-579, anti-compaction won't be done on the source node,
> > and we can use more than 50% of the disk space if we use multiple column
> > families?
>
> Sorry if I misunderstand, but #579 seems to only solve half of your
> question; I believe the first part of your email still holds (as in:
> in-node compaction can still double disk usage in the worst case).
>
> CASSANDRA-1041 may also be relevant (also includes mini-discussion on
> local data partitioning).
>

Re: using more than 50% of disk space

Posted by gabriele renzi <rf...@gmail.com>.
On Wed, May 26, 2010 at 8:00 PM, Sean Bridges <se...@gmail.com> wrote:
> So after CASSANDRA-579, anti-compaction won't be done on the source node,
> and we can use more than 50% of the disk space if we use multiple column
> families?

Sorry if I misunderstand, but #579 seems to only solve half of your question;
I believe the first part of your email still holds (as in: in-node
compaction can still double disk usage in the worst case).

CASSANDRA-1041 may also be relevant (also includes mini-discussion on
local data partitioning).

Re: using more than 50% of disk space

Posted by Sean Bridges <se...@gmail.com>.
So after CASSANDRA-579, anti-compaction won't be done on the source node,
and we can use more than 50% of the disk space if we use multiple column
families?

Thanks,

Sean

On Wed, May 26, 2010 at 10:01 AM, Stu Hood <st...@rackspace.com> wrote:

> See https://issues.apache.org/jira/browse/CASSANDRA-579 for some
> background here: I was just about to start working on this one, but it won't
> make it in until 0.7.
>
>
> -----Original Message-----
> From: "Sean Bridges" <se...@gmail.com>
> Sent: Wednesday, May 26, 2010 11:50am
> To: user@cassandra.apache.org
> Subject: using more than 50% of disk space
>
> We're investigating Cassandra, and we are looking for a way to get
> Cassandra
> to use more than 50% of its data disks.  Is this possible?
>
> For major compactions, it looks like we can use more than 50% of the disk
> if
> we use multiple similarly sized column families.  If we had 10 column
> families of the same size, we could use 90% of the disk, since a major
> compaction would only need as much free space as the largest column family
> (in reality we would use less).  Is that right?
>
> For bootstrapping new nodes, it looks like adding a new node will require
> that an existing node does anti-compaction.  This anti-compaction could
> take
> nearly 50% of the disk.  Is there a way around this?
>
> Is there anything else that would prevent us from using more than 50% of
> the
> data disks?
>
> Thanks,
>
> Sean
>
>
>

RE: using more than 50% of disk space

Posted by Stu Hood <st...@rackspace.com>.
See https://issues.apache.org/jira/browse/CASSANDRA-579 for some background here: I was just about to start working on this one, but it won't make it in until 0.7.


-----Original Message-----
From: "Sean Bridges" <se...@gmail.com>
Sent: Wednesday, May 26, 2010 11:50am
To: user@cassandra.apache.org
Subject: using more than 50% of disk space

We're investigating Cassandra, and we are looking for a way to get Cassandra
to use more than 50% of its data disks.  Is this possible?

For major compactions, it looks like we can use more than 50% of the disk if
we use multiple similarly sized column families.  If we had 10 column
families of the same size, we could use 90% of the disk, since a major
compaction would only need as much free space as the largest column family
(in reality we would use less).  Is that right?

For bootstrapping new nodes, it looks like adding a new node will require
that an existing node does anti-compaction.  This anti-compaction could take
nearly 50% of the disk.  Is there a way around this?

Is there anything else that would prevent us from using more than 50% of the
data disks?

Thanks,

Sean