Posted to user@cassandra.apache.org by Yiming Sun <yi...@gmail.com> on 2012/03/30 15:01:13 UTC

a question on cassandra data file size

Hi,

I have a question about the size of Cassandra data files.  After we upgraded
from Cassandra 0.8 to 1.0 and changed our schema to use regular columns
instead of supercolumns, the aggregate size of the Cassandra data files
shrank by more than half.  The source data set is the same, and we didn't
set any compression options in the new schema.

The reduction in data file size is good, but we would still like to know a
little more about the reason behind it.  Could someone enlighten me,
please?  Thanks.

-- Y.

Re: a question on cassandra data file size

Posted by Yiming Sun <yi...@gmail.com>.
Hi Ed,

Does "comp actions" stand for compaction or compression?  Also, the size we
obtained from the supercolumn schema was taken many days after the data
ingest, so it had to be after compaction as well, no?  In neither case did
we issue any nodetool compact commands.

You are right that we probably wouldn't have achieved a 50% reduction right
off the bat.  I overlooked one detail: when we moved to the regular column
schema, we also removed redundant columns that had to be present in the
supercolumn schema.  Although the column values are small (ints and longs),
they could add up to quite a bit of storage usage.

So aside from that, it sounds like the reduction in data size is attributable
more to the move from supercolumns to regular columns than to the upgrade
from 0.8 to 1.0.   Thanks!

-- Y.



On Fri, Mar 30, 2012 at 10:18 AM, Edward Capriolo <ed...@gmail.com> wrote:

> Standard columns save size over super columns. Not 50% but depending
> on the size of the data (3 byte values) the overhead could be
> significant. I have noticed that post sstable rebuild, 1.0 kicked off
> some comp actions behind the scenes shrinking some files
> significantly.

Re: a question on cassandra data file size

Posted by Edward Capriolo <ed...@gmail.com>.
Standard columns save size over super columns. Not 50% but depending
on the size of the data (3 byte values) the overhead could be
significant. I have noticed that post sstable rebuild, 1.0 kicked off
some comp actions behind the scenes shrinking some files
significantly.
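
To put rough numbers on the overhead described above, here is a
back-of-the-envelope sketch in Python. It is not taken from this thread: the
two fixed-overhead constants are assumptions based on the commonly cited
pre-3.0 on-disk layout, the function names are made up for illustration, and
row-level costs (indexes, bloom filters) are ignored.

```python
# Rough, illustrative estimate only. The constants are assumptions,
# not measurements from this thread.

# Assumed fixed bytes per column: 2 (name length) + 1 (flags)
# + 8 (timestamp) + 4 (value length).
COLUMN_FIXED_OVERHEAD = 15

# Assumed fixed bytes per supercolumn wrapper: 2 (name length)
# + 4 + 8 (deletion info) + 4 (subcolumn count).
SUPER_COLUMN_FIXED_OVERHEAD = 18


def column_bytes(name_len, value_len):
    """Estimated serialized size of one column."""
    return COLUMN_FIXED_OVERHEAD + name_len + value_len


def standard_row_bytes(n_columns, name_len, value_len):
    """Estimated column data size for a row of standard columns."""
    return n_columns * column_bytes(name_len, value_len)


def super_row_bytes(n_super, super_name_len, subs_per_super, name_len, value_len):
    """Estimated column data size for a row of supercolumns."""
    per_super = (
        SUPER_COLUMN_FIXED_OVERHEAD
        + super_name_len
        + subs_per_super * column_bytes(name_len, value_len)
    )
    return n_super * per_super


def overhead_fraction(name_len, value_len):
    """Share of a column's bytes that is not the value itself."""
    return 1 - value_len / column_bytes(name_len, value_len)


if __name__ == "__main__":
    # Hypothetical layout: 8-byte names, 3-byte values.
    print(column_bytes(8, 3))
    print(round(overhead_fraction(8, 3), 2))
    # 1000 supercolumns with 3 subcolumns each vs. the same 3000 values
    # stored as standard columns.
    print(super_row_bytes(1000, 8, 3, 8, 3))
    print(standard_row_bytes(3000, 8, 3))
```

Under these assumptions a 3-byte value sits in a column of about 26 bytes, so
most of the space is overhead, and the supercolumn wrapper adds a further
fixed cost on top of that. Dropping redundant subcolumns removes a whole
column's worth of overhead each, not just the few bytes of the value.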
