Posted to user@cassandra.apache.org by nicolas lattuada <ni...@hotmail.fr> on 2011/01/05 15:35:59 UTC

The size of the data, I must be doing smth wrong....

Hi,

I have some data size issues.

I am storing super columns with the following content:

{a=>1, b=>2, c=>3.......n=>14}

I store this 300 000 times, and the data takes about 283 MB on disk.

On the other side, I have a MySQL table that stores a bunch of data. Its schema is as follows:
6 varchars +100
5 ints +6

I put about 1 300 000 records in it and ended up with 150 MB of data and 57 MB of index.

So I think I am certainly doing something wrong...

The other thing is that when I run flush and then compact, the size of my data increases, so I imagine something is copied during compaction.
So is there a way to remove the unused data? (cleanup doesn't seem to do the job.)

Any help reducing the size of the data would be greatly appreciated!
Greetings

Re: The size of the data, I must be doing smth wrong....

Posted by Edward Capriolo <ed...@gmail.com>.

On Wed, Jan 5, 2011 at 9:52 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> It's normal for Cassandra to use more disk space than MySQL.  It's
> part of what we trade for not having to rewrite every row when you add
> a new column.
>
> "SSTables that are obsoleted by a compaction are deleted
> asynchronously when the JVM performs a GC."
> http://wiki.apache.org/cassandra/MemtableSSTable
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Unlike datastores that use delimited or fixed-size columns, Cassandra
stores each row as a sorted map of columns. A column is a tuple of
{columnname, columnvalue, time}, so the name and timestamp are written
alongside every value. The data is also not stored as tersely as it is
in MySQL.
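
To make the {columnname, columnvalue, time} point concrete, here is a rough
back-of-the-envelope sketch. It assumes the 14 sub-columns from the question
have one-letter names and values stored as short strings, and it counts only
the name, the value and an 8-byte timestamp. The framing_bytes parameter is a
hypothetical stand-in for length prefixes and flags, not a measured Cassandra
constant, and the function names are made up for this illustration.

```python
# Illustrative only: estimates bytes for columns stored as
# (name, value, timestamp) tuples. Not Cassandra's actual on-disk format.

TIMESTAMP_BYTES = 8  # column timestamps are 64-bit values


def column_bytes(name: bytes, value: bytes, framing_bytes: int = 0) -> int:
    """Bytes for one column: name + value + timestamp + assumed framing."""
    return len(name) + len(value) + TIMESTAMP_BYTES + framing_bytes


def row_bytes(columns: dict, framing_bytes: int = 0) -> int:
    """Total bytes for all columns in one row (or one super column)."""
    return sum(column_bytes(n, v, framing_bytes) for n, v in columns.items())


# Assumption: columns a..n hold the values 1..14 encoded as ASCII strings.
columns = {chr(ord("a") + i).encode(): str(i + 1).encode() for i in range(14)}

payload = sum(len(v) for v in columns.values())  # bytes of actual values
total = row_bytes(columns)                       # values + names + timestamps

print(f"value bytes: {payload}, stored bytes (no framing): {total}")
```

Even before any framing, index or per-row overhead is counted, the names and
timestamps take several times more space than the values themselves in this
toy case, which is the effect described above.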

Re: The size of the data, I must be doing smth wrong....

Posted by Jonathan Ellis <jb...@gmail.com>.

It's normal for Cassandra to use more disk space than MySQL.  It's
part of what we trade for not having to rewrite every row when you add
a new column.
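
For a sense of scale, the figures from the original question can be turned
into bytes per record. This is only arithmetic on the numbers quoted in the
thread, assuming "Mo" means decimal megabytes and that the on-disk totals
contain nothing but the data described.

```python
# Per-record sizes computed from the figures given in the question.
MB = 1_000_000  # assumption: "Mo" read as decimal megabytes

cassandra_bytes = 283 * MB        # reported Cassandra data size
cassandra_rows = 300_000          # super columns stored
mysql_bytes = (150 + 57) * MB     # MySQL data plus index
mysql_rows = 1_300_000            # MySQL records stored

per_row_cassandra = cassandra_bytes / cassandra_rows
per_row_mysql = mysql_bytes / mysql_rows
ratio = per_row_cassandra / per_row_mysql

print(f"Cassandra: ~{per_row_cassandra:.0f} bytes per super column")
print(f"MySQL:     ~{per_row_mysql:.0f} bytes per record")
print(f"Ratio:     ~{ratio:.1f}x")
```

The two datasets are not the same records, so the ratio is only indicative.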

"SSTables that are obsoleted by a compaction are deleted
asynchronously when the JVM performs a GC."
http://wiki.apache.org/cassandra/MemtableSSTable

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com