Posted to user@cassandra.apache.org by sharanabasava raddi <sh...@gmail.com> on 2010/05/25 07:46:08 UTC

Why Cassandra is "space inefficient" compared to MySQL?

Hi all,
I am running Cassandra on a single-node Windows XP machine.
I inserted about 10 million records into Cassandra; the insert took
around 90 minutes and used 8 GB of disk space.
The same number of records takes about 3 GB in MySQL.

Could you please tell me why?
Also, could you please point me to complete documentation for using the
Thrift API from Java to talk to Cassandra on Red Hat Linux?





Thanks,
Sharan

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by Jeff Zhang <zj...@gmail.com>.
I think one reason may be that Cassandra also logs each operation
to its log files, and those logs contain the records themselves.



2010/5/25 casablinca126.com <ca...@126.com>:
> hi Sharan,
> what's the replication factor are you using ?
>
> regards,
> Cao Jiguang
>
>
> 2010-05-25
> ________________________________
> casablinca126.com
> ________________________________
> From: sharanabasava raddi
> Sent: 2010-05-25  13:46:38
> To: user@cassandra.apache.org
> Cc:
> Subject: Why Cassandra is "space inefficient" compared to MySQL?
> Hi all,
> Am running "Cassandra" on Windows XP (single node) machine.
> I have made insertion of about "10 million" records into "Cassandra" , and
> it took around 90 minutes to insert and 8GB of space.
> For the same number of records MySQL will take "3 GB" space.
>
> Could you please tell me why?
> And please Give me the complete documentation for running Thrift API with
> Java to talk to Cassandra in "Red Hat Linux".
>
>
>
>
>
> Thanks,
> Sharan
>
>
>



-- 
Best Regards

Jeff Zhang

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by sharanabasava raddi <sh...@gmail.com>.
Hi Cao,
Thanks for your response.

Actually, I am using ReplicationFactor = 1.




Thanks,
Sharan

2010/5/25 casablinca126.com <ca...@126.com>

>  hi Sharan,
> what's the replication factor are you using ?
>
> regards,
> Cao Jiguang
>
>
> 2010-05-25
> ------------------------------
>  casablinca126.com
> ------------------------------
>  *From:* sharanabasava raddi
> *Sent:* 2010-05-25  13:46:38
> *To:* user@cassandra.apache.org
> *Cc:*
> *Subject:* Why Cassandra is "space inefficient" compared to MySQL?
>  Hi all,
> Am running "Cassandra" on Windows XP (single node) machine.
> I have made insertion of about "10 million" records into "Cassandra" , and
> it took around 90 minutes to insert and 8GB of space.
> For the same number of records MySQL will take "3 GB" space.
>
> Could you please tell me why?
> And please Give me the complete documentation for running Thrift API with
> Java to talk to Cassandra in "Red Hat Linux".
>
>
>
>
>
> Thanks,
> Sharan
>
>
>

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by "casablinca126.com" <ca...@126.com>.
Hi Sharan,
What replication factor are you using?

regards,
Cao Jiguang


2010-05-25 



casablinca126.com 



From: sharanabasava raddi
Sent: 2010-05-25  13:46:38
To: user@cassandra.apache.org
Cc:
Subject: Why Cassandra is "space inefficient" compared to MySQL?
 
Hi all,
Am running "Cassandra" on Windows XP (single node) machine.
I have made insertion of about "10 million" records into "Cassandra" , and it took around 90 minutes to insert and 8GB of space. 
For the same number of records MySQL will take "3 GB" space.

Could you please tell me why?
And please Give me the complete documentation for running Thrift API with Java to talk to Cassandra in "Red Hat Linux".





Thanks,
Sharan
 
 

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by sharanabasava raddi <sh...@gmail.com>.
Hi Peter,
Thanks a lot.



Regards,
Sharan

2010/5/25 Peter Schüller <sc...@spotify.com>

> > Could you please tell me why?
>
> There might be pending sstable removals on disk, which won't happen
> until GC or restart. If you just did a bulk insert and checked
> diskspace immediately afterwards, I think this is a possible
> explanation.
>
> (See "Write path" on
> http://wiki.apache.org/cassandra/ArchitectureInternals)
>
> --
> / Peter Schuller aka scode
>

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by Jonathan Ellis <jb...@gmail.com>.
The only place we use a Java serializer is for the BitSet in bloom filters.

On Tue, May 25, 2010 at 12:37 PM, Chris Goffinet <go...@digg.com> wrote:
> My money is on the fact that the serializer is just horribly verbose. It's
> using a basic set of the java serializer.
> -Chris
>
>
> On Tue, May 25, 2010 at 10:02 AM, Ryan King <ry...@twitter.com> wrote:
>>
>> Also, timestamps for each column.
>>
>> -ryan
>>
>> On Tue, May 25, 2010 at 5:41 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>> > That's true.  But fundamentally Cassandra is expected to use more
>> > space than mysql for a few reasons; usually the biggest factor is that
>> > Cassandra has to write out each column name in each row, since column
>> > names are dynamic unlike in mysql where you declare the columns once
>> > for the whole table.
>> >
>> > 2010/5/25 Peter Schüller <sc...@spotify.com>:
>> >>> Could you please tell me why?
>> >>
>> >> There might be pending sstable removals on disk, which won't happen
>> >> until GC or restart. If you just did a bulk insert and checked
>> >> diskspace immediately afterwards, I think this is a possible
>> >> explanation.
>> >>
>> >> (See "Write path" on
>> >> http://wiki.apache.org/cassandra/ArchitectureInternals)
>> >>
>> >> --
>> >> / Peter Schuller aka scode
>> >>
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of Riptano, the source for professional Cassandra support
>> > http://riptano.com
>> >
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by Chris Goffinet <go...@digg.com>.
My money is on the fact that the serializer is just horribly verbose. It's
using a basic set of the java serializer.

-Chris


On Tue, May 25, 2010 at 10:02 AM, Ryan King <ry...@twitter.com> wrote:

> Also, timestamps for each column.
>
> -ryan
>
> On Tue, May 25, 2010 at 5:41 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> > That's true.  But fundamentally Cassandra is expected to use more
> > space than mysql for a few reasons; usually the biggest factor is that
> > Cassandra has to write out each column name in each row, since column
> > names are dynamic unlike in mysql where you declare the columns once
> > for the whole table.
> >
> > 2010/5/25 Peter Schüller <sc...@spotify.com>:
> >>> Could you please tell me why?
> >>
> >> There might be pending sstable removals on disk, which won't happen
> >> until GC or restart. If you just did a bulk insert and checked
> >> diskspace immediately afterwards, I think this is a possible
> >> explanation.
> >>
> >> (See "Write path" on
> http://wiki.apache.org/cassandra/ArchitectureInternals)
> >>
> >> --
> >> / Peter Schuller aka scode
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com
> >
>

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by Ryan King <ry...@twitter.com>.
Also, timestamps for each column.

-ryan

On Tue, May 25, 2010 at 5:41 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> That's true.  But fundamentally Cassandra is expected to use more
> space than mysql for a few reasons; usually the biggest factor is that
> Cassandra has to write out each column name in each row, since column
> names are dynamic unlike in mysql where you declare the columns once
> for the whole table.
>
> 2010/5/25 Peter Schüller <sc...@spotify.com>:
>>> Could you please tell me why?
>>
>> There might be pending sstable removals on disk, which won't happen
>> until GC or restart. If you just did a bulk insert and checked
>> diskspace immediately afterwards, I think this is a possible
>> explanation.
>>
>> (See "Write path" on http://wiki.apache.org/cassandra/ArchitectureInternals)
>>
>> --
>> / Peter Schuller aka scode
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by Jonathan Ellis <jb...@gmail.com>.
Yes.  But I haven't yet seen a workload with enough data for that to
matter that wasn't more CPU-bound than disk-space-bound, so it
would usually be premature optimization.
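
To put a rough number on the question quoted below, here is a minimal
sketch. It assumes each column name is stored once per row as UTF-8 bytes
and ignores compression, indexes, and temporary compaction copies. The
function name and its parameters are hypothetical, chosen only for this
illustration.

```python
def name_savings_bytes(long_name: str, short_name: str, rows: int,
                       replication_factor: int = 1) -> int:
    """Estimate bytes saved by renaming one column that appears in every row.

    Assumption: the name is written once per row, per replica, as UTF-8.
    """
    per_row = len(long_name.encode("utf-8")) - len(short_name.encode("utf-8"))
    return per_row * rows * replication_factor


if __name__ == "__main__":
    # "first_seen" is 10 bytes, "f" is 1 byte: 9 bytes saved per row.
    saved = name_savings_bytes("first_seen", "f", rows=1_000_000_000)
    print(f"{saved / 1e9:.1f} GB saved per billion rows")
```

Under these assumptions the saving grows linearly with the row count, so
it only becomes noticeable at the "billions of rows" scale mentioned in
the question.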

On Tue, May 25, 2010 at 2:23 PM, Robert Edmonds <ed...@debian.org> wrote:
> On 2010-05-25, Jonathan Ellis <jb...@gmail.com> wrote:
>> That's true.  But fundamentally Cassandra is expected to use more
>> space than mysql for a few reasons; usually the biggest factor is that
>> Cassandra has to write out each column name in each row, since column
>> names are dynamic unlike in mysql where you declare the columns once
>> for the whole table.
>
> does this mean that using short column names (e.g., "f" instead of
> "first_seen") will save space when storing billions of rows?
>
> --
> Robert Edmonds
> edmonds@debian.org
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by Robert Edmonds <ed...@debian.org>.
On 2010-05-25, Jonathan Ellis <jb...@gmail.com> wrote:
> That's true.  But fundamentally Cassandra is expected to use more
> space than mysql for a few reasons; usually the biggest factor is that
> Cassandra has to write out each column name in each row, since column
> names are dynamic unlike in mysql where you declare the columns once
> for the whole table.

Does this mean that using short column names (e.g., "f" instead of
"first_seen") will save space when storing billions of rows?

-- 
Robert Edmonds
edmonds@debian.org


Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by Jonathan Ellis <jb...@gmail.com>.
That's true.  But fundamentally, Cassandra is expected to use more
space than MySQL for a few reasons. Usually the biggest factor is that
Cassandra has to write out each column name in each row, since column
names are dynamic, unlike in MySQL, where you declare the columns once
for the whole table.
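
A back-of-the-envelope sketch of the overhead described above, combined
with the per-column timestamps mentioned elsewhere in this thread. The
8-byte timestamp and the 7 bytes of framing per column are assumptions for
illustration, not a statement of Cassandra's exact on-disk format, and the
sample row is made up.

```python
TIMESTAMP_BYTES = 8  # assumption: one 64-bit timestamp stored with every column
FRAMING_BYTES = 7    # assumption: length fields and flags stored with every column


def column_bytes(name: str, value: bytes) -> int:
    """Bytes for one column when the name is repeated in every row."""
    return len(name.encode("utf-8")) + len(value) + TIMESTAMP_BYTES + FRAMING_BYTES


def dynamic_row_bytes(columns: dict) -> int:
    """Row size when every column carries its own name and timestamp."""
    return sum(column_bytes(name, value) for name, value in columns.items())


def fixed_schema_row_bytes(columns: dict) -> int:
    """Row size when column names are declared once for the whole table."""
    return sum(len(value) for value in columns.values())


# Hypothetical sample row, used only to compare the two estimates.
row = {"first_seen": b"2010-05-25", "last_seen": b"2010-05-25", "count": b"42"}

if __name__ == "__main__":
    print("dynamic columns:", dynamic_row_bytes(row), "bytes")
    print("fixed schema:   ", fixed_schema_row_bytes(row), "bytes")
```

With small values like these, the names and per-column metadata outweigh
the data itself, which is the effect being described.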

2010/5/25 Peter Schüller <sc...@spotify.com>:
>> Could you please tell me why?
>
> There might be pending sstable removals on disk, which won't happen
> until GC or restart. If you just did a bulk insert and checked
> diskspace immediately afterwards, I think this is a possible
> explanation.
>
> (See "Write path" on http://wiki.apache.org/cassandra/ArchitectureInternals)
>
> --
> / Peter Schuller aka scode
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Why Cassandra is "space inefficient" compared to MySQL?

Posted by Peter Schüller <sc...@spotify.com>.
> Could you please tell me why?

There might be obsolete sstables still on disk whose removal is pending;
they are not deleted until a GC or a restart. If you just did a bulk
insert and checked disk space immediately afterwards, I think this is a
possible explanation.

(See "Write path" on http://wiki.apache.org/cassandra/ArchitectureInternals)
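
One way to check this is to total the files in the data directory right
after the bulk insert and again after a restart. A minimal sketch: the
directory is supplied by the caller, and grouping files by the text after
the last hyphen in each file name is an assumption about how the files are
named, so adjust it to your installation.

```python
import os
import sys
from collections import defaultdict


def usage_by_suffix(data_dir: str) -> dict:
    """Sum file sizes under data_dir, grouped by the part after the last '-'.

    Assumption: files follow a '<something>-<kind>' naming pattern; a file
    without a hyphen is grouped under its full name.
    """
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            suffix = name.rsplit("-", 1)[-1]
            totals[suffix] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python usage.py /path/to/data/directory
    for suffix, size in sorted(usage_by_suffix(sys.argv[1]).items()):
        print(f"{suffix:20s} {size / 2**20:10.1f} MiB")
```

If the total drops noticeably after the restart, the difference was most
likely files that were only waiting to be removed.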

-- 
/ Peter Schuller aka scode