Posted to user@cassandra.apache.org by Carlos Sanchez <ca...@riskmetrics.com> on 2010/03/31 01:16:28 UTC

Insertion time question

I was wondering if I could get a bit more insight as to why we are seeing different insertion times between regular column families and super columns.

We have a group object (with its name) that may have a series of attributes (name/value). There can be up to a million group objects, and different groups can share several attributes. In our first design we used a super column, with the column path

ColumnPath ("Index", [attribute value], [group name]) and the row key being the attribute name. The value
we are inserting is an empty byte array.

In the second design we simplified our model to

ColumnPath ("Index", null, [group name]) with the row key being simply the attribute name concatenated with the attribute value. The value inserted is again an empty array.
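To make the two layouts concrete, here is a minimal sketch that models them with plain Python dictionaries rather than the Cassandra client API. The function names, the sample groups and attributes, and the ":" separator used to join attribute name and value are all assumptions for illustration; the message does not say how the row key was concatenated.

```python
from collections import defaultdict

EMPTY = b""  # the inserted value is an empty byte array in both designs


def insert_super(index, attr_name, attr_value, group_name):
    """Design 1: row key = attribute name, super column = attribute value,
    column = group name."""
    index[attr_name][attr_value][group_name] = EMPTY


def insert_standard(index, attr_name, attr_value, group_name, sep=":"):
    """Design 2: row key = attribute name + attribute value, column = group name.
    The separator is an assumption; the message does not specify one."""
    index[attr_name + sep + attr_value][group_name] = EMPTY


# Hypothetical sample data: (group name, attribute name, attribute value)
rows = [
    ("group-1", "region", "EU"),
    ("group-2", "region", "EU"),
    ("group-2", "region", "US"),
]

super_index = defaultdict(lambda: defaultdict(dict))
standard_index = defaultdict(dict)
for group, name, value in rows:
    insert_super(super_index, name, value, group)
    insert_standard(standard_index, name, value, group)

print(len(super_index), "row(s) in the super column layout")
print(len(standard_index), "row(s) in the standard layout")
```

In this model the first layout puts every value of one attribute into a single row, while the second gives each name/value pair its own row.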

In the first case, inserting 250K groups took about 1.5 hours; in the second case it took 45 minutes. In both tests we started Cassandra with no data, using OPP on two nodes (each with 16 cores and 64 GB).
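As a rough check on those figures, the following sketch converts the two run times into groups per second. It assumes the quoted times are exact, although both are approximate, and it counts groups rather than individual column inserts because the number of attributes per group is not given.

```python
GROUPS = 250_000  # groups inserted in each test


def groups_per_second(groups, seconds):
    """Average insertion rate over the whole run."""
    return groups / seconds


super_rate = groups_per_second(GROUPS, 1.5 * 3600)   # about 1.5 hours
standard_rate = groups_per_second(GROUPS, 45 * 60)   # 45 minutes
ratio = standard_rate / super_rate

print(f"super columns:    {super_rate:.1f} groups/s")
print(f"standard columns: {standard_rate:.1f} groups/s")
print(f"speedup:          {ratio:.1f}x")
```

That works out to roughly 46 versus 93 groups per second, a factor of two.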

We are wondering why we get lower insert performance when using super columns.

Thanks,

Carlos

Re: Insertion time question

Posted by Jonathan Ellis <jb...@gmail.com>.

Hard to say without busting out the profiler.  "supercolumns are
slower" is not a surprise to anyone at this point, I'm afraid.
