Posted to user@cassandra.apache.org by Je...@nokia.com on 2010/03/02 23:13:07 UTC

Index values: data or pointers?

I'm exploring data layouts and it seems like the common practice is to store an index in one CF (e.g. userid for row key and thingid for column name) and then to fetch all the things by their thingids separately... so get index, and then get each key in the index.

If a thing changes relatively infrequently but gets read often, seems like it would be more performant (especially with writes being very fast) to just stuff whole objects into indexes rather than simply ids. A "whole object" could be a JSON object or a serialized class or who knows what.

Are there drawbacks to that approach, other than space?
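
To make the two layouts concrete, here is a minimal sketch that uses plain Python dicts as stand-ins for column families. It is not a Cassandra client; the names (things, index_of_pointers, index_of_data, user42, t1, t2) are made up for illustration, and "reads" just counts simulated row lookups.

```python
import json

# Hypothetical stand-in for a "Things" CF: row key = thingid, value = serialized object.
things = {
    "t1": json.dumps({"name": "lamp"}),
    "t2": json.dumps({"name": "desk"}),
}

# Layout A: the index row holds only thingids (column name = thingid, empty value).
index_of_pointers = {"user42": {"t1": "", "t2": ""}}

# Layout B: the index row holds the whole serialized object as the column value.
index_of_data = {"user42": {"t1": things["t1"], "t2": things["t2"]}}


def fetch_via_pointers(userid):
    """Read the index row, then read each thing by its id."""
    reads = 1  # the index row itself
    result = {}
    for thingid in index_of_pointers[userid]:
        result[thingid] = json.loads(things[thingid])
        reads += 1  # one extra lookup per thing
    return result, reads


def fetch_via_data(userid):
    """Read the index row only; the objects are already in it."""
    row = index_of_data[userid]
    return {thingid: json.loads(blob) for thingid, blob in row.items()}, 1
```

Both functions return the same objects, but the pointer layout costs one lookup per thing on top of the index read, while the data layout costs a single read. The flip side in this sketch is that a changed thing has to be rewritten in every index row that holds a copy of it.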

Thanks,
Jeremey.




Re: Index values: data or pointers?

Posted by Jonathan Ellis <jb...@gmail.com>.
Right. As long as you don't have a ton of subcolumns (and you usually
don't for a denormalization like this), you're fine.

On Tue, Mar 2, 2010 at 4:48 PM,  <Je...@nokia.com> wrote:
> On Mar 2, 2010, at 4:17 PM, ext Jonathan Ellis wrote:
>
>> On Tue, Mar 2, 2010 at 4:13 PM,  <Je...@nokia.com> wrote:
>>> I'm exploring data layouts and it seems like the common practice is to store an index in one CF (e.g. userid for row key and thingid for column name) and then to fetch all the things by their thingids separately... so get index, and then get each key in the index.
>>>
>>> If a thing changes relatively infrequently but gets read often, seems like it would be more performant (especially with writes being very fast) to just stuff whole objects into indexes rather than simply ids. A "whole object" could be a JSON object or a serialized class or who knows what.
>>
>> Yes.  This is one place supercolumns can be very useful, since it
>> allows doing this w/o nasty hacks like you mention. :)
>
> Good point. :)
>
> I got it in my head that supercolumns aren't indexed (from the ticket of that name http://issues.apache.org/jira/browse/CASSANDRA-598), but actually it's the subcolumns that aren't indexed, correct? (the former never made any sense to me)
>
> Thanks again,
> Jeremey.
>
>

Re: Index values: data or pointers?

Posted by Je...@nokia.com.
On Mar 2, 2010, at 4:17 PM, ext Jonathan Ellis wrote:

> On Tue, Mar 2, 2010 at 4:13 PM,  <Je...@nokia.com> wrote:
>> I'm exploring data layouts and it seems like the common practice is to store an index in one CF (e.g. userid for row key and thingid for column name) and then to fetch all the things by their thingids separately... so get index, and then get each key in the index.
>> 
>> If a thing changes relatively infrequently but gets read often, seems like it would be more performant (especially with writes being very fast) to just stuff whole objects into indexes rather than simply ids. A "whole object" could be a JSON object or a serialized class or who knows what.
> 
> Yes.  This is one place supercolumns can be very useful, since it
> allows doing this w/o nasty hacks like you mention. :)

Good point. :)

I got it in my head that supercolumns aren't indexed (from the ticket of that name http://issues.apache.org/jira/browse/CASSANDRA-598), but actually it's the subcolumns that aren't indexed, correct? (the former never made any sense to me)

Thanks again,
Jeremey.


Re: Index values: data or pointers?

Posted by Jonathan Ellis <jb...@gmail.com>.
On Tue, Mar 2, 2010 at 4:13 PM,  <Je...@nokia.com> wrote:
> I'm exploring data layouts and it seems like the common practice is to store an index in one CF (e.g. userid for row key and thingid for column name) and then to fetch all the things by their thingids separately... so get index, and then get each key in the index.
>
> If a thing changes relatively infrequently but gets read often, seems like it would be more performant (especially with writes being very fast) to just stuff whole objects into indexes rather than simply ids. A "whole object" could be a JSON object or a serialized class or who knows what.

Yes.  This is one place supercolumns can be very useful, since it
allows doing this w/o nasty hacks like you mention. :)

-Jonathan
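
For readers following along, the supercolumn layout suggested here can be sketched with nested dicts: row key -> supercolumn name -> subcolumn name -> value. This is only a model, not Cassandra's API, and the names (user_things, user42, t1, the name/color fields) are hypothetical. The get_subcolumn function assumes what the thread says about subcolumns not being indexed, and models that by touching every subcolumn of the supercolumn to return one.

```python
# Hypothetical model of a super column family:
# row key (userid) -> supercolumn (thingid) -> subcolumns (the thing's fields).
user_things = {
    "user42": {
        "t1": {"name": "lamp", "color": "red"},
        "t2": {"name": "desk", "color": "oak"},
    }
}


def get_things(userid):
    """One row read returns every thing, with its fields as subcolumns."""
    return user_things[userid]


def get_subcolumn(userid, thingid, field):
    """Return one subcolumn value plus how many subcolumns were touched.

    Assumption: subcolumns are not indexed, so the whole supercolumn is
    loaded before the requested field can be picked out.
    """
    subcolumns = dict(user_things[userid][thingid])  # load the whole supercolumn
    return subcolumns[field], len(subcolumns)
```

In this model the number of subcolumns touched grows with the size of the supercolumn, which is why keeping each thing to a modest number of subcolumns matters.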