Posted to user@cassandra.apache.org by Janne Jalkanen <ja...@ecyrd.com> on 2010/09/14 16:41:08 UTC

Minor question on index design

Hi all!

I'm weighing a couple of alternatives here: I've got two CFs, one containing Objects and one containing Users. Each Object has an owner associated with it, so obviously I need some sort of index pointing from Users to Objects. This would of course be the perfect use case for secondary indices in 0.7, but I'm still on 0.6.x.

So, esteemed Cassandra-heads, I'm pondering what would be a better design here:

1) I can create a separate CF "OwnerIdx" which has user IDs as keys, where each column points at an object (with a dummy value, since I just need a list). This would add a new CF, but on the other hand it would be easy to drop once 0.7 comes along and I can just make an index query against the Objects CF, OR

2) Put the index inside the Users CF, with "object:<id>" as the column name and a dummy value, and then get slices as necessary? This would mean fewer CFs (and hence no schema modification), but might mean that I have to clean it up at some point.
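
For readers comparing the two layouts, here is a minimal sketch in Python. It is only an illustration: nested dicts stand in for column families, and the helper names (option1_add, option2_objects and so on) are hypothetical, not part of any Cassandra client API. Option 1 reads a whole row of a dedicated CF, while option 2 needs a slice over the "object:" column-name range inside the Users row.

```python
from bisect import bisect_left

# Toy stand-in for column families: {cf_name: {row_key: {column_name: value}}}.
# This is NOT the Cassandra client API; it only illustrates the two layouts.

PREFIX = "object:"


def option1_add(store, user_id, object_id):
    """Option 1: separate OwnerIdx CF, row key = user id, column name = object id."""
    store.setdefault("OwnerIdx", {}).setdefault(user_id, {})[object_id] = b""


def option1_objects(store, user_id):
    """Every column in the user's OwnerIdx row is an object id."""
    return sorted(store.get("OwnerIdx", {}).get(user_id, {}))


def option2_add(store, user_id, object_id):
    """Option 2: index columns live in the Users row, named 'object:<id>'."""
    store.setdefault("Users", {}).setdefault(user_id, {})[PREFIX + object_id] = b""


def option2_objects(store, user_id):
    """Slice the sorted column names between 'object:' and 'object;'."""
    names = sorted(store.get("Users", {}).get(user_id, {}))
    # ';' is the character right after ':', so this bounds the prefix range.
    upper = PREFIX[:-1] + chr(ord(PREFIX[-1]) + 1)
    start = bisect_left(names, PREFIX)
    end = bisect_left(names, upper)
    return [name[len(PREFIX):] for name in names[start:end]]
```

With option 2 the index columns share a row with ordinary user attributes, so every index read has to be bounded by the prefix; with option 1 the row contains nothing else and the CF can simply be dropped later.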

I don't yet have a lot of CFs, so I'm not worried about mem consumption really.  The Users CF is very read-heavy as-is, but the index and Objects will be a bit more balanced.

Experiences? Recommendations? Tips? Other possibilities? What other considerations should I take into account?

/Janne

Re: Minor question on index design

Posted by Janne Jalkanen <Ja...@ecyrd.com>.

Ok, thanks. I'm going with Option 1 and will try to steer away from
SuperColumns. That also gives me the option to tweak the caches
depending on the usage pattern (the Users CF will be accessed in a lot
of different ways, not just in relation to Objects).

/Janne

On Sep 14, 2010, at 23:46 , Aaron Morton wrote:

> I've been doing option 1 under 0.6. As usual in cassandra though a  
> lot depends on how you access the data.
>
> - If you often want to get the user and all of the objects they  
> have, use option 2. It's easier to have one read from one CF to  
> answer your query.
> - If the user has potentially >10k objects go with option 2. AFAIK  
> large super columns are still inefficient https://issues.apache.org/jira/browse/CASSANDRA-674 
>  https://issues.apache.org/jira/browse/CASSANDRA-598
> - In your OwnerIndex CF consider making the column name something  
> meaningful such as the Object Name or Timestamp (if it has one) so  
> you can slice against it, e.g. to support paging operations. Make  
> the column value the key for the object.
>
> Aaron

Re: Minor question on index design

Posted by Aaron Morton <aa...@thelastpickle.com>.

I've been doing option 1 under 0.6. As usual in Cassandra, though, a lot depends on how you access the data.

- If you often want to get the user and all of the objects they have, use option 2. It's easier to have one read from one CF to answer your query.
- If the user has potentially >10k objects go with option 2. AFAIK large super columns are still inefficient https://issues.apache.org/jira/browse/CASSANDRA-674 https://issues.apache.org/jira/browse/CASSANDRA-598
- In your OwnerIndex CF consider making the column name something meaningful such as the Object Name or Timestamp (if it has one) so you can slice against it, e.g. to support paging operations. Make the column value the key for the object.
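
The paging idea in the last point can be sketched roughly as follows. This is an assumption-laden illustration in Python, not Cassandra client code: a dict stands in for one OwnerIndex row, get_slice is a hypothetical helper that mimics a column slice with a start name and a count, and the column names are made-up timestamps whose values are object keys.

```python
from bisect import bisect_left


def get_slice(row, start, count):
    """Return up to `count` (name, value) pairs with name >= start, in name order."""
    names = sorted(row)
    i = bisect_left(names, start)
    return [(name, row[name]) for name in names[i:i + count]]


def iter_pages(row, page_size):
    """Yield pages of columns; the first column of the next page is the new start."""
    start = ""
    while True:
        # Fetch one extra column: its name marks where the next page begins.
        chunk = get_slice(row, start, page_size + 1)
        if not chunk:
            return
        yield chunk[:page_size]
        if len(chunk) <= page_size:
            return
        start = chunk[page_size][0]


# Hypothetical OwnerIndex row: column name = timestamp, column value = object key.
owner_row = {
    "2010-09-01T10:00": "obj-17",
    "2010-09-03T08:30": "obj-42",
    "2010-09-07T14:15": "obj-05",
    "2010-09-10T09:00": "obj-88",
    "2010-09-14T16:41": "obj-23",
}

pages = list(iter_pages(owner_row, 2))
```

Because the column names sort chronologically, each page is simply the next slice in name order, and the column values give the keys to fetch from the Objects CF.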

Aaron
