Posted to user@cassandra.apache.org by Prasad Sunkari <s....@gmail.com> on 2010/12/22 13:54:44 UTC

Secondary indexes for multi-value fields

Hi all,

I have a column family for users of my system and I need to attach tags 
to these users.  My current plan is to have a column that holds a 
string (comma-separated tags).

I am not clear whether this is the best way to do it, especially because 
it may lead to complications when more than one administrator is trying 
to tag the same user (lost updates), as well as with secondary indexes 
(if I wanted to use the built-in secondary indexes).  I am also not sure 
whether it is possible to have a secondary index on a multi-valued column!

Another alternative is to have it in a super column with each tag being 
a column by itself and let my application take care of the secondary 
indexes.
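
To make the lost-update concern concrete, here is a minimal sketch that uses Python dicts as a hypothetical stand-in for a column family (it is not the Cassandra API, and `add_tag_csv`/`add_tag_column` are illustrative names). With a single comma-separated column, each writer has to read the value, append a tag and write the whole string back, so two concurrent writers can overwrite each other; Cassandra resolves such conflicts per column by timestamp, keeping only the last write. With one column per tag, each writer touches a different column, so both tags survive.

```python
from typing import Dict

# Hypothetical in-memory stand-in for a column family: row key -> {column name: value}.
ColumnFamily = Dict[str, Dict[str, str]]


def add_tag_csv(cf: ColumnFamily, user: str, snapshot: str, tag: str) -> None:
    """Read-modify-write on a single comma-separated 'tags' column.

    `snapshot` is the value this writer read earlier; the write replaces
    whatever is stored now, so a concurrent writer's tag can be lost.
    """
    tags = [t for t in snapshot.split(",") if t]
    if tag not in tags:
        tags.append(tag)
    cf.setdefault(user, {})["tags"] = ",".join(tags)


def add_tag_column(cf: ColumnFamily, user: str, tag: str) -> None:
    """One column per tag: each write touches only its own column."""
    cf.setdefault(user, {})[tag] = ""


# Two administrators read the same value, then both write.
users: ColumnFamily = {"alice": {"tags": "admin"}}
seen_by_a = users["alice"]["tags"]
seen_by_b = users["alice"]["tags"]
add_tag_csv(users, "alice", seen_by_a, "dev")
add_tag_csv(users, "alice", seen_by_b, "ops")
print(users["alice"]["tags"])  # admin,ops  ('dev' was lost)

# Same scenario with one column per tag.
user_tags: ColumnFamily = {"alice": {"admin": ""}}
add_tag_column(user_tags, "alice", "dev")
add_tag_column(user_tags, "alice", "ops")
print(sorted(user_tags["alice"]))  # ['admin', 'dev', 'ops']
```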

I am currently of the opinion that the second solution is the only thing 
that I could do.
Any suggestions?  Since this is my first app on Cassandra I am trying to 
see if my opinion is correct.

Thanks,
Prasad

Re: Secondary indexes for multi-value fields

Posted by Jools <jo...@gmail.com>.
I have a very similar use case in my system; I've solved it as follows.

Assuming all your users have a unique id, such as a login userid,
you could create a new column family, keyed by the userid, and add columns
which have no value, but whose column name is the tag value.

Searching these tags later will be much simpler, as you can find them
using a range slice over that row's columns, and the writes are idempotent,
as each column contains only one tag.
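
Jools' layout can be sketched as follows, again assuming a hypothetical in-memory stand-in rather than a real client: each row is keyed by userid and its column names are the tags, kept sorted the way Cassandra sorts columns by the column family's comparator. Re-adding an existing tag changes nothing, and `slice_tags` loosely mimics a slice over a range of column names within one row (an empty bound meaning unbounded).

```python
import bisect
from typing import Dict, List

# Hypothetical stand-in for a "UserTags" column family: userid -> sorted column names.
# Values would be empty; the column name itself is the tag.
UserTags = Dict[str, List[str]]


def tag_user(cf: UserTags, userid: str, tag: str) -> None:
    """Insert a tag column; inserting an existing column is a no-op (idempotent)."""
    cols = cf.setdefault(userid, [])
    i = bisect.bisect_left(cols, tag)
    if i == len(cols) or cols[i] != tag:
        cols.insert(i, tag)


def slice_tags(cf: UserTags, userid: str, start: str = "", finish: str = "") -> List[str]:
    """Column names in [start, finish] for one row; an empty bound means unbounded."""
    cols = cf.get(userid, [])
    lo = bisect.bisect_left(cols, start) if start else 0
    hi = bisect.bisect_right(cols, finish) if finish else len(cols)
    return cols[lo:hi]


tags: UserTags = {}
for t in ["ops", "dev", "admin", "dev"]:  # "dev" is added twice on purpose
    tag_user(tags, "alice", t)

print(slice_tags(tags, "alice"))            # ['admin', 'dev', 'ops']
print(slice_tags(tags, "alice", "d", "e"))  # ['dev']
```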

Hope that makes sense :-)

--Jools

Re: Secondary indexes for multi-value fields

Posted by Prasad Sunkari <s....@gmail.com>.
I will frame my question in a different way.

Each user in my system subscribes to updates from selected other users 
(updates are aggregated from outside) and tags the users he/she is 
subscribed to.

In my current design, I have a column family called "Followers", keyed by 
userid, in which each column name is the userid of another user following 
the first user.  There is also a super column family called 
"Subscriptions", again keyed by userid, in which each super column name 
is the userid of a user to whose updates the row's user is subscribed; 
the columns contain the data, i.e. the tags.

Obviously I use the tags in lots of places and need the reverse index 
on tags (the list of subscriptions that have a given tag).  This is done 
by maintaining another column family, "SubscriptionsByTag".
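
The bookkeeping in this design can be sketched like this, assuming in-memory Python structures in place of the three column families. The row-key layout for `SubscriptionsByTag` (`"userid:tag"` mapping to the subscribed-to userids) is an assumption, since the message does not specify it; the point is that every tag change has to be written to both `Subscriptions` and the reverse index by the application.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set

# Hypothetical in-memory stand-ins for the three column families:
# Followers:          userid -> {follower userid}
# Subscriptions:      userid -> {subscribed-to userid -> {tag}}   (super CF)
# SubscriptionsByTag: "userid:tag" -> {subscribed-to userid}      (key layout assumed)
followers: DefaultDict[str, Set[str]] = defaultdict(set)
subscriptions: DefaultDict[str, Dict[str, Set[str]]] = defaultdict(dict)
subs_by_tag: DefaultDict[str, Set[str]] = defaultdict(set)


def subscribe(user: str, target: str) -> None:
    subscriptions[user].setdefault(target, set())
    followers[target].add(user)


def tag_subscription(user: str, target: str, tag: str) -> None:
    # The application writes the forward entry and the reverse-index entry itself.
    subscriptions[user].setdefault(target, set()).add(tag)
    subs_by_tag[f"{user}:{tag}"].add(target)


def untag_subscription(user: str, target: str, tag: str) -> None:
    # Removing a tag must also be mirrored in the reverse index.
    subscriptions[user].get(target, set()).discard(tag)
    subs_by_tag[f"{user}:{tag}"].discard(target)


subscribe("alice", "bob")
subscribe("alice", "carol")
tag_subscription("alice", "bob", "work")
tag_subscription("alice", "carol", "work")
tag_subscription("alice", "carol", "friends")
untag_subscription("alice", "bob", "work")

print(sorted(subs_by_tag["alice:work"]))  # ['carol']
print(sorted(followers["carol"]))         # ['alice']
```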

Now, with the advent of secondary indexes in 0.7 can I redesign it to 
make it a little simpler?  Maybe avoid having to maintain the reverse 
index for tags?

I do understand that secondary indexes are not supported for super 
columns.  So, can I make "Subscriptions" a column family where each 
userid maps to a comma-separated list of tags?  Is it possible, out of 
the box or by implementing some interface, to have a secondary index over 
such multi-valued columns?
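
A simplified model of why a comma-separated value does not play well with the built-in indexes: as far as I understand, the 0.7 secondary indexes are declared on a named column and answer equality queries on that column's whole value. The sketch assumes each subscription is its own row (hypothetical keys like `"alice:bob"`) with a fixed `tags` column, and models the index as a map from exact value to row keys; looking up `work` misses the row whose value is `friends,work`.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set

# Simplified model of an equality-only secondary index on one named column:
# exact column value -> row keys whose column holds exactly that value.
Row = Dict[str, str]


def build_index(rows: Dict[str, Row], column: str) -> DefaultDict[str, Set[str]]:
    index: DefaultDict[str, Set[str]] = defaultdict(set)
    for key, cols in rows.items():
        if column in cols:
            index[cols[column]].add(key)
    return index


def lookup(index: DefaultDict[str, Set[str]], value: str) -> List[str]:
    # .get avoids creating empty entries in the defaultdict.
    return sorted(index.get(value, set()))


subs = {
    "alice:bob": {"tags": "work"},
    "alice:carol": {"tags": "friends,work"},
}
idx = build_index(subs, "tags")

print(lookup(idx, "work"))          # ['alice:bob']  ('alice:carol' is missed)
print(lookup(idx, "friends,work"))  # ['alice:carol']
```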

What, in general, are the best practices for such multi-valued fields 
on which I also need a secondary index?  (Jools's reply confused me: am I 
right in thinking that range slices are only for retrieving values for a 
contiguous set of keys and not really for secondary indexes?)

[Sorry if I seem too naive]

Thanks,
Prasad

Re: Secondary indexes for multi-value fields

Posted by Anand Somani <me...@gmail.com>.
One approach is to ask yourself questions about how you would use this
information, for example:

   - How often do you go from user->tags?
   - How often would you want to go from tag->users?
   - What kind of reporting would you want to do on tags, and how often?
   - Can multiple people add the same tag to the same user, and are they
     maintained separately?
   - Given your business, how many users do you expect?
   - etc.

Depending on the answers, one approach might work better than the other. I
have not used indexes/non-id-based searches in Cassandra yet (I do not have
that use case), so this is just based on the time I have spent reading about
it.

One approach, using indexes, was given by Jools; the other approach is to use
reverse indexes:


   - 2 CFs: one for users and one for tags (reverse index)
   - User: might need to have a SC (super column), with tags and some
     information like who tagged it
   - Tag: tag to a column of users
   - Advantages:
      - 1 query to find user->tags on the user CF
      - tag->users on the tag CF (I would think this would be more efficient
        than user->tags, since that will potentially hit multiple
        rows/nodes, unless I have misunderstood secondary indexes)
   - Disadvantages:
      - Need to write to a couple of CFs, but writes are relatively cheap
        compared to reads in Cassandra
      - Since you update 2 CFs and there are no transactions, one write
        might succeed and the other might fail
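
The last disadvantage can be sketched as follows, assuming plain Python dicts as stand-ins for the two CFs (the CF roles in the comments and the `add_tag` helper are hypothetical, not a client API). Because re-writing the same column with the same value is idempotent, one mitigation is simply to retry the whole pair of writes on failure; this only narrows the window and is not a substitute for transactions.

```python
from typing import Dict, Set

# Hypothetical in-memory stand-ins for the two CFs (not a real client API):
# UserTags: userid -> {tag -> who tagged it}   (the SC with tags and who tagged it)
# TagUsers: tag -> {userid}                    (the reverse index)
user_tags: Dict[str, Dict[str, str]] = {}
tag_users: Dict[str, Set[str]] = {}


def add_tag(user: str, tag: str, by: str, fail_second: bool = False) -> None:
    """Write both CFs one after the other; nothing makes the pair atomic."""
    user_tags.setdefault(user, {})[tag] = by
    if fail_second:
        # Simulate the second write failing after the first one succeeded.
        raise IOError("simulated failure writing TagUsers")
    tag_users.setdefault(tag, set()).add(user)


try:
    add_tag("alice", "vip", by="admin1", fail_second=True)
except IOError:
    pass
# The CFs now disagree: user->tags has the tag, tag->users does not.
print("vip" in user_tags["alice"], "alice" in tag_users.get("vip", set()))  # True False

# Both writes are idempotent, so retrying the whole operation repairs the gap.
add_tag("alice", "vip", by="admin1")
print(sorted(tag_users["vip"]))  # ['alice']
```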

Even with the other suggestion (indexes), you can still add the tag->users
reverse index.