Posted to user@cassandra.apache.org by Ertio Lew <er...@gmail.com> on 2011/01/28 19:20:42 UTC

Is it recommended to store two types of data (not related to each other but need to be retrieved together) in one super column family ?

Hi,

I have two kinds of data that I would like to fit in one super column
family. The reason is faster retrieval: by combining the data of two
rows into just one row, a single read fetches both.

The first kind of data uses timeUUIDs as supercolumn names. Think of
these as the postIds of the posts in a group. The posts need to be
sorted by time (so that the list of latest posts can be retrieved), so
each post gets one supercolumn, named (timeUUID+userID) and sorted by
TimeUUIDType.

The second kind of data is just a single supercolumn whose columns are
the userIds of all members of the group (very small; a group will have
around 40-50 members at most). The name of this supercolumn can be
chosen suitably (perhaps a maximum time in the future) so that it
stays at the beginning of the row.

(Supercolumns are required because we need to store some additional
data in the columns of the first kind of data.)

So, is it recommended to store these two types of data (not related to
each other, but retrieved together) in one super column family?
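
The layout described above can be sketched in plain Python, without any
Cassandra client. This is only an illustration under assumptions: integer
timestamps stand in for timeUUIDs, the names build_group_row, read_latest,
MAX_TIME and "members" are hypothetical, and the row is assumed to be read
in reversed (newest-first) order, which is what makes a "max time in the
future" name come first.

```python
# Stand-in for a timeUUID: a 60-bit timestamp (assumption for illustration).
MAX_TIME = 2**60 - 1  # sentinel "max time in the future"


def build_group_row(posts, member_ids):
    """Return one row as a dict: supercolumn name -> columns."""
    row = {}
    for ts, user_id, columns in posts:
        # First kind of data: one supercolumn per post, named (time, userId).
        row[(ts, user_id)] = dict(columns)
    # Second kind of data: a single supercolumn listing the group members,
    # stored under the sentinel name.
    row[(MAX_TIME, "members")] = {uid: "" for uid in member_ids}
    return row


def read_latest(row, count):
    """Return the first `count` supercolumns in newest-first order."""
    names = sorted(row, key=lambda name: name[0], reverse=True)
    return [(name, row[name]) for name in names[:count]]


row = build_group_row(
    posts=[(100, "alice", {"title": "first"}), (200, "bob", {"title": "second"})],
    member_ids=["alice", "bob"],
)
latest = read_latest(row, 2)  # member list first, then the newest post
```

With this ordering, one slice over the row returns the member list followed
by the most recent posts.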

Re: Is it recommended to store two types of data (not related to each other but need to be retrieved together) in one super column family ?

Posted by Jonathan Ellis <jb...@gmail.com>.

This sounds reasonable to me; the general rule of thumb is, "a row
should be data that you access together."

The tricky part is when you have data that is accessed multiple ways
for multiple queries.  Sometimes the answer is "denormalize,"
sometimes the answer is "accept that the queries you do less often
will be slower," depending on your workload (e.g. ratio of reads to
writes).
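
As a rough sketch of the "denormalize" option, the same post can be
written under two row keys so that each query reads a single row. The
in-memory store, the write_post helper and the key formats below are
hypothetical and only illustrate the idea; they are not a Cassandra API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a column family:
# row key -> {column name: value}.
store = defaultdict(dict)


def write_post(group_id, user_id, post_ts, body):
    """Write one post twice, once per access pattern (extra write cost)."""
    store[f"group:{group_id}"][(post_ts, user_id)] = body
    store[f"user:{user_id}"][(post_ts, group_id)] = body


write_post("g1", "alice", 100, "hello")
write_post("g1", "bob", 200, "hi there")

posts_in_group = store["group:g1"]    # query 1: a single row read
posts_by_alice = store["user:alice"]  # query 2: a single row read
```

The trade-off is the one named above: every write happens twice so that
both reads stay cheap.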

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Is it recommended to store two types of data (not related to each other but need to be retrieved together) in one super column family ?

Posted by William R Speirs <bi...@gmail.com>.

I'm very new to Cassandra, but I'll pitch in my $0.02.

Row look-ups are super fast. Why do you think it would be more efficient to
store these two rows "together" in the super column method you describe?

Why would you not just look-up the rows, one after the other?

If I understand correctly, you have post_ids, user_ids, and groups. A group 
contains user_ids (people posting to the group) and post_ids (posts made to that 
group)?

So I write a post to a group. You'd add this post_id to the row holding all the 
posts for this group (this might be bad if the number of posts/columns grows 
huge). You'd then have another row associated with the group where you'd insert 
my user_id. Am I close to what you want?
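
A minimal sketch of that two-row alternative, with posts and members in
separate rows that are looked up one after the other. The in-memory store,
the row key formats and load_group are hypothetical, and integer timestamps
stand in for timeUUIDs.

```python
# Hypothetical rows, for illustration only: one row of posts and one row
# of members per group.
store = {
    "posts:g1": {
        (100, "alice"): {"title": "first"},
        (200, "bob"): {"title": "second"},
    },
    "members:g1": {"alice": "", "bob": ""},
}


def load_group(group_id):
    """Fetch posts and members with two separate row look-ups."""
    posts = store.get(f"posts:{group_id}", {})
    members = store.get(f"members:{group_id}", {})
    # Sort posts newest-first by the timestamp part of the column name.
    newest_first = sorted(posts.items(), key=lambda item: item[0][0], reverse=True)
    return newest_first, sorted(members)


posts, members = load_group("g1")
```

This costs one extra look-up per page view but keeps the two kinds of data
independent of each other.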

If you can give a more concrete example, I (or someone more familiar with 
Cassandra) could give you more help on designing a schema.

Bill-

On 01/29/2011 01:48 PM, Ertio Lew wrote:
> Could someone please point me in right direction by commenting on the above ideas ?

Re: Is it recommended to store two types of data (not related to each other but need to be retrieved together) in one super column family ?

Posted by Ertio Lew <er...@gmail.com>.

Could someone please point me in the right direction by commenting on the
above ideas?
