Posted to user@cassandra.apache.org by Ertio Lew <er...@gmail.com> on 2012/11/05 23:28:37 UTC

Seeking Schema guidance

I need to store (1) posts written by users, (2) activity data by other
users on these posts, and (3) some counters for each post, such as view
counts, like counts, etc. So for each post there are 3 categories of
associated data: the original post data, stored in one CF using a single
row per post; the counters data, using 1 row per post in a counter-type
CF; and the activity data, where each user stores his own activity column
for each post he reacted to and also stores the activity data of all his
friends, in a dedicated row for every user.


So here is my current schema plan:

For Posts:
-------------
1 CF with single row for each post


For Counters:
------------------
1 CF with single row for each post


For Activities Data
---------------------------

1 CF with single row for each user
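
To make the plan above concrete, here is a rough sketch in Python that
models each CF as a plain dictionary (row key -> columns). It is only an
in-memory illustration, not Cassandra client code; the names posts,
counters, activities and render_post, and the (post_id, actor) column
naming, are assumptions made up for this example.

```python
# Toy in-memory model: each "column family" is a dict of row key -> columns.
posts = {"post1": {"author": "alice", "body": "hello"}}
counters = {"post1": {"views": 3, "likes": 1}}
# One row per user. Column names are hypothetical (post_id, actor) pairs
# holding the user's own activity and the activity of his friends.
activities = {
    "bob": {("post1", "bob"): "like", ("post1", "carol"): "view"},
}

def render_post(post_id, viewer_id):
    """Gather everything needed to show one post: three separate reads."""
    post = posts[post_id]                    # read 1: Posts CF
    counts = counters.get(post_id, {})       # read 2: Counters CF
    row = activities.get(viewer_id, {})      # read 3: Activities CF
    acts = {actor: kind
            for (pid, actor), kind in row.items() if pid == post_id}
    return {"post": post, "counters": counts, "activity": acts}
```

Calling render_post("post1", "bob") touches all three dictionaries, which
mirrors the three CF reads per post in this plan.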



Now, to show a post at any time I need all 3 categories of data, so I'm
forced to read 3 CFs. So I have been wondering whether I should instead
merge this data into a single CF, as a materialized view in a single row,
so that read queries could be made more efficient.

Here is the idea I have got:

For each post I would store the post data (written once, never updated)
plus the activity data of all users on that post (written for each user at
different times and possibly edited many times) in a single row. Using the
activity data of all users I can calculate all the counters (by iterating
over the activity columns), so I don't need to store them explicitly. So
to read some 10 posts at a time, I just need to read 10 rows. I would also
set a reasonable limit on the number of columns to read, so that if a post
has too many activity columns I don't have to read all of them; in those
(less frequent) cases I would perform a second query to read the counters
from another CF. So most of the time I would be reading from a single CF
and a single row for each post. But another issue is that, since that
single row will contain the activity of several users (each column added
to the row at a different time), the row might be spread across many
SSTables. So which schema is better for me with respect to performance,
the 1st one or the 2nd?
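
As a rough illustration of the 2nd idea, here is a small Python sketch
that uses in-memory lists in place of a real CF. The names wide_posts,
counters, read_post and COLUMN_LIMIT, and the "post:" / "act:" column
prefixes, are hypothetical. It also assumes the post-data column is
returned before the activity columns, which in practice depends on how
the columns are named and sorted.

```python
from collections import Counter

COLUMN_LIMIT = 5  # hypothetical cap on columns fetched per row

# Single wide row per post: post data plus one activity column per user.
wide_posts = {
    "p1": [("post:body", "hello"), ("act:bob", "like"), ("act:carol", "view")],
    "p2": [("post:body", "busy")]
          + [(f"act:user{i}", "like") for i in range(10)],
}
# Separate counters CF, consulted only when a row is too wide.
counters = {"p2": {"like": 10}}

def read_post(post_id, limit=COLUMN_LIMIT):
    """Return (body, counts, number_of_reads) for one post."""
    sliced = wide_posts[post_id][:limit]  # stands in for a limited column slice
    body = dict(sliced).get("post:body")
    if len(sliced) < limit:
        # The whole row fit under the limit: derive counts from activity columns.
        counts = Counter(v for k, v in sliced if k.startswith("act:"))
        return body, dict(counts), 1
    # The slice may be truncated, so fall back to a second read of the counters.
    return body, dict(counters.get(post_id, {})), 2
```

Here read_post("p1") needs a single read, while read_post("p2") hits the
column limit and falls back to the second query against the counters.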

Thanks.

Re: Seeking Schema guidance

Posted by Ertio Lew <er...@gmail.com>.
Thoughts?

