Posted to user@cassandra.apache.org by aaron morton <aa...@thelastpickle.com> on 2012/10/01 23:39:42 UTC

Re: Data Modeling: Comments with Voting

You cannot (and probably do not want to) sort continually when the voting is going on. 

You can store the votes in counter columns (CounterColumnType). When someone votes, you then (somehow) queue a job that reads the vote counts for the post / comment, pivots and sorts on the vote count, and then writes the updated leaderboard to Cassandra.
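
A minimal sketch of that queue-then-rebuild flow, in plain Python with in-memory dicts standing in for Cassandra. The names `counters`, `leaderboards`, `pending_posts`, `record_vote`, `build_leaderboard` and `run_pending_jobs` are all hypothetical, and no real client calls are shown:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-ins for the counter columns and the
# leaderboard rows; a real system would read and write Cassandra here.
counters: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
leaderboards: Dict[str, List[Tuple[int, str]]] = {}
pending_posts: Set[str] = set()  # posts whose leaderboard is stale


def record_vote(post_id: str, comment_id: str) -> None:
    """Increment the counter and mark the post for a later re-sort."""
    counters[post_id][comment_id] += 1
    pending_posts.add(post_id)  # a set, so many votes queue only one job


def build_leaderboard(vote_counts: Dict[str, int]) -> List[Tuple[int, str]]:
    """Pivot {comment_id: votes} into (votes, comment_id), highest votes first."""
    pivoted = [(votes, comment_id) for comment_id, votes in vote_counts.items()]
    # Votes descending, then comment id ascending so ties have a fixed order.
    return sorted(pivoted, key=lambda pair: (-pair[0], pair[1]))


def run_pending_jobs() -> None:
    """Rebuild the leaderboard of every post that received votes."""
    while pending_posts:
        post_id = pending_posts.pop()
        leaderboards[post_id] = build_leaderboard(counters[post_id])


for _ in range(3):
    record_vote("post1", "commentA")
record_vote("post1", "commentB")
run_pending_jobs()
```

Because the sort happens in the job rather than on every vote, the leaderboard may lag slightly behind the counters; how often the job runs is a trade-off left to the application.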

Alternatively, if you have a small number of comments for a post, just read all the votes and sort them as part of the read.
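
For the small-list case, a sketch of sorting at read time, using the four comments from Drew's example further down the thread. The dict layout and the `created` ordinals are assumptions for illustration; in practice the rows would come back from whatever read the application does:

```python
from operator import itemgetter

# Comments for one post as they might look after a single read.
# "created" is a hypothetical ordinal standing in for the creation time.
comments = [
    {"id": "comment 1", "created": 1, "votes": 5},
    {"id": "comment 2", "created": 2, "votes": 1},
    {"id": "comment 3", "created": 3, "votes": 0},
    {"id": "comment 4", "created": 4, "votes": 10},
]

# Both orderings come from the same in-memory list; nothing is written back.
by_time = sorted(comments, key=itemgetter("created"))
by_votes = sorted(comments, key=itemgetter("votes"), reverse=True)

print([c["id"] for c in by_votes])
```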

Cheers
  
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/09/2012, at 8:25 AM, Drew Kutcharian <dr...@venarc.com> wrote:

> Thanks Roshni,
> 
> I'm not sure how #d will work when users are actually voting on a comment. What happens when two users vote on the same comment simultaneously? How do you update the entries in the #d column family to prevent duplicates?
> 
> Also, #a and #c can be combined by using TimeUUIDs as comment ids.
> 
> - Drew
> 
> 
> 
> On Sep 27, 2012, at 2:13 AM, Roshni Rajagopal <ro...@hotmail.com> wrote:
> 
>> Hi Drew,
>> 
>> I think you have 4 requirements. Here are my suggestions.
>> 
>> a) store comments: have a static column family for comments with master data like created date, created by, length, etc.
>> b) when a person votes for a comment, increment a vote counter: have a counter column family for incrementing the votes for each comment
>> c) display comments sorted by date created: have a column family with a dummy row id 'sort_by_time_list'; column names can be the date created (TimeUUID), and the column value can be the comment id
>> d) display comments sorted by number of votes: have a column family with a dummy row id 'sort_by_votes_list'; column names can be a composite of number of votes and comment id (as more than one comment can have the same number of votes)
>> 
>> 
>> Regards,
>> Roshni
>> 
>> > Date: Wed, 26 Sep 2012 17:36:13 -0700
>> > From: kirk@mustardgrain.com
>> > To: user@cassandra.apache.org
>> > CC: drew@venarc.com
>> > Subject: Re: Data Modeling: Comments with Voting
>> > 
>> > Depending on your needs, you could simply duplicate the comments in two 
>> > separate CFs with the column names including time in one and the vote in 
>> > the other. If you allow for updates to the comments, that would pose 
>> > some issues you'd need to solve at the app level.
>> > 
>> > On 9/26/12 4:28 PM, Drew Kutcharian wrote:
>> > > Hi Guys,
>> > >
>> > > Wondering what would be the best way to model a flat (no sub-comments, like Twitter) comments list with support for voting (where I can sort by create time or votes) in Cassandra?
>> > >
>> > > To demonstrate:
>> > >
>> > > Sorted by create time:
>> > > - comment 1 (5 votes)
>> > > - comment 2 (1 votes)
>> > > - comment 3 (no votes)
>> > > - comment 4 (10 votes)
>> > >
>> > > Sorted by votes:
>> > > - comment 4 (10 votes)
>> > > - comment 1 (5 votes)
>> > > - comment 2 (1 votes)
>> > > - comment 3 (no votes)
>> > >
>> > > It's the sorted-by-votes that I'm having a bit of trouble with. I'm looking for a roll-your-own approach and prefer not to use secondary indexes and CQL sorting.
>> > >
>> > > Thanks,
>> > >
>> > > Drew
>> > >
>> > 


RE: Data Modeling: Comments with Voting

Posted by Roshni Rajagopal <ro...@hotmail.com>.
Hi,
To explain my suggestions, my thoughts were:
a) You need to store entity-type information about a comment, such as date created, comment text, commented by, etc. I can't think of any other master information for a comment, but in general one starts with entities in a standard static column family. If you store an entity in a dynamic, denormalized form, then whenever any master data changes you would need to iterate across all rows and update it, which is expensive in Cassandra. Here the comment text is editable.
b) So when a comment is created, it goes to the static column family. An entry is also made in the dynamic sort_by_time_list column family, with the time created as the column name. I didn't suggest clubbing a and c together, so that the master information remains in one place. The other approach would be to store the comment as JSON in the column value. However, if you need to update the comment text, it would be hard to identify the comment column and update it.
c) When a comment gets a vote, the counter column family is incremented to track the number of votes for the comment. Also, to sort by number of votes, after incrementing the counter you need to write the current number of votes and the comment id to column family d. But I see now that you also need to delete the old (number of votes, comment id) column and add a new column with the current number of votes and the comment id. It would be sorted by number of votes.
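
The delete-old, insert-new update in (c) can be sketched as follows. This is a single-threaded, in-memory model only: the `VoteIndex` class and its methods are hypothetical, and a sorted Python list of `(votes, comment_id)` tuples stands in for the composite column names in the sort_by_votes_list row. It does not address the concurrency question Drew raised; in a real cluster the counter read and the two column writes would be separate operations, so two simultaneous voters could plausibly leave a stale column behind.

```python
import bisect
from typing import Dict, List, Tuple


class VoteIndex:
    """In-memory model of a counter CF plus a sorted-by-votes row."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}          # stand-in for the counter CF
        self.columns: List[Tuple[int, str]] = []  # sorted (votes, comment_id)

    def add_comment(self, comment_id: str) -> None:
        self.counts[comment_id] = 0
        bisect.insort(self.columns, (0, comment_id))

    def vote(self, comment_id: str) -> None:
        old = self.counts[comment_id]
        new = old + 1
        self.counts[comment_id] = new
        # Delete the old (votes, comment_id) column if it is present...
        i = bisect.bisect_left(self.columns, (old, comment_id))
        if i < len(self.columns) and self.columns[i] == (old, comment_id):
            del self.columns[i]
        # ...and add the column carrying the current vote count.
        bisect.insort(self.columns, (new, comment_id))

    def top(self, n: int) -> List[str]:
        """Comment ids with the most votes first (a reversed slice of the row)."""
        return [comment_id for _, comment_id in reversed(self.columns)][:n]


index = VoteIndex()
index.add_comment("c1")
index.add_comment("c2")
index.vote("c2")
index.vote("c2")
index.vote("c1")
```

Including the comment id in the tuple is what keeps two comments with the same vote count from overwriting each other, which is the reason given for the composite column name in the original suggestion.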
If there are many ways to sort, it's better to do it in the application to avoid having a new column family for each type of sort. However, I'm not certain which approach would perform better over time and at volume. Sorting can be complex; see Aaron's blog post: http://thelastpickle.com/2012/08/18/Sorting-Lists-For-Humans/
Any feedback on my suggestions is welcome.

