Posted to user@cassandra.apache.org by S Ahmed <sa...@gmail.com> on 2010/08/15 08:51:33 UTC

indexing rows ordered by int

For column families (CFs) that I need to perform range scans on, I create
separate CFs that have custom ordering.

Say a CF holds comments on a story (like comments on a Reddit or Digg story
post).

So if I need to order comments by votes, it seems I have to re-index every
time someone votes on a comment (or batch it every x minutes).

Right now I think I have to pull all the comments into memory, then sort by
votes, then rewrite the index.

Are there any best practices for this type of index?

Re: indexing rows ordered by int

Posted by Benjamin Black <b...@b3k.us>.
http://code.google.com/p/redis/wiki/SortedSets
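
A minimal sketch of how a sorted set handles the question quoted below,
written in plain Python so it runs without a Redis server. SortedVoteIndex
and its methods are hypothetical names for illustration, not part of Redis or
any client library; incr mirrors the ZINCRBY command and top mirrors
ZREVRANGE with WITHSCORES. The point is that a vote moves only the one
affected member, so the index is never re-sorted or rewritten as a whole.

```python
import bisect


class SortedVoteIndex:
    """In-memory stand-in for a Redis sorted set (hypothetical helper)."""

    def __init__(self):
        self._score = {}   # member -> current score
        self._order = []   # (score, member) pairs, kept sorted ascending

    def incr(self, member, amount=1):
        """Like ZINCRBY: adjust one member's score and reposition it."""
        old = self._score.get(member)
        if old is not None:
            # Locate the stale (score, member) entry by binary search.
            i = bisect.bisect_left(self._order, (old, member))
            del self._order[i]
        new = (old or 0) + amount
        self._score[member] = new
        bisect.insort(self._order, (new, member))
        return new

    def top(self, n):
        """Like ZREVRANGE 0 n-1 WITHSCORES: highest scores first."""
        if n <= 0:
            return []
        return [(member, score) for score, member in reversed(self._order[-n:])]


index = SortedVoteIndex()
for comment_id in ["c1", "c2", "c3"]:
    index.incr(comment_id, 0)   # register each comment with zero votes

index.incr("c2")
index.incr("c2")
index.incr("c3")
print(index.top(2))
```

Redis keeps sorted sets in a skip list plus a hash table, so ZINCRBY costs
O(log N) per vote; the Python list above is only for illustration and deletes
in linear time.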

On Tue, Aug 17, 2010 at 12:33 PM, S Ahmed <sa...@gmail.com> wrote:
> So when using Redis, how do you go about updating the index?
> Do you serialize changes to the index, i.e. when someone votes, you then
> update the index?
> I'm a little confused as to how to go about updating a huge index.
> Say you have 1 million stories, and you want to order by the top votes, how
> would you maintain such an index when they are constantly being voted on?

Re: indexing rows ordered by int

Posted by S Ahmed <sa...@gmail.com>.
So when using Redis, how do you go about updating the index?

Do you serialize changes to the index, i.e. when someone votes, you then
update the index?

I'm a little confused as to how to go about updating a huge index.

Say you have 1 million stories, and you want to order by the top votes, how
would you maintain such an index when they are constantly being voted on?

On Sun, Aug 15, 2010 at 10:48 PM, Chris Goffinet <cg...@chrisgoffinet.com> wrote:

> Digg is using Redis for such a feature as well. We use it for MyNews -
> Top in 24 hours, since we need timestamp ordering plus sorting by how many
> friends touch a story.
>
> -Chris

Re: indexing rows ordered by int

Posted by Chris Goffinet <cg...@chrisgoffinet.com>.
Digg is using Redis for such a feature as well. We use it for MyNews - Top in 24 hours, since we need timestamp ordering plus sorting by how many friends touch a story.

-Chris

On Aug 15, 2010, at 7:34 PM, Benjamin Black wrote:

> http://code.google.com/p/redis/


Re: indexing rows ordered by int

Posted by Benjamin Black <b...@b3k.us>.
http://code.google.com/p/redis/

On Sat, Aug 14, 2010 at 11:51 PM, S Ahmed <sa...@gmail.com> wrote:
> For column families (CFs) that I need to perform range scans on, I create
> separate CFs that have custom ordering.
> Say a CF holds comments on a story (like comments on a Reddit or Digg story
> post).
> So if I need to order comments by votes, it seems I have to re-index every
> time someone votes on a comment (or batch it every x minutes).
>
> Right now I think I have to pull all the comments into memory, then sort by
> votes, then rewrite the index.
> Are there any best practices for this type of index?

Re: indexing rows ordered by int

Posted by Edward Capriolo <ed...@gmail.com>.
On Sunday, August 15, 2010, S Ahmed <sa...@gmail.com> wrote:
> For column families (CFs) that I need to perform range scans on, I create separate CFs that have custom ordering.
> Say a CF holds comments on a story (like comments on a Reddit or Digg story post).
> So if I need to order comments by votes, it seems I have to re-index every time someone votes on a comment (or batch it every x minutes).
>
> Right now I think I have to pull all the comments into memory, then sort by votes, then rewrite the index.
> Are there any best practices for this type of index?
>
It seems that most stories will have few comments (1-100). If you are
only looking to order comments on a given article by vote, this seems
like something you would want to store with the article and/or
calculate on the fly.

Unless you are looking for a feature like "show the highest rated comment
across all articles", I do not understand why you would need a separate
CF.
Does my suggestion make sense? If not, can you share your storage.xml?
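
As a rough illustration of the "calculate on the fly" option, here is a
minimal Python sketch. It assumes the comments for one story have already
been read into a list of dicts; the field names "id" and "votes" and the
function comments_by_votes are made up for this example and do not come from
any schema in this thread.

```python
def comments_by_votes(comments):
    """Return one story's comments, highest vote count first."""
    # sorted() is stable, so comments with equal votes keep their read order.
    return sorted(comments, key=lambda c: c["votes"], reverse=True)


# Hypothetical comments for a single story, as read from the store.
story_comments = [
    {"id": "c1", "votes": 3},
    {"id": "c2", "votes": 12},
    {"id": "c3", "votes": 7},
]

ranked = comments_by_votes(story_comments)
print([c["id"] for c in ranked])
```

With on the order of 100 comments per story, sorting at read time is cheap,
and a vote only has to update the one comment's count rather than a separate
index.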