
Posted to user@cassandra.apache.org by Richard Grossman <ri...@gmail.com> on 2010/03/17 16:38:48 UTC

Model to store biggest score

Hi,

I'm trying to find a model where I can keep the list of the biggest scores for
users.
It seems simple, but I'm stuck here.
For example: user1 score = 10
             user2 score = 20
             user3 score = 30

Query: Top score (2) = user3, user2
If someone has done something similar, thanks for sharing.

Richard

Re: Model to store biggest score

Posted by Erik Holstad <er...@gmail.com>.

Another approach you can take is to add the userid to the score, like
=> (column=140_uid2, value=[], timestamp=1268841641979)
and if you need the scores sorted by time you can add
=> (column=140_268841641979_uid2, value=[], timestamp=1268841641979)

But I do think that in any case you need to remove the old entry so that you
don't get duplicates, unless I'm missing something here.
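
A minimal Python sketch of the composite column name idea above, using a plain
set in place of a Cassandra row. The helper names and the zero-padding width
are assumptions for illustration only. The padding matters if the column names
are compared as strings, because otherwise "90_uid1" would sort after
"140_uid2".

```python
def column_name(score: int, user_id: str, width: int = 10) -> str:
    """Build '<zero-padded score>_<user id>' so string order follows score order."""
    return f"{score:0{width}d}_{user_id}"


def parse(name: str):
    """Split a column name back into (score, user id)."""
    score, user_id = name.split("_", 1)
    return int(score), user_id


# Without padding, string comparison puts 90 ahead of 140.
unpadded_first = sorted(["90_uid1", "140_uid2"], reverse=True)[0]

# Stand-in for the columns of one row; a real row would be kept sorted
# by Cassandra according to the column family's comparator.
columns = {
    column_name(140, "uid2"),
    column_name(90, "uid1"),
    column_name(140, "uid5"),
}

# Two users with the same score no longer collide, and the top N is simply
# the first N names in descending order.
top = [parse(name) for name in sorted(columns, reverse=True)[:2]]
```

The old entry for a user still has to be deleted when the score changes; this
sketch only covers naming and ordering.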


On Wed, Mar 17, 2010 at 9:52 AM, Brandon Williams <dr...@gmail.com> wrote:

> On Wed, Mar 17, 2010 at 11:48 AM, Richard Grossman <ri...@gmail.com> wrote:
>
>> But with a simple column family I have the same problem: when I update
>> the score of one user, I need to remove his old score too. For example,
>> here the user uid5 was at 130 and is now at 140; because I add the random
>> number, Cassandra keeps the whole score history.
>>
>
> You can maintain another index mapping users to the values.  Depending on
> your use case though, if this is time-based, you can name the rows by the
> date and just create new rows as time goes on.
>
> -Brandon
>



-- 
Regards Erik

Re: Model to store biggest score

Posted by Brandon Williams <dr...@gmail.com>.

On Wed, Mar 17, 2010 at 11:48 AM, Richard Grossman <ri...@gmail.com> wrote:

> But with a simple column family I have the same problem: when I update
> the score of one user, I need to remove his old score too. For example,
> here the user uid5 was at 130 and is now at 140; because I add the random
> number, Cassandra keeps the whole score history.
>

You can maintain another index mapping users to the values.  Depending on
your use case though, if this is time-based, you can name the rows by the
date and just create new rows as time goes on.
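
A rough Python sketch of the second index suggested above, with two plain
dicts standing in for the two rows or column families. All names here are
hypothetical, and the sketch ignores concurrency: in Cassandra the delete and
the two writes would be separate operations.

```python
scores_by_user = {}  # index: user id -> current score
top_row = {}         # stand-in for the sorted row: (score, user id) -> empty value


def set_score(user_id: str, score: int) -> None:
    """Record a new score and drop the user's previous entry, if any."""
    old = scores_by_user.get(user_id)
    if old is not None:
        # The index tells us which old column to delete.
        top_row.pop((old, user_id), None)
    top_row[(score, user_id)] = b""
    scores_by_user[user_id] = score


def top(n: int):
    """Return the n users with the highest scores, best first."""
    return [user for _, user in sorted(top_row, reverse=True)[:n]]


set_score("uid4", 130)
set_score("uid5", 130)
set_score("uid5", 140)  # uid5 moves from 130 to 140; the 130 entry is removed
```

After these calls the row holds one entry per user, so the score history no
longer accumulates.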

-Brandon

Re: Model to store biggest score

Posted by Richard Grossman <ri...@gmail.com>.

But with a simple column family I have the same problem: when I update
the score of one user, I need to remove his old score too. For example,
here the user uid5 was at 130 and is now at 140; because I add the random
number, Cassandra keeps the whole score history.

get Keyspace2.topScoreUser['top']
=> (column=140-1, value=uid5, timestamp=1268841641979)
=> (column=130-2, value=uid5, timestamp=1268841614066)
=> (column=130-1, value=uid4, timestamp=1268841594786)
=> (column=130, value=uid4, timestamp=1268841517352)
=> (column=120, value=uid3, timestamp=1268841509536)
=> (column=110, value=uid2, timestamp=1268841501720)
=> (column=100, value=uid1, timestamp=1268841496069)

Is this wrong?

On Wed, Mar 17, 2010 at 6:20 PM, Brandon Williams <dr...@gmail.com> wrote:

> On Wed, Mar 17, 2010 at 11:13 AM, Toby DiPasquale <to...@cbcg.net> wrote:
>>
>> Couldn't you just use a supercolumn whose keys were the score and the
>> subcolumns were username:true? Basically using the subcolumns as a
>> list?
>>
>
> Sure, but that complicates getting the top N scores.  You'd have to use the
> OrderPreservingPartitioner, so it's a bit less flexible.  Also, any time a
> score changed you'd have to find the old one and remove it.
>
> -Brandon
>

Re: Model to store biggest score

Posted by Brandon Williams <dr...@gmail.com>.

On Wed, Mar 17, 2010 at 11:13 AM, Toby DiPasquale <to...@cbcg.net> wrote:
>
> Couldn't you just use a supercolumn whose keys were the score and the
> subcolumns were username:true? Basically using the subcolumns as a
> list?
>

Sure, but that complicates getting the top N scores.  You'd have to use the
OrderPreservingPartitioner, so it's a bit less flexible.  Also, any time a
score changed you'd have to find the old one and remove it.
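
To illustrate the extra work, here is a small Python sketch that models the
supercolumn layout as a nested dict (score -> {username: True}). It is only
an in-memory analogy with made-up helper names, not Cassandra client code.

```python
from collections import defaultdict

# score -> {username: True}, mirroring "supercolumn -> subcolumns as a list"
scores = defaultdict(dict)


def add(user: str, score: int) -> None:
    scores[score][user] = True


def remove(user: str, score: int) -> None:
    """Remove a user's old entry; the caller has to know the old score."""
    scores[score].pop(user, None)
    if not scores[score]:
        del scores[score]


def top(n: int):
    """Walk the scores in descending order and collect users until n are found."""
    result = []
    for score in sorted(scores, reverse=True):
        for user in sorted(scores[score]):
            result.append((user, score))
            if len(result) == n:
                return result
    return result


add("user1", 10)
add("user2", 20)
add("user3", 30)
add("user4", 30)
first_top = top(2)

# Changing user4's score means finding and removing the old entry first.
remove("user4", 30)
add("user4", 5)
second_top = top(2)
```

Getting the top N means reading an unknown number of score groups, and each
update needs the old score to locate the entry to delete.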

-Brandon

Re: Model to store biggest score

Posted by Toby DiPasquale <to...@cbcg.net>.

On Wed, Mar 17, 2010 at 12:10 PM, Brandon Williams <dr...@gmail.com> wrote:
> On Wed, Mar 17, 2010 at 11:05 AM, Richard Grossman <ri...@gmail.com>
> wrote:
>>
>> Thanks, but what do you mean by this?
>>
>>> pack a random integer after the score (so the sort order is maintained)
>>> in big endian format and only examine the first 8 bytes of the column upon
>>> retrieval.
>>> -Brandon
>>
>> Do I need to take the score and append a suffix such as 100-1, 100-2, 100-3,
>> etc. to prevent collisions?
>> Thanks
>
> You have the score, which you pack in big endian format, resulting in 8
> bytes.  Then you generate a completely random number and pack it in big
> endian format as well, resulting in another 8 bytes.  Now you concatenate the
> two together (with the score first, to maintain sort order) and insert the
> column.  When you retrieve it, you only look at the first 8 bytes to get the
> score since the random number isn't important.
> -Brandon

Couldn't you just use a supercolumn whose keys were the score and the
subcolumns were username:true? Basically using the subcolumns as a
list?

-- 
Toby DiPasquale

Re: Model to store biggest score

Posted by Brandon Williams <dr...@gmail.com>.

On Wed, Mar 17, 2010 at 11:05 AM, Richard Grossman <ri...@gmail.com> wrote:

> Thanks, but what do you mean by this?
>
>> pack a random integer after the score (so the sort order is maintained) in
>> big endian format and only examine the first 8 bytes of the column upon
>> retrieval.
>>
>> -Brandon
>>
>
> Do I need to take the score and append a suffix such as 100-1, 100-2, 100-3,
> etc. to prevent collisions?
> Thanks
>

You have the score, which you pack in big endian format, resulting in 8
bytes.  Then you generate a completely random number and pack it in big
endian format as well, resulting in another 8 bytes.  Now you concatenate the
two together (with the score first, to maintain sort order) and insert the
column.  When you retrieve it, you only look at the first 8 bytes to get the
score since the random number isn't important.
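
A minimal Python sketch of that packing, using the standard struct module and
a dict in place of a BytesType-sorted row. The function names are made up for
illustration, and the sketch assumes non-negative scores so that byte order
matches numeric order.

```python
import random
import struct


def make_column_name(score: int) -> bytes:
    """Pack the score and a random 64-bit number, both big endian (16 bytes)."""
    return struct.pack(">QQ", score, random.getrandbits(64))


def score_of(column_name: bytes) -> int:
    """Read the score back from the first 8 bytes; the rest is ignored."""
    return struct.unpack(">Q", column_name[:8])[0]


# Stand-in for one row: column name -> user id.
row = {}
for user, score in [("user1", 10), ("user2", 20), ("user3", 30), ("user4", 20)]:
    row[make_column_name(score)] = user

# Byte-wise comparison of the names orders the columns by score, so the
# top N is the first N columns in descending order.
top = sorted(row.items(), reverse=True)[:2]
top_scores = [(score_of(name), user) for name, user in top]
```

Here user2 and user4 share the score 20 but get different column names, so
neither overwrites the other.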

-Brandon

Re: Model to store biggest score

Posted by Richard Grossman <ri...@gmail.com>.

Thanks, but what do you mean by this?

> pack a random integer after the score (so the sort order is maintained) in
> big endian format and only examine the first 8 bytes of the column upon
> retrieval.
>
> -Brandon
>

Do I need to take the score and append a suffix such as 100-1, 100-2, 100-3,
etc. to prevent collisions?
Thanks

Re: Model to store biggest score

Posted by Brandon Williams <dr...@gmail.com>.

On Wed, Mar 17, 2010 at 10:38 AM, Richard Grossman <ri...@gmail.com> wrote:

> Hi,
>
> I'm trying to find a model where I can keep the list of the biggest scores for
> users.
> It seems simple, but I'm stuck here.
> For example: user1 score = 10
>              user2 score = 20
>              user3 score = 30
>
> Query: Top score (2) = user3, user2
> If someone has done something similar, thanks for sharing.
>

You can use a LongType column where the column name is the score and the
value is the user.  However, two users with the same score will collide.
One way to get around this is to use a BytesType and pack a random integer
after the score (so the sort order is maintained) in big endian format and
only examine the first 8 bytes of the column name upon retrieval.

-Brandon