Posted to user@couchdb.apache.org by Mano <ma...@gmail.com> on 2010/02/06 14:00:35 UTC

ranking a big list

Hi,

I have a large list of assessment scores (of about 500k students). The
scores can range between 0 and 200. How do I find the rank of a particular
score? For smaller lists, I can just sort the array of scores and for each
student's score I can get the index of their score in the sorted list. But
how do I achieve it using mapreduce?
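
For reference, the small-list approach described above might look like the following sketch. The function name rankBySorting and the sample scores are illustrative assumptions, not part of the original post. Tied scores share the position of their first occurrence in the sorted list.

```javascript
// Rank a score by sorting all scores from highest to lowest and
// taking the 1-based position of the first matching entry.
function rankBySorting(scores, score) {
  const sorted = [...scores].sort((a, b) => b - a); // copy, then sort descending
  const index = sorted.indexOf(score);              // first occurrence, so ties share a rank
  return index === -1 ? null : index + 1;           // null if nobody has that score
}

// Hypothetical sample data: two 200s, three 199s, one 150.
const sampleScores = [150, 199, 200, 199, 200, 199];
console.log(rankBySorting(sampleScores, 199)); // 3
```

This sorts the whole list for every lookup, which is why it stops being practical at around 500k documents.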

Thanks,
mano

-- 
Lord, give us the wisdom to utter words that are gentle and tender, for
tomorrow we may have to eat them.
   -Sen. Morris Udall

Re: ranking a big list

Posted by Mano <ma...@gmail.com>.
On Sat, Feb 6, 2010 at 10:33 PM, Metin Akat <ak...@gmail.com> wrote:

> with this map/reduce setup, the way to get the number of students
> above some value is to query with endkey=<<the score>> and
> descending=true
>
>
Thanks, Metin. Will try it out.

regds,
mano

Re: ranking a big list

Posted by Metin Akat <ak...@gmail.com>.
with this map/reduce setup, the way to get the number of students
above some value is to query with endkey=<<the score>> and
descending=true
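
A sketch of how such a query URL could be built, for illustration only. The database name "assessments", the design document "scores" and the view "by_score" are hypothetical. Note that endkey is inclusive by default, so this sketch adds inclusive_end=false to count only the students strictly above the score; leaving it out would also count students with exactly that score.

```javascript
// Build the view query that counts students scoring strictly above `score`.
// All path components below are hypothetical names, not from the thread.
function countAboveUrl(base, score) {
  const params = new URLSearchParams({
    descending: "true",            // walk the index from the highest score down
    endkey: JSON.stringify(score), // view keys are passed as JSON values
    inclusive_end: "false",        // stop before rows whose key equals `score`
  });
  return `${base}/assessments/_design/scores/_view/by_score?${params}`;
}

console.log(countAboveUrl("http://localhost:5984", 186));
```

With the sum reduce in place, the single returned value is the number of students above the score, and the rank is that value plus one.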

On Sat, Feb 6, 2010 at 6:53 PM, Mano <ma...@gmail.com> wrote:
> On Sat, Feb 6, 2010 at 9:17 PM, Mano <ma...@gmail.com> wrote:
>
>>
>> On Sat, Feb 6, 2010 at 6:57 PM, Metin Akat <ak...@gmail.com> wrote:
>>
>>> Hi,
>>> What if 2 students have score of 200 and the next 3 have score of 199?
>>> How do you decide who is fourth?
>>>
>>>
>> There is no fourth rank in that case. Only first (the 2 200's) third (the 3
>> 199's) and sixth etc ranks.
>>
>
> To elaborate, the rank here is to help infer how many students have scored
> better than a given score. Thought of the following scheme. Any pitfalls in
> this?
>
> map: function(doc){
>    emit(doc.score, 1);
> }
> reduce: function(keys, values){
>    return sum(values);
> }
>
> So, to get the rank of, say, 186, I'll query the view with the key: 187 in
> descending order. Will update after I try this out.
>
> thanks,
> mano
>

Re: ranking a big list

Posted by Mano <ma...@gmail.com>.
On Sat, Feb 6, 2010 at 9:17 PM, Mano <ma...@gmail.com> wrote:

>
> On Sat, Feb 6, 2010 at 6:57 PM, Metin Akat <ak...@gmail.com> wrote:
>
>> Hi,
>> What if 2 students have score of 200 and the next 3 have score of 199?
>> How do you decide who is fourth?
>>
>>
> There is no fourth rank in that case. Only first (the 2 200's) third (the 3
> 199's) and sixth etc ranks.
>

To elaborate, the rank here is to help infer how many students have scored
better than a given score. Thought of the following scheme. Any pitfalls in
this?

map: function(doc){
    emit(doc.score, 1);
}
reduce: function(keys, values){
    return sum(values);
}

So, to get the rank of, say, 186, I'll query the view with the key: 187 in
descending order. Will update after I try this out.
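
The scheme above can be tried outside CouchDB with a small emulation. This is only a sketch: emit() and sum() are local stand-ins for the helpers CouchDB's view server provides, the sample documents are made up, and countAtOrAbove() merely imitates a reduced query with descending=true and an endkey. It assumes integer scores.

```javascript
// Local stand-ins for CouchDB's view-server helpers (assumptions for this sketch).
const rows = [];
function emit(key, value) { rows.push({ key, value }); }
function sum(values) { return values.reduce((total, v) => total + v, 0); }

// The map and reduce functions from the message above.
function map(doc) { emit(doc.score, 1); }
function reduce(keys, values) { return sum(values); }

// Hypothetical sample documents.
[200, 200, 199, 199, 199, 186, 150].forEach((score) => map({ score }));

// Imitates ?descending=true&endkey=<endkey>: reduce every row whose key >= endkey.
function countAtOrAbove(endkey) {
  const selected = rows.filter((row) => row.key >= endkey);
  return reduce(selected.map((r) => r.key), selected.map((r) => r.value));
}

// Rank = number of students with a strictly higher (integer) score, plus one.
function rankOf(score) { return countAtOrAbove(score + 1) + 1; }

console.log(rankOf(186)); // 6: five students scored higher
```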

thanks,
mano

Re: ranking a big list

Posted by Metin Akat <ak...@gmail.com>.
Write a map function that emits every student's score as the key.
Query the view with that score as the key and descending=true.
The view will return the offset of the returned result.
That's the (zero-based) position shared by all the students that have this score.
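
A rough emulation of the offset idea, as a sketch only. queryDescending() is a made-up helper that imitates a map-only view (no reduce) queried with descending=true and a startkey. It assumes the reported offset is the number of rows that sort before the first returned row, which is worth verifying against your CouchDB version.

```javascript
// Imitates a map-only view response for ?descending=true&startkey=<startkey>.
function queryDescending(scores, startkey) {
  const sorted = [...scores].sort((a, b) => b - a);    // highest score first
  let offset = sorted.findIndex((s) => s <= startkey); // first row at or below startkey
  if (offset === -1) offset = sorted.length;           // every row sorts before startkey
  return {
    total_rows: sorted.length,
    offset,
    rows: sorted.slice(offset).map((key) => ({ key, value: null })),
  };
}

// The offset is zero-based, so the shared rank is offset + 1.
function rankFromOffset(result) { return result.offset + 1; }

const sample = [200, 200, 199, 199, 199, 150]; // hypothetical scores
console.log(rankFromOffset(queryDescending(sample, 199))); // 3
```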

On Sat, Feb 6, 2010 at 5:47 PM, Mano <ma...@gmail.com> wrote:
> On Sat, Feb 6, 2010 at 6:57 PM, Metin Akat <ak...@gmail.com> wrote:
>
>> Hi,
>> What if 2 students have score of 200 and the next 3 have score of 199?
>> How do you decide who is fourth?
>>
>>
> There is no fourth rank in that case. Only first (the 2 200's) third (the 3
> 199's) and sixth etc ranks.
>
> regds,
> mano
>

Re: ranking a big list

Posted by Mano <ma...@gmail.com>.
On Sat, Feb 6, 2010 at 9:17 PM, Mano <ma...@gmail.com> wrote:

>
> On Sat, Feb 6, 2010 at 6:57 PM, Metin Akat <ak...@gmail.com> wrote:
>
>> Hi,
>> What if 2 students have score of 200 and the next 3 have score of 199?
>> How do you decide who is fourth?
>>
>>
> There is no fourth rank in that case. Only first (the 2 200's) third (the 3
> 199's) and sixth etc ranks.
>
>
map: function(doc){

}

Re: ranking a big list

Posted by Mano <ma...@gmail.com>.
On Sat, Feb 6, 2010 at 6:57 PM, Metin Akat <ak...@gmail.com> wrote:

> Hi,
> What if 2 students have score of 200 and the next 3 have score of 199?
> How do you decide who is fourth?
>
>
There is no fourth rank in that case. Only first (the 2 200's) third (the 3
199's) and sixth etc ranks.

regds,
mano

Re: ranking a big list

Posted by Metin Akat <ak...@gmail.com>.
Hi,
What if 2 students have score of 200 and the next 3 have score of 199?
How do you decide who is fourth?

On Sat, Feb 6, 2010 at 3:00 PM, Mano <ma...@gmail.com> wrote:
> Hi,
>
> I have a large list of assessment scores (of about 500k students). The
> scores can range between 0 and 200. How do I find the rank of a particular
> score? For smaller lists, I can just sort the array of scores and for each
> student's score I can get the index of their score in the sorted list. But
> how do I achieve it using mapreduce?
>
> Thanks,
> mano
>
> --
> Lord, give us the wisdom to utter words that are gentle and tender, for
> tomorrow we may have to eat them.
>   -Sen. Morris Udall
>

Re: ranking a big list

Posted by Paul Joseph Davis <pa...@gmail.com>.
The easiest way to accomplish this would be to emit(doc.score, 1) with
a plain sum reduction. Then at query time just use a startkey of zero
and an endkey of the score you're interested in. Granted, that has weird
effects for ties on whether you bump up or down.
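
To illustrate the tie effect mentioned above, here is a sketch that imitates the sum reduce over the range startkey=0 to endkey=score. countBetween() and the sample scores are assumptions made for this example. Whether the end of the range is inclusive decides if tied students are counted.

```javascript
const tieScores = [200, 200, 199, 199, 199, 150]; // hypothetical scores

// Imitates the summed count for startkey..endkey, with or without the end key itself.
function countBetween(startkey, endkey, inclusiveEnd) {
  return tieScores.filter(
    (s) => s >= startkey && (inclusiveEnd ? s <= endkey : s < endkey)
  ).length;
}

const atOrBelow = countBetween(0, 199, true);  // 4: the three tied 199s are included
const below = countBetween(0, 199, false);     // 1: only the 150

// Converting the inclusive count into a rank counted from the top.
function rankFromTop(score) {
  return tieScores.length - countBetween(0, score, true) + 1;
}

console.log(atOrBelow, below, rankFromTop(199));
```

Counting from the bottom therefore needs the total number of students as well, to turn the result into "how many scored better".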


On Feb 6, 2010, at 7:58 PM, Matt Lyon <sa...@gmail.com> wrote:

> FWIW, what you're doing here is a statistical analysis called a
> percentile rank. In what I'm doing, for any given set of sums, I want
> to know which quartile [0-25/26-50/51-75/76-100] of all the sums any
> particular sum is in. I can't necessarily tell you how to do this in
> javascript because I'm interacting with couchdb from ruby anyway and
> using a ruby library (array_statistics) to do this kind of analysis on
> a simple sum reduce from couch, but hopefully this will point you in
> the right direction.
>
> On Sat, Feb 6, 2010 at 5:00 AM, Mano <ma...@gmail.com> wrote:
>> Hi,
>>
>> I have a large list of assessment scores (of about 500k students). The
>> scores can range between 0 and 200. How do I find the rank of a particular
>> score? For smaller lists, I can just sort the array of scores and for each
>> student's score I can get the index of their score in the sorted list. But
>> how do I achieve it using mapreduce?
>>
>> Thanks,
>> mano
>>
>> --
>> Lord, give us the wisdom to utter words that are gentle and tender, for
>> tomorrow we may have to eat them.
>>   -Sen. Morris Udall
>>

Re: ranking a big list

Posted by Matt Lyon <sa...@gmail.com>.
FWIW, what you're doing here is a statistical analysis called a
percentile rank. In what I'm doing, for any given set of sums, I want
to know which quartile [0-25/26-50/51-75/76-100] of all the sums any
particular sum is in. I can't necessarily tell you how to do this in
javascript because I'm interacting with couchdb from ruby anyway and
using a ruby library (array_statistics) to do this kind of analysis on
a simple sum reduce from couch, but hopefully this will point you in
the right direction.
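
Since the message above leaves out the JavaScript side, here is one possible sketch. It uses one common definition of percentile rank (scores below, plus half of the tied scores, as a share of all scores); other definitions exist, and nothing here reproduces the ruby array_statistics library. The function names are made up for this example.

```javascript
// Percentile rank: share of scores below `score`, counting ties as half.
function percentileRank(scores, score) {
  const below = scores.filter((s) => s < score).length;
  const equal = scores.filter((s) => s === score).length;
  return ((below + 0.5 * equal) / scores.length) * 100;
}

// Map a percentile rank (0..100) to a quartile number 1..4.
function quartileOf(percentile) {
  return Math.min(4, Math.max(1, Math.ceil(percentile / 25)));
}

const sums = [10, 20, 30, 40]; // hypothetical data
console.log(percentileRank(sums, 30), quartileOf(percentileRank(sums, 30))); // 62.5 3
```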

On Sat, Feb 6, 2010 at 5:00 AM, Mano <ma...@gmail.com> wrote:
> Hi,
>
> I have a large list of assessment scores (of about 500k students). The
> scores can range between 0 and 200. How do I find the rank of a particular
> score? For smaller lists, I can just sort the array of scores and for each
> student's score I can get the index of their score in the sorted list. But
> how do I achieve it using mapreduce?
>
> Thanks,
> mano
>
> --
> Lord, give us the wisdom to utter words that are gentle and tender, for
> tomorrow we may have to eat them.
>   -Sen. Morris Udall
>

Re: ranking a big list

Posted by eric casteleijn <er...@canonical.com>.
On 02/06/2010 08:00 AM, Mano wrote:
> Hi,
>
> I have a large list of assessment scores (of about 500k students). The
> scores can range between 0 and 200. How do I find the rank of a particular
> score? For smaller lists, I can just sort the array of scores and for each
> student's score I can get the index of their score in the sorted list. But
> how do I achieve it using mapreduce?

Not sure if it's the optimal way, but what I'd do is this:

Have a map/reduce view that counts the students per score, queried with
group=true (assuming you've only got integer scores from 0 to 200, this
will always return at most 201 rows), and sort those client side.

To find out the rank of a particular student, just look up their score
in that sorted result, and sum up all the counts of the scores
above it; the rank is that sum plus one.

It does mean two rather than one http call, but I don't immediately see 
a clever way of getting away with less.
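
The client-side step described above could be sketched like this. The rows array mimics the shape of a grouped view result (one { key, value } row per distinct score); the sample counts and the function name are assumptions for illustration.

```javascript
// Rows as a grouped count view might return them: one row per distinct score.
const groupedRows = [
  { key: 150, value: 1 },
  { key: 199, value: 3 },
  { key: 200, value: 2 },
]; // hypothetical counts

// Rank = 1 + number of students whose score is strictly higher.
function rankFromCounts(rows, score) {
  const better = rows
    .filter((row) => row.key > score)
    .reduce((total, row) => total + row.value, 0);
  return better + 1;
}

console.log(rankFromCounts(groupedRows, 199)); // 3
```

Because the grouped result is small, it can be fetched once and reused for every student's lookup.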