Posted to user@couchdb.apache.org by Peter Braden <Pe...@PeterBraden.co.uk> on 2010/09/21 19:27:46 UTC

Random Document

Hi,

Is there a good way to get a random document from a database? I'm currently
using a view that does:

function(doc) {
    emit(Math.random(), doc);
};

But as this isn't deterministic, I'm pretty sure it's wrong.

I've done a bit of googling, and haven't found anything.

Cheers,

Peter



--
Peter Braden

<http://PeterBraden.co.uk/>

Re: Random Document

Posted by Peter Nolan <pe...@gmail.com>.
Sounds like map-reduce is what you're looking for.  Or you could emit all
the documents and choose a random row on the client side (though that
would be poor practice, depending on the size of your database, etc.)


Re: Random Document

Posted by "Eli Stevens (Gmail)" <wi...@gmail.com>.
Unless there are additional restrictions that can be imposed, I'm
pretty sure that you're going to end up needing to get the full list
of IDs, and select x of them at random without replacement to fully
match 'ORDER BY RANDOM() LIMIT X'.
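
A minimal sketch of that approach, assuming the full list of IDs has
already been fetched (for example from _all_docs) and parsed into an
array. The helper name sampleWithoutReplacement is hypothetical; it uses a
partial Fisher-Yates shuffle so that no ID can be picked twice.

```javascript
// Pick x distinct items uniformly at random from ids (partial Fisher-Yates shuffle).
// `rng` defaults to Math.random; it is a parameter only so the function can be tested.
function sampleWithoutReplacement(ids, x, rng = Math.random) {
  const pool = ids.slice();               // copy so the caller's array is untouched
  const n = Math.min(x, pool.length);     // cannot return more items than exist
  for (let i = 0; i < n; i++) {
    // choose an index from the not-yet-fixed tail [i, pool.length)
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Example IDs (assumed, standing in for the rows of an _all_docs response).
const ids = ["A", "B", "C", "D", "E"];
console.log(sampleWithoutReplacement(ids, 2));
```

The cost is that every ID has to be transferred to the client first, which
is the drawback the rest of this message tries to work around.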

However, depending on what you are doing with them, it's possible that
other approaches might work.  For example, you could add a
'uniformRandomValue' key to the doc which is set at document creation,
and have a view that does emit(doc.uniformRandomValue, doc._id), then
when you query the view you can (again, depending on what you're doing
with the random selection) either pick the lowest keys ('&limit=X') or
pick a random startkey in the range along with the limit
('&startkey=0.12345&limit=X').  That works great when you're using
the docs as something like a work queue, where after being chosen
once, the docs are removed from the queue.  However, if the docs stick
around, you can end up with problems.  Imagine your
doc.uniformRandomValues look like:

urv: id
0.1: A
0.7: B
0.8: C
0.9: D

Selecting from this distribution with a random startkey and limit of 2
makes it very unlikely that A or D are selected, unless you remove B
and C after they're picked the first time, and implement some sort of
wrap-around to get A if the startkey is 0.85.
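
The startkey-plus-limit query with wrap-around could be sketched roughly as
follows. This simulates the view in memory rather than talking to CouchDB,
and queryView and pickWithWrapAround are hypothetical names, not CouchDB
functions.

```javascript
// Rows as a view doing emit(doc.uniformRandomValue, doc._id) would return them,
// already sorted by key.
const rows = [
  { key: 0.1, id: "A" },
  { key: 0.7, id: "B" },
  { key: 0.8, id: "C" },
  { key: 0.9, id: "D" },
];

// Stand-in for querying a key-sorted view with ?startkey=K&limit=N.
function queryView(sortedRows, startkey, limit) {
  return sortedRows.filter((r) => r.key >= startkey).slice(0, limit);
}

// Take `limit` rows from startkey onward; if the end of the view is reached
// first, wrap around and continue from the lowest keys.
function pickWithWrapAround(sortedRows, startkey, limit) {
  const want = Math.min(limit, sortedRows.length);
  const picked = queryView(sortedRows, startkey, want);
  if (picked.length < want) {
    picked.push(...queryView(sortedRows, -Infinity, want - picked.length));
  }
  return picked.map((r) => r.id);
}

console.log(pickWithWrapAround(rows, 0.85, 2));          // D, then A after wrapping
console.log(pickWithWrapAround(rows, Math.random(), 2)); // random startkey
```

Note that wrap-around only stops the query from coming up short. Each doc
is still chosen in proportion to the gap before its key, so the selection
is not uniform unless picked docs are removed.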

If that kind of approach doesn't work for you, then it would be
helpful to know more about the requirements.  :)

HTH,
Eli





-- 
Eli

Re: Random Document

Posted by Peter Braden <Pe...@PeterBraden.co.uk>.
Hi,

I'm after a) - the equivalent of an 'ORDER BY RANDOM() LIMIT x' SQL statement.

>> But as this isn't deterministic, I'm pretty sure it's wrong.
> I don't follow your logic. The view will show all documents in a random
> order. The fact that it is unrepeatable may make it useless for your
> purposes, but it does not make the maths invalid, or the statistics wrong.

As far as I know, the CouchDB internals rely on view keys being
deterministic to do their view updates.

I'm not entirely convinced that my current function produces a good random
selection - if a document is updated more often, and therefore its view entry
is regenerated more often, does that mean it has a different chance of being
selected?

Cheers,

Peter






-- 
--
Peter Braden

<http://PeterBraden.co.uk/>

Re: Random Document

Posted by Ian Hobson <ia...@ianhobson.co.uk>.
On 21/09/2010 18:27, Peter Braden wrote:
> Hi,
>
> Is there a good way to get a random document from a database?
Hmm, that depends upon what you mean by "good", and "random" and if you 
want a repeatable result! I guess I'm asking what exactly are you trying 
to do?

a) Pick a representative, and statistically defensible sample of size X 
from a population of Y documents where each document has an equal 
probability of being selected, and cannot be selected twice.

b) Take a sample of size 1 from a population of Y, X times (so a given 
document could be taken more than once)?

c) Something similar to a or b where you don't know Y in advance?

d) Shuffle the documents?
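
For case (c), where Y is not known in advance, one well-known technique is
reservoir sampling. The sketch below is only an illustration of that idea
and is not something proposed in this thread; reservoirSample and docIds
are hypothetical names, and the IDs are made up.

```javascript
// Reservoir sampling (Algorithm R): keep a uniform sample of size x from a
// stream whose total length is not known in advance.
function reservoirSample(stream, x, rng = Math.random) {
  const reservoir = [];
  let seen = 0;
  for (const item of stream) {
    seen++;
    if (reservoir.length < x) {
      reservoir.push(item);               // fill the reservoir first
    } else {
      const j = Math.floor(rng() * seen); // uniform index in [0, seen)
      if (j < x) reservoir[j] = item;     // replace with probability x / seen
    }
  }
  return reservoir;
}

// Hypothetical stream of document IDs, e.g. rows read one page at a time.
function* docIds() {
  for (let i = 0; i < 1000; i++) yield "doc-" + i;
}

console.log(reservoirSample(docIds(), 3));
```

Each document ends up in the sample with equal probability and at most
once, which matches (a) without needing the population size up front.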

> I'm currently using a view that does:
>
> function(doc) {
>      emit(Math.random(), doc);
> };
>
> But as this isn't deterministic, I'm pretty sure it's wrong.
I don't follow your logic. The view will show all documents in a random
order. The fact that it is unrepeatable may make it useless for your
purposes, but it does not make the maths invalid, or the statistics wrong.

Regards

Ian

Re: Random Document

Posted by mi...@free.fr.
Hello,

Please see this thread http://couchdb.markmail.org/search/selecting+a+random+subset+of+a+view for good ideas on implementing randomness. I opened a Jira ticket asking for a random API option, but for now it's still only a wish.

Mickael


Re: Random Document

Posted by Aaron Miller <ap...@ninjawhale.com>.
Use a hash like SHA-1 to deterministically transform the non-uniform id
distribution into a uniform-looking one: emit(sha1(doc._id), null). (There
are plenty of JS SHA-1 libraries around.)
Then to pull a random document out, query the view with startkey=sha1(random
number generated at query time) and limit=1.
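
A rough sketch of the idea, run under Node.js with its built-in crypto
module and an in-memory stand-in for the view. Inside a real CouchDB view
function a JS SHA-1 library would be needed instead. The names sha1Hex,
buildView and randomDoc are hypothetical, and the wrap to the first row
when nothing sorts after the startkey is an added assumption.

```javascript
const crypto = require("crypto");

// Hex SHA-1 digest of a value (converted to a string first).
function sha1Hex(s) {
  return crypto.createHash("sha1").update(String(s)).digest("hex");
}

// Simulates the view built by emit(sha1(doc._id), null): rows sorted by hashed key.
function buildView(docIds) {
  return docIds
    .map((id) => ({ key: sha1Hex(id), id }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Simulates ?startkey=<sha1 of a random number>&limit=1, falling back to the
// first row when the random key sorts after every row in the view.
function randomDoc(view, rng = Math.random) {
  const startkey = sha1Hex(rng());
  const hit = view.find((row) => row.key >= startkey);
  return (hit || view[0]).id;
}

// Made-up document IDs for illustration.
const view = buildView(["apple", "banana", "cherry", "date"]);
console.log(randomDoc(view));
```

Because the keys depend only on doc._id, the view stays deterministic and
all of the randomness is supplied at query time.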
