Posted to user@couchdb.apache.org by Manolo Padron Martinez <ma...@gmail.com> on 2009/03/03 14:27:00 UTC
Obtaining unique values from a view
Hi:
I'm really a newbie, and I have a newbie problem (and maybe a
misconception of the way to work with Couch).
I have a lot of documents of this form (each represents an experiment with
any number of conditions, so instead of X and Y there could be only X, or even X, Y, Z...)
{
  "Experiment": "something",
  "Conditions": {
    "X": 3,
    "Y": 2
  }
}
And I have a view like:
MAP:
function(doc) {
  if (doc.Experiment) {
    // emit one row per condition name; "var" keeps i from leaking as a global
    for (var i in doc.Conditions) {
      emit(doc.Experiment, i);
    }
  }
}
REDUCE:
function(key, values) {
  return values;
}
When I launch the view I get this:
{"rows":[{"key":"Something","value":["X","Y"]},{"key":"Something2","value":["X","Y","X","Y","X","Y","Z","X","Y","X","Y","X","Y","Z"]}]}
I would like to get the conditions for every experiment, grouped by
experiment and without repetitions (I mean something like this):
{"rows":[{"key":"Something","value":["X","Y"]},{"key":"Something2","value":["X","Y","Z"]}]}
Could anyone help me?
Regards from the Canary Islands
Manuel Padron Martinez
Re: [user] Obtaining unique values from a view
Posted by Jan Lehnardt <ja...@apache.org>.
Hi Wout,
can you add this to http://wiki.apache.org/couchdb/View_Snippets?
Cheers
Jan
--
On 10 Apr 2009, at 14:32, Wout Mertens wrote:
> Note to self: I now realize that the easiest way (perhaps the best)
> to do the below is to have your function be
>
> map: function(doc) {
> if (doc.Experiment)
> for (i in doc.Conditions){
> emit([doc.Experiment, i], 1);
> }
> }
>
> reduce: function(key,values,rereduce)
> {
> return sum(values);
> }
>
> This gives you rows like
>
> {"key":["Something","X"],"value":3}
> {"key":["Something","Y"],"value":4}
>
> where value is the count.
>
> You can replace 1 and the sum with null if you're not interested in
> the count.
>
> This is way better than a unique() function because it calculates in
> constant time and memory and you can request subranges of values.
>
> Wout.
Re: [user] Obtaining unique values from a view
Posted by Wout Mertens <wo...@gmail.com>.
Note to self: I now realize that the easiest way (perhaps the best) to
do what Manolo asked for is to have your view functions be
map: function(doc) {
  if (doc.Experiment) {
    for (var i in doc.Conditions) {
      // one row per (experiment, condition name) pair
      emit([doc.Experiment, i], 1);
    }
  }
}
reduce: function(key, values, rereduce) {
  return sum(values);
}
Queried with group=true, this gives you rows like
{"key":["Something","X"],"value":3}
{"key":["Something","Y"],"value":4}
where value is the count.
If you're not interested in the count, you can emit null instead of 1
and return null from the reduce instead of the sum.
This is way better than a unique() function because the reduce value
never grows (so it calculates in constant time and memory), and you can
request subranges of keys with startkey and endkey.
Wout.
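The composite-key approach above can be tried outside CouchDB with a small simulation. This is only a sketch: the sample documents are invented, `groupedRows` is a hypothetical stand-in for querying the view with group=true, and sorting by JSON string only approximates CouchDB's key collation. It shows that the view returns one row per unique (experiment, condition) pair, and that the "wide" per-experiment list can be built on the client after fetching the rows.

```javascript
// Local simulation only: none of this runs inside CouchDB.
// `docs` is made-up sample data in the shape used in this thread.
var docs = [
  { Experiment: "Something",  Conditions: { X: 3, Y: 2 } },
  { Experiment: "Something2", Conditions: { X: 1, Y: 5, Z: 7 } },
  { Experiment: "Something2", Conditions: { X: 4, Y: 4 } }
];

// Map step: one row per (experiment, condition name) pair.
function mapDoc(doc, emit) {
  if (doc.Experiment) {
    for (var name in doc.Conditions) {
      emit([doc.Experiment, name], 1);
    }
  }
}

// Hypothetical stand-in for a group=true query: rows that share an
// identical key collapse into one row whose value is their sum.
function groupedRows(docs) {
  var counts = {};
  docs.forEach(function (doc) {
    mapDoc(doc, function (key, value) {
      var k = JSON.stringify(key);
      counts[k] = (counts[k] || 0) + value;
    });
  });
  // Sorting by JSON string is an approximation of view key order.
  return Object.keys(counts).sort().map(function (k) {
    return { key: JSON.parse(k), value: counts[k] };
  });
}

// Client-side fold: turn the tall list of rows into
// { experiment: [condition, ...] } once the rows have been fetched.
function conditionsByExperiment(rows) {
  var result = {};
  rows.forEach(function (row) {
    var experiment = row.key[0];
    (result[experiment] = result[experiment] || []).push(row.key[1]);
  });
  return result;
}

var rows = groupedRows(docs);
console.log(JSON.stringify(conditionsByExperiment(rows)));
// prints {"Something":["X","Y"],"Something2":["X","Y","Z"]}
```

Against a real view, the conditions of a single experiment could be requested with group=true plus startkey=["Something2"] and endkey=["Something2",{}], since an object collates after any string in CouchDB's key order.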
Re: Obtaining unique values from a view
Posted by Wout Mertens <wm...@cisco.com>.
On Mar 5, 2009, at 4:59 PM, Wout Mertens wrote:
> Actually, I just did some tests around this, and it turns out that
> if you always query with group=true, CouchDB never runs the final
> rereduce!
So based on this, I created this reduce function for reducing a map's
emit(key,value) to unique values per key:
function(k,v,r) {
  // Remove adjacent duplicates from a sorted array, in place.
  function unique_inplace(an_array) {
    var first = 0;
    var last = an_array.length;
    // Find first non-unique pair
    for(var firstb; (firstb = first) < last && ++first < last; ) {
      if(an_array[firstb] == an_array[first]) {
        // Start copying, skipping duplicates
        for(; ++first < last; ) {
          if (!(an_array[firstb] == an_array[first])) {
            an_array[++firstb] = an_array[first];
          }
        }
        // firstb is at the last element of the new array
        ++firstb;
        an_array.length = firstb;
        return;
      }
    }
  }
  if(r) {
    // rereduce: v is an array of arrays returned by earlier reduce calls
    var arr=[];
    for (var i=0; i<v.length; i++) {
      arr=arr.concat(v[i]);
    }
    arr=arr.sort();
    unique_inplace(arr);
    return(arr);
  } else {
    var arr=v.sort();
    unique_inplace(arr);
    return(arr);
  }
}
It's not optimal yet; ideally the if(r) section should use an n-way
mergesort+uniq, since it receives sorted arrays as values. But this
also works :).
Thoughts?
Wout.
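The n-way merge+uniq mentioned above could look roughly like the following. This is a sketch under assumptions, not code from the thread or from CouchDB: `mergeUnique` is a hypothetical name, and it assumes every input array is already sorted in an order consistent with the `<` operator (true for arrays of strings sorted with the default sort()). It scans all cursors on each step, which is simple but not the fastest option; a heap would do better for many inputs.

```javascript
// Merge several already-sorted arrays into one sorted array without
// duplicates. Assumes each input is sorted consistently with "<"
// (for example, arrays of strings sorted with the default sort()).
function mergeUnique(sortedArrays) {
  // One cursor per input array.
  var positions = sortedArrays.map(function () { return 0; });
  var out = [];
  while (true) {
    // Find the input whose current element is the smallest.
    var best = -1;
    for (var i = 0; i < sortedArrays.length; i++) {
      if (positions[i] < sortedArrays[i].length &&
          (best === -1 ||
           sortedArrays[i][positions[i]] < sortedArrays[best][positions[best]])) {
        best = i;
      }
    }
    if (best === -1) break; // every input is exhausted
    var value = sortedArrays[best][positions[best]++];
    // The output is sorted, so a duplicate can only equal the last element.
    if (out.length === 0 || out[out.length - 1] !== value) {
      out.push(value);
    }
  }
  return out;
}

console.log(mergeUnique([["X", "Y"], ["X", "Y", "Z"], ["W", "Y"]]));
// prints [ 'W', 'X', 'Y', 'Z' ]
```

In the reduce function above, the rereduce branch could then return mergeUnique(v) instead of concatenating, sorting, and de-duplicating.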
Re: [user] Obtaining unique values from a view
Posted by Wout Mertens <wm...@cisco.com>.
On Mar 4, 2009, at 3:28 AM, Chris Anderson wrote:
> On Tue, Mar 3, 2009 at 1:32 PM, Wout Mertens <wm...@cisco.com>
> wrote:
>> Would the problem be alleviated if you could specify for views that
>> couch
>> should not reduce past the group level? In other words, only
>> calculate
>> what's needed for views with group=true?
>>
>
> Sort of. Essentially this would require an entirely different
> map/reduce implementation. It would probably only provide reductions
> at the group level (like Hadoop reduce). CouchDB is open to /
> interested in alternate view engines, and something like this could
> probably be created in a not-too-overwhelming amount of Erlang, on top
> of CouchDB's btree storage engine. Patches welcome! (Also, there are
> some patches floating around - once 0.9.0 is off our plate we'll
> probably have more spare cycles available for evaluating/consolidating
> them.)
Actually, I just did some tests around this, and it turns out that if
you always query with group=true, CouchDB never runs the final reduce!
I tested it by making a temporary view in Futon:
map: function(doc) { emit(doc._rev % 10, doc._id); }
reduce: function(k, v, r) {
  if (r) { log(["rereduce", v]); } else { log(["reduce", v]); }
  return v;
}
Just interacting with the view in Futon doesn't run rereduce on all
view keys. Once you access the view directly, without the group=true
parameter, CouchDB calculates the (re)reduce. I hadn't realized this
before, but for small databases it never calls rereduce at all. That
makes sense now.
So as long as you promise never to run a particular "wide view"
without the group=true parameter, and the "wideness" of your view
results is manageable, it looks like you should be ok.
Of course, some attacker could DoS your server by calling the view
without group=true :-/
Let's say I'm 70% certain of the above being true. I think I'm still
missing some subtleties in map/reduce. Any opinions?
Anyway, CouchDB rocks :-)
Wout.
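The difference between reduce and rereduce calls observed in this test can be illustrated with a toy model. This is only a sketch: CouchDB chooses its chunk boundaries from its B-tree nodes, so the fixed `chunkSize`, the `runReduce` helper, and the sample rows are all assumptions made for illustration. The model passes [key, docid] pairs as keys on the first pass and null on rereduce, which is how the view server calls a reduce function.

```javascript
// Toy model only: real chunking depends on CouchDB's B-tree layout.
function runReduce(rows, reduceFn, chunkSize) {
  var partials = [];
  for (var i = 0; i < rows.length; i += chunkSize) {
    var chunk = rows.slice(i, i + chunkSize);
    var keys = chunk.map(function (row) { return [row.key, row.id]; });
    var values = chunk.map(function (row) { return row.value; });
    partials.push(reduceFn(keys, values, false)); // first pass: reduce
  }
  if (partials.length === 1) return partials[0];  // small input: no rereduce
  return reduceFn(null, partials, true);          // second pass: rereduce
}

// Records which kind of call was made, then returns values unchanged,
// like the logging reduce function used in the test above.
var calls = [];
function loggingReduce(keys, values, rereduce) {
  calls.push(rereduce ? "rereduce" : "reduce");
  return values;
}

var rows = [];
for (var n = 0; n < 5; n++) {
  rows.push({ id: "doc" + n, key: n % 2, value: "doc" + n });
}

runReduce(rows, loggingReduce, 2);
console.log(calls.join(","));  // prints reduce,reduce,reduce,rereduce

calls = [];
runReduce(rows, loggingReduce, 10);
console.log(calls.join(","));  // prints reduce
```

In this model a small input fits in one chunk and never reaches rereduce, which matches the observation about small databases. It says nothing definite about when a group=true query triggers rereduce in a real database.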
Re: [user] Obtaining unique values from a view
Posted by Chris Anderson <jc...@apache.org>.
On Tue, Mar 3, 2009 at 1:32 PM, Wout Mertens <wm...@cisco.com> wrote:
> Would the problem be alleviated if you could specify for views that couch
> should not reduce past the group level? In other words, only calculate
> what's needed for views with group=true?
>
Sort of. Essentially this would require an entirely different
map/reduce implementation. It would probably only provide reductions
at the group level (like Hadoop reduce). CouchDB is open to /
interested in alternate view engines, and something like this could
probably be created in a not-too-overwhelming amount of Erlang, on top
of CouchDB's btree storage engine. Patches welcome! (Also, there are
some patches floating around - once 0.9.0 is off our plate we'll
probably have more spare cycles available for evaluating/consolidating
them.)
Chris
--
Chris Anderson
http://jchris.mfdz.com
Re: [user] Obtaining unique values from a view
Posted by Wout Mertens <wm...@cisco.com>.
On Mar 3, 2009, at 6:00 PM, Chris Anderson wrote:
> On Tue, Mar 3, 2009 at 6:52 AM, Wout Mertens <wm...@cisco.com>
> wrote:
>>>> Since this question has been posed more than once, maybe main.js
>>>> should
>>>> have a uniq() function as well?
>
> Dragons dragons...
>
> This is one pattern Couch does not support (and it is not unique in
> this way).
[...]
> It's just taking a tall list and making it into a wide list. The
> disadvantage of a wide list is that you have to have the whole thing
> in memory at once. This is where Couch breaks down, because the
> spidermonkey process eventually has to have all the unique rows of the
> map in memory all at once.
OK, I understand: if your value grows with each reduce, you end up with
a huge value at the last reduce.
However, suppose you keep tags about documents in separate documents,
you would need uniq() to find out which tags a certain document has,
no? On the last re-reduce, would the total reduce value size not be at
most a few times the total amount of tags? So still manageable?
Would the problem be alleviated if you could specify for views that
couch should not reduce past the group level? In other words, only
calculate what's needed for views with group=true?
So I suppose that for now, this behavior should not be encouraged by
providing a uniq() function :-)
Wout.
Re: [user] Obtaining unique values from a view
Posted by Chris Anderson <jc...@apache.org>.
On Tue, Mar 3, 2009 at 6:52 AM, Wout Mertens <wm...@cisco.com> wrote:
>>> Since this question has been posed more than once, maybe main.js should
>>> have a uniq() function as well?
Dragons dragons...
This is one pattern Couch does not support (and it is not unique in this way).
When I first started working with CouchDB, I really wanted to take maps like
a, 1
a, 5
a, 8
b, 2
b, 6
b, 6
b, 7
and use reduce to turn them into:
a: 1, 5, 8
b: 2, 6, 7
Don't do this!
It's just taking a tall list and making it into a wide list. The
disadvantage of a wide list is that you have to have the whole thing
in memory at once. This is where Couch breaks down, because the
spidermonkey process eventually has to have all the unique rows of the
map in memory all at once.
It's fine if your map has 10-ish unique keys, but even at 100-ish
unique keys, reduces will start to time out. Remember that the lists
I showed above will turn into something like this when the final
reduction is calculated:
1,2,5,6,7,8
Which as you can see could become a very large amount of data on
real-life datasets (which probably would have thousands of values at
full-reduce).
You can use the log(data) function in your map and reduce functions
(and watch couch.log) to see how the tall list just gets turned into
the wide list, with functions like this.
Chris
--
Chris Anderson
http://jchris.mfdz.com
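The tall-versus-wide problem described above can be made concrete by measuring the size of the final reduce value. This is a minimal sketch with invented data: `collectUnique`, `count`, and `finalSize` are hypothetical helpers, and the size is measured as the length of the JSON text, which is only a rough proxy for what the view server has to hold in memory.

```javascript
// "Wide" reduce: keeps every distinct value it has seen.
function collectUnique(values) {
  var seen = {};
  var out = [];
  values.forEach(function (v) {
    if (!Object.prototype.hasOwnProperty.call(seen, v)) {
      seen[v] = true;
      out.push(v);
    }
  });
  return out;
}

// "Tall-friendly" reduce: the result is always a single number.
function count(values) {
  return values.length;
}

// Size of the final reduce value for n distinct map values,
// measured as the number of characters in its JSON form.
function finalSize(reduceFn, n) {
  var values = [];
  for (var i = 0; i < n; i++) {
    values.push("value-" + i);
  }
  return JSON.stringify(reduceFn(values)).length;
}

[10, 1000, 100000].forEach(function (n) {
  console.log(n, finalSize(collectUnique, n), finalSize(count, n));
});
```

The second column grows with the number of distinct values, while the third stays at a handful of characters, which is why a counting reduce scales and a collecting reduce does not.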
Re: [user] Obtaining unique values from a view
Posted by Wout Mertens <wm...@cisco.com>.
On Mar 3, 2009, at 3:29 PM, Jan Lehnardt wrote:
> On 3 Mar 2009, at 15:08, Wout Mertens wrote:
>
>> On Mar 3, 2009, at 2:58 PM, Manolo Padron Martinez wrote:
>>
>>>> I believe this has been covered in this thread:
>>>> http://markmail.org/thread/lwqfwlscrvilwm34
>>>>
>>>> but I think a totally satisfactory answer was not found.
>>>
>>> Thanks that works for me.
>>
>> Actually now I'm curious where the sum() function comes from. I
>> can't find it in any Javascript references. Is it something Couch
>> provides?
>>
>> Answer: grepping through the source showed me it's in main.js.
>>
>> Since this question has been posed more than once, maybe main.js
>> should have a uniq() function as well?
>
> Patches welcome :) Feel free to submit uniq() to JIRA:
>
> https://issues.apache.org/jira/browse/COUCHDB
Actually it's slightly more difficult than I thought, since the most
efficient way to implement it relies on the input being sorted.
Here's a nice implementation:
http://www.code-shop.com/2007/5/14/javascript-uniq
So maybe the uniq() function should take the rereduce parameter and,
if that's not set, sort the input?
Would that be acceptable for a function in main.js?
Wout.
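A sketch of what such a helper might look like, to show why sorted input matters. Both function names are hypothetical and nothing here is part of main.js. One deliberate difference from the proposal: this version sorts on rereduce as well, because concatenating several sorted partial results does not give a globally sorted array.

```javascript
// Linear-time de-duplication. Only correct when equal items are
// adjacent, which is guaranteed when the input is sorted.
function uniqSorted(sorted) {
  var out = [];
  for (var i = 0; i < sorted.length; i++) {
    if (i === 0 || sorted[i] !== sorted[i - 1]) {
      out.push(sorted[i]);
    }
  }
  return out;
}

// Hypothetical helper in the spirit of the proposal above.
function uniq(values, rereduce) {
  // On rereduce, values is an array of earlier results: flatten one level.
  // Otherwise copy the input so the caller's array is not reordered.
  var flat = rereduce ? [].concat.apply([], values) : values.slice();
  return uniqSorted(flat.sort());
}

console.log(uniq(["Y", "X", "Y"], false));            // prints [ 'X', 'Y' ]
console.log(uniq([["X", "Y"], ["X", "Z"]], true));    // prints [ 'X', 'Y', 'Z' ]
console.log(uniqSorted(["X", "Y", "X"]));             // unsorted input keeps the repeat
```

The last call shows the failure mode: without sorting first, the second "X" is not adjacent to the first and survives. Note that the default sort() compares values as strings, so this sketch is only meant for string values.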
Re: [user] Obtaining unique values from a view
Posted by Jan Lehnardt <ja...@apache.org>.
On 3 Mar 2009, at 15:08, Wout Mertens wrote:
> On Mar 3, 2009, at 2:58 PM, Manolo Padron Martinez wrote:
>
>>> I believe this has been covered in this thread:
>>> http://markmail.org/thread/lwqfwlscrvilwm34
>>>
>>> but I think a totally satisfactory answer was not found.
>>
>> Thanks that works for me.
>
> Actually now I'm curious where the sum() function comes from. I
> can't find it in any Javascript references. Is it something Couch
> provides?
>
> Answer: grepping through the source showed me it's in main.js.
>
> Since this question has been posed more than once, maybe main.js
> should have a uniq() function as well?
Patches welcome :) Feel free to submit uniq() to JIRA:
https://issues.apache.org/jira/browse/COUCHDB
Cheers
Jan
--
Re: [user] Obtaining unique values from a view
Posted by Wout Mertens <wm...@cisco.com>.
On Mar 3, 2009, at 2:58 PM, Manolo Padron Martinez wrote:
>> I believe this has been covered in this thread:
>> http://markmail.org/thread/lwqfwlscrvilwm34
>>
>> but I think a totally satisfactory answer was not found.
>
> Thanks that works for me.
Actually now I'm curious where the sum() function comes from. I can't
find it in any Javascript references. Is it something Couch provides?
Answer: grepping through the source showed me it's in main.js.
Since this question has been posed more than once, maybe main.js
should have a uniq() function as well?
Wout.
Re: [user] Obtaining unique values from a view
Posted by Manolo Padron Martinez <ma...@gmail.com>.
> I believe this has been covered in this thread:
> http://markmail.org/thread/lwqfwlscrvilwm34
>
> but I think a totally satisfactory answer was not found.
>
Thanks that works for me.
Re: [user] Obtaining unique values from a view
Posted by Wout Mertens <wm...@cisco.com>.
I believe this has been covered in this thread:
http://markmail.org/thread/lwqfwlscrvilwm34
but I think a totally satisfactory answer was not found.
Wout.