Posted to user@couchdb.apache.org by Manolo Padron Martinez <ma...@gmail.com> on 2009/03/03 14:58:32 UTC

Re: [user] Obtaining unique values from a view

> I believe this has been covered in this thread:
> http://markmail.org/thread/lwqfwlscrvilwm34
>
> but I think a totally satisfactory answer was not found.
>

Thanks that works for me.


>
> Wout.
>
>
> On Mar 3, 2009, at 2:27 PM, Manolo Padron Martinez wrote:
>
>  Hi:
>>
>> I'm really a newbie, and I have a newbie problem (and maybe a
>> misconception of the way to work with couch).
>> I have a lot of documents of this form (each represents an experiment with
>> any number of conditions, so instead of X and Y there could be only X, or
>> even X, Y, Z...)
>>
>> {
>>  "Experiment":"something",
>>  "Conditions":
>>      {
>>        "X":3,
>>        "Y":2
>>      }
>> }
>>
>> And I have a view like:
>> MAP:
>>
>> function(doc) {
>>  if (doc.Experiment) {
>>    // Emit one row per condition name, keyed by experiment
>>    for (var i in doc.Conditions) {
>>      emit(doc.Experiment, i);
>>    }
>>  }
>> }
>>
>> REDUCE:
>>
>> function(key,values)
>> {
>> return values;
>> }
>>
>> When I launch the view I get this:
>>
>>
>> {"rows":[{"key":"Something","value":["X","Y"]},{"key":"Something2","value":["X","Y","X","Y","X","Y","Z","X","Y","X","Y","X","Y","Z"]}]}
>>
>>
>>
>> I would like to get the conditions for every experiment, grouped by
>> experiment and without repetitions. I mean something like:
>>
>>
>> {"rows":[{"key":"Something","value":["X","Y"]},{"key":"Something2","value":["X","Y","Z"]}]}
>>
>>
>> Could anyone help me?
>>
>> Regards from the Canary Islands
>>
>> Manuel Padron Martinez
>>
>
>
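
One way to sidestep the wide reduce value, sketched here as an assumption rather than something stated in this message, is to emit the experiment and the condition name together as the key and let grouping collapse the duplicates, so each unique pair comes back as its own row. The snippet simulates that locally in plain JavaScript. runMap and groupKeys are hypothetical stand-ins for what a grouped query does, not CouchDB APIs, and emit is passed in as a parameter here only so the snippet runs outside CouchDB (in a real view it is a global).

```javascript
// Local simulation only: runMap and groupKeys are hypothetical helpers,
// not part of CouchDB.
function mapFn(doc, emit) {
  if (doc.Experiment && doc.Conditions) {
    for (var name in doc.Conditions) {
      // Compound key: one row per (experiment, condition) pair.
      emit([doc.Experiment, name], 1);
    }
  }
}

function runMap(docs) {
  var rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) { rows.push({ key: key, value: value }); });
  });
  return rows;
}

// Mimics grouping by exact key with a counting reduce: equal keys collapse
// into one row. Rows are ordered by their JSON text, which is only an
// approximation of CouchDB's own key collation.
function groupKeys(rows) {
  var counts = {};
  rows.forEach(function (row) {
    var k = JSON.stringify(row.key);
    counts[k] = (counts[k] || 0) + row.value;
  });
  return Object.keys(counts).sort().map(function (k) {
    return { key: JSON.parse(k), value: counts[k] };
  });
}

var docs = [
  { Experiment: "Something", Conditions: { X: 3, Y: 2 } },
  { Experiment: "Something2", Conditions: { X: 1, Y: 2 } },
  { Experiment: "Something2", Conditions: { X: 5, Y: 1, Z: 4 } }
];
var grouped = groupKeys(runMap(docs));
console.log(JSON.stringify(grouped));
```

Each reduce value stays a small number no matter how many documents there are; the unique condition names are read from the row keys instead of from one large array.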

Re: Obtaining unique values from a view

Posted by Wout Mertens <wm...@cisco.com>.
On Mar 5, 2009, at 4:59 PM, Wout Mertens wrote:

> Actually, I just did some tests around this, and it turns out that  
> if you always query with group=true, CouchDB never runs the final  
> rereduce!

So based on this, I created this reduce function for reducing a map's
emit(key,value) to unique values per key:

function(k,v,r) {
	function unique_inplace(an_array) {
		var first = 0;
		var last = an_array.length;
		// Find first non-unique pair
		for(var firstb; (firstb = first) < last && ++first < last; ) {
			if(an_array[firstb] == an_array[first]) {
				// Start copying, skipping duplicates
				for(; ++first < last; ) {
					if (!(an_array[firstb] == an_array[first])) {
						an_array[++firstb] = an_array[first];
					}
				}
				// firstb is at the last element of the new array
				++firstb;
				an_array.length = firstb;
				return;
			}
		}
	}

	if(r) {
		var arr=[];
		for (var i=0; i<v.length; i++) {
			arr=arr.concat(v[i]);
		}
		arr=arr.sort();
		unique_inplace(arr);
		return(arr);
	} else {
		var arr=v.sort();
		unique_inplace(arr);
		return(arr);
	}
}

It's not optimal yet. Ideally, the if(r) section should use an n-way
mergesort+uniq, since it receives sorted arrays as values. But this
also works :).
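
The n-way merge with uniq mentioned above could look roughly like the sketch below. It assumes every input array is already sorted and duplicate-free (as returned by the non-rereduce branch) and that all values are strings, so that the < operator agrees with the default order of sort(). mergeUnique is a hypothetical helper name, not something CouchDB provides.

```javascript
// Merge several already-sorted, duplicate-free arrays into one sorted,
// duplicate-free array. mergeUnique is a hypothetical helper name.
function mergeUnique(arrays) {
  var positions = arrays.map(function () { return 0; });
  var result = [];
  while (true) {
    // Pick the smallest head element among the arrays not yet exhausted.
    var best = -1;
    for (var i = 0; i < arrays.length; i++) {
      if (positions[i] < arrays[i].length &&
          (best === -1 || arrays[i][positions[i]] < arrays[best][positions[best]])) {
        best = i;
      }
    }
    if (best === -1) break; // every input is exhausted
    var value = arrays[best][positions[best]++];
    // Output is built in sorted order, so a duplicate can only equal the
    // last value written.
    if (result.length === 0 || result[result.length - 1] !== value) {
      result.push(value);
    }
  }
  return result;
}

console.log(mergeUnique([["X", "Y"], ["X", "Y", "Z"], ["Y"]]));
```

The linear scan for the smallest head is fine for a handful of input arrays; with many inputs a heap would be the usual choice.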

Thoughts?

Wout.


Re: [user] Obtaining unique values from a view

Posted by Wout Mertens <wm...@cisco.com>.
On Mar 4, 2009, at 3:28 AM, Chris Anderson wrote:

> On Tue, Mar 3, 2009 at 1:32 PM, Wout Mertens <wm...@cisco.com>  
> wrote:
>> Would the problem be alleviated if you could specify for views that  
>> couch
>> should not reduce past the group level? In other words, only  
>> calculate
>> what's needed for views with group=true?
>>
>
> Sort of. Essentially this would require an entirely different
> map/reduce implementation. It would probably only provide reductions
> at the group level (like Hadoop reduce). CouchDB is open to /
> interested in alternate view engines, and something like this could
> probably be created in a not-too-overwhelming amount of Erlang, on top
> of CouchDB's btree storage engine. Patches welcome! (Also, there are
> some patches floating around - once 0.9.0 is off our plate we'll
> probably have more spare cycles available for evaluating/consolidating
> them.)

Actually, I just did some tests around this, and it turns out that if  
you always query with group=true, CouchDB never runs the final reduce!

I tested it by making a temporary view in Futon:

map:    function(doc) { emit(doc._rev%10,doc._id); }
reduce: function(k,v,r) { if(r) { log(["rereduce",v]); } else { log(["reduce",v]); } return(v); }

Just interacting with the view in Futon doesn't run rereduce on all  
view keys. Once you access the view directly, without the group=true  
parameter, CouchDB calculates the (re)reduce. Actually I didn't  
realize, but for small databases, it never calls rereduce. That makes  
sense now.

So as long as you promise never to run a particular "wide view"  
without the group=true parameter, and the "wideness" of your view  
results is manageable, it looks like you should be ok.
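
To make "always query with group=true" concrete, here is a small sketch that builds the query URL so the parameter cannot be forgotten. The host, database, design document and view names are placeholders, and the /db/_design/<ddoc>/_view/<view> path layout is an assumption that holds for CouchDB 0.9 and later; earlier releases used a different path.

```javascript
// Build a view query URL that always includes group=true.
// The host, database, design document and view names are placeholders.
function groupedViewUrl(base, db, designDoc, view) {
  return base + "/" + encodeURIComponent(db) +
    "/_design/" + encodeURIComponent(designDoc) +
    "/_view/" + encodeURIComponent(view) +
    "?group=true";
}

var url = groupedViewUrl("http://localhost:5984", "experiments", "conditions", "unique");
console.log(url);
```

This only helps for clients you control; anyone who can reach the view directly can still leave the parameter off.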

Of course, some attacker could DoS your server by calling the view  
without group=true :-/

Let's say I'm 70% certain of the above being true. I think I'm still  
missing some subtleties in map/reduce. Any opinions?

Anyway, CouchDB rocks :-)

Wout.

Re: [user] Obtaining unique values from a view

Posted by Chris Anderson <jc...@apache.org>.
On Tue, Mar 3, 2009 at 1:32 PM, Wout Mertens <wm...@cisco.com> wrote:
> Would the problem be alleviated if you could specify for views that couch
> should not reduce past the group level? In other words, only calculate
> what's needed for views with group=true?
>

Sort of. Essentially this would require an entirely different
map/reduce implementation. It would probably only provide reductions
at the group level (like Hadoop reduce). CouchDB is open to /
interested in alternate view engines, and something like this could
probably be created in a not-too-overwhelming amount of Erlang, on top
of CouchDB's btree storage engine. Patches welcome! (Also, there are
some patches floating around - once 0.9.0 is off our plate we'll
probably have more spare cycles available for evaluating/consolidating
them.)

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: [user] Obtaining unique values from a view

Posted by Wout Mertens <wm...@cisco.com>.
On Mar 3, 2009, at 6:00 PM, Chris Anderson wrote:

> On Tue, Mar 3, 2009 at 6:52 AM, Wout Mertens <wm...@cisco.com>  
> wrote:
>>>> Since this question has been posed more than once, maybe main.js  
>>>> should
>>>> have a uniq() function as well?
>
> Dragons dragons...
>
> This is one pattern Couch does not support (and it is not unique in  
> this way).
[...]
> It's just taking a tall list and making it into a wide list. The
> disadvantage of a wide list is that you have to have the whole thing
> in memory at once. This is where Couch breaks down, because the
> spidermonkey process eventually has to have all the unique rows of the
> map in memory all at once.

Ok, I understand: if your value grows with each reduce, you end up with
a huge value at the last reduce.

However, suppose you keep tags about documents in separate documents,  
you would need uniq() to find out which tags a certain document has,  
no? On the last re-reduce, would the total reduce value size not be at  
most a few times the total amount of tags? So still manageable?

Would the problem be alleviated if you could specify for views that  
couch should not reduce past the group level? In other words, only  
calculate what's needed for views with group=true?


So I suppose that for now, this behavior should not be encouraged by  
providing a uniq() function :-)

Wout.

Re: [user] Obtaining unique values from a view

Posted by Chris Anderson <jc...@apache.org>.
On Tue, Mar 3, 2009 at 6:52 AM, Wout Mertens <wm...@cisco.com> wrote:
>>> Since this question has been posed more than once, maybe main.js should
>>> have a uniq() function as well?

Dragons dragons...

This is one pattern Couch does not support (and it is not unique in this way).

When I first started working with CouchDB, I really wanted to take maps like

a, 1
a, 5
a, 8
b, 2
b, 6
b, 6
b, 7

and use reduce to turn them into:

a: 1, 5, 8
b: 2, 6, 7

Don't do this!

It's just taking a tall list and making it into a wide list. The
disadvantage of a wide list is that you have to have the whole thing
in memory at once. This is where Couch breaks down, because the
spidermonkey process eventually has to have all the unique rows of the
map in memory all at once.

It's fine if your map has 10-ish unique keys, but even at 100-ish
unique keys, reduces will start to time out. Remember that the lists I
showed above will turn into something like this when the final
reduction is calculated:

1,2,5,6,7,8

Which as you can see could become a very large amount of data on
real-life datasets (which probably would have thousands of values at
full-reduce).

You can use the log(data) function in your map and reduce functions
(and watch couch.log) to see how the tall list just gets turned into
the wide list, with functions like this.
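
The growth described above can also be seen outside CouchDB with a rough local simulation. collectUnique is a reduce function of the "collect the unique values" kind, and fullReduce is a hypothetical stand-in for the view engine that reduces the values in chunks and then rereduces the partial results. The chunk size of 50 is arbitrary and not how CouchDB actually splits its work.

```javascript
// Reduce function of the "collect unique values" kind discussed above.
function collectUnique(keys, values, rereduce) {
  // On rereduce, values is an array of earlier results; flatten it first.
  var flat = rereduce ? [].concat.apply([], values) : values;
  var seen = {};
  var out = [];
  flat.forEach(function (v) {
    if (!seen.hasOwnProperty(v)) { seen[v] = true; out.push(v); }
  });
  return out;
}

// Hypothetical stand-in for the view engine: reduce the mapped values in
// chunks, then rereduce the partial results into one final value.
function fullReduce(values, chunkSize) {
  var partials = [];
  for (var i = 0; i < values.length; i += chunkSize) {
    partials.push(collectUnique(null, values.slice(i, i + chunkSize), false));
  }
  return collectUnique(null, partials, true);
}

[10, 100, 1000].forEach(function (n) {
  var values = [];
  for (var i = 0; i < n; i++) values.push(i);
  values = values.concat(values); // every value appears twice
  console.log(n + " distinct values -> final reduce length " + fullReduce(values, 50).length);
});
```

The final value is as long as the number of distinct values in the whole view, so it keeps growing with the data instead of staying small the way a sum or count does.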

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: [user] Obtaining unique values from a view

Posted by Wout Mertens <wm...@cisco.com>.
On Mar 3, 2009, at 3:29 PM, Jan Lehnardt wrote:

> On 3 Mar 2009, at 15:08, Wout Mertens wrote:
>
>> On Mar 3, 2009, at 2:58 PM, Manolo Padron Martinez wrote:
>>
>>>> I believe this has been covered in this thread:
>>>> http://markmail.org/thread/lwqfwlscrvilwm34
>>>>
>>>> but I think a totally satisfactory answer was not found.
>>>
>>> Thanks that works for me.
>>
>> Actually now I'm curious where the sum() function comes from. I  
>> can't find it in any Javascript references. Is it something Couch  
>> provides?
>>
>> Answer: grepping through the source showed me it's in main.js.
>>
>> Since this question has been posed more than once, maybe main.js  
>> should have a uniq() function as well?
>
> Patches welcome :) Feel free to submit uniq() to JIRA:
>
> https://issues.apache.org/jira/browse/COUCHDB

Actually it's slightly more difficult than I thought since the most  
efficient way to implement it is by guaranteeing that input is sorted.

Here's a nice implementation:

http://www.code-shop.com/2007/5/14/javascript-uniq

so maybe the uniq() function should take the re-reduce parameter and  
if that's not set, sort the input?

Would that be acceptable for a function in main.js?

Wout.

Re: [user] Obtaining unique values from a view

Posted by Jan Lehnardt <ja...@apache.org>.
On 3 Mar 2009, at 15:08, Wout Mertens wrote:

> On Mar 3, 2009, at 2:58 PM, Manolo Padron Martinez wrote:
>
>>> I believe this has been covered in this thread:
>>> http://markmail.org/thread/lwqfwlscrvilwm34
>>>
>>> but I think a totally satisfactory answer was not found.
>>
>> Thanks that works for me.
>
> Actually now I'm curious where the sum() function comes from. I  
> can't find it in any Javascript references. Is it something Couch  
> provides?
>
> Answer: grepping through the source showed me it's in main.js.
>
> Since this question has been posed more than once, maybe main.js  
> should have a uniq() function as well?

Patches welcome :) Feel free to submit uniq() to JIRA:

https://issues.apache.org/jira/browse/COUCHDB

Cheers
Jan
--


Re: [user] Obtaining unique values from a view

Posted by Wout Mertens <wm...@cisco.com>.
On Mar 3, 2009, at 2:58 PM, Manolo Padron Martinez wrote:

>> I believe this has been covered in this thread:
>> http://markmail.org/thread/lwqfwlscrvilwm34
>>
>> but I think a totally satisfactory answer was not found.
>
> Thanks that works for me.

Actually now I'm curious where the sum() function comes from. I can't  
find it in any Javascript references. Is it something Couch provides?

Answer: grepping through the source showed me it's in main.js.

Since this question has been posed more than once, maybe main.js  
should have a uniq() function as well?

Wout.