Posted to user@couchdb.apache.org by Julian Stahnke <ma...@julianstahnke.com> on 2010/03/12 14:10:54 UTC

View output slow

Hello!

I have a problem with a view being slow, even though it’s indexed and cached and so on. I have a database of books (about 120,000 documents) and a map/reduce function that counts how many books there are per author. I’m then calling the view with ?group=true to get the list. I’m neither emitting nor outputting any actual documents, only the counts. This results in an output of about 78,000 key/value pairs that look like the following: {"key":"Albert Kapr","value":3}.

Now, even when the view is indexed and cached, it still takes 60 seconds to receive the output, using PHP’s cURL functions, the browser, whatever I’ve tried. Getting the same output served from a static file takes only a fraction of a second.

When I set limit=100, it’s basically instantaneous. I want to sort the output by value though, so I can’t really limit it or use ranges. Trying it with about 7,000 books, the request takes about 5 seconds, so it seems to be linear to the number of lines being output?

I’m using CouchDB 0.10.1 (the one that’s in homebrew) on a 2006 MacBook Pro.

Am I doing anything wrong, or should this really take so long? I wasn’t able to find any information about this—only about indexing being slow, but that doesn’t seem to be my problem.

Maybe I should also mention that I’m an interaction design student who used to be a front-end dev, but not a ‘real’ programmer.

Thanks for any help!

Best,
Julian


For reference, the map function:

function (doc)
{
    // doc.author is an array of author names; emit one row per author.
    if (doc.author) {
        for (var i = 0; i < doc.author.length; i++) {
            emit(doc.author[i], 1);
        }
    } else {
        // Books without any author are counted under the null key.
        emit(null, 1);
    }
}

The reduce function: 

function (keys, values, rereduce)
{
    return sum(values);
}

Some sample output:

{"rows":[
{"key":null,"value":1542},
{"key":"... Hans Arp ... /Konzept: Hans Christian Tavel .../","value":1},
---more rows---
{"key":"Zwi Erich Kurzweil","value":1}
]}
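
For readers trying to reproduce this: the view is queried over HTTP with ?group=true, and since CouchDB only orders view output by key, sorting by value has to happen on the client after the full grouped result arrives. A minimal sketch (the function name and the database/design/view names in the comment are illustrative, not from the thread):

// Sketch: given the parsed JSON response from
// GET /books/_design/authors/_view/by_author?group=true
// (database/design/view names are placeholders), sort the rows by value,
// since CouchDB itself only orders view output by key.
function topAuthors(viewResult, limit) {
    var rows = viewResult.rows.slice(); // copy so the original stays untouched
    rows.sort(function (a, b) {
        return b.value - a.value;       // descending by number of books
    });
    return rows.slice(0, limit);
}

// Example:
// topAuthors({"rows": [{"key": "Albert Kapr", "value": 3},
//                      {"key": "Zwi Erich Kurzweil", "value": 1}]}, 100);
// -> [{"key": "Albert Kapr", "value": 3}, {"key": "Zwi Erich Kurzweil", "value": 1}]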

Re: View output slow

Posted by Adam Kocoloski <ko...@apache.org>.
On Mar 12, 2010, at 12:56 PM, Julian Stahnke wrote:

> Am 12.03.2010 um 17:24 schrieb J Chris Anderson:
> 
>> 
>> On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:
>> 
>>> Hello!
>>> 
>>> I have a problem with a view being slow, even though it’s indexed and cached and so on. I have a database of books (about 120,000 documents) and a map/reduce function that counts how many books there are per author. I’m then calling the view with ?group=true to get the list. I’m neither emitting nor outputting any actual documents, only the counts. This results in an output of about 78,000 key/value pairs that look like the following: {"key":"Albert Kapr","value":3}.
>>> 
>>> Now, even when the view is indexed and cached, it still takes 60 seconds to receive the output, using PHP’s cURL functions, the browser, whatever I’ve tried. Getting the same output served from a static file takes only a fraction of a second.
>>> 
>>> When I set limit=100, it’s basically instantaneous. I want to sort the output by value though, so I can’t really limit it or use ranges. Trying it with about 7,000 books, the request takes about 5 seconds, so it seems to be linear to the number of lines being output?
>> 
>> For each line of output in the group reduce view, CouchDB must calculate 1 final reduction (even when the intermediate reductions are already cached in the btree). This is because the btree nodes might not have the exact same boundaries as your group keys.
>> 
>> There is a remedy. You can replace your simple summing reduce with the text "_sum" (without quotes). This triggers the same function, but implemented in Erlang by CouchDB. Most of your slowness is probably due to IO between CouchDB and serverside JavaScript. Using the _sum function will help with this.
>> 
>> There will still be a calculation per group reduce row, but the cost is much lower.
>> 
>> Let us know how much faster this is!
>> 
>> Chris
> 
> Oh wow, thanks! It’s now taking about 4 seconds instead of a minute!
> 
> Is this function documented somewhere? I didn’t come across it anywhere, so I added it to the Performance page in the wiki: http://wiki.apache.org/couchdb/Performance I hope that is okay. I also found a commit message[1] which said that one could implement more of these functions, though I didn’t quite get how. This seems like it could be very helpful in some cases. Maybe it should be documented properly somewhere by somebody who actually knows about it?
> 
> Thanks a lot,
> Julian
> 
> [1] http://svn.apache.org/viewvc?view=revision&revision=774101

I added a _stats reduction to trunk a few days ago which returns an object with min, max, count, sum, and sum-of-squares fields.  Those are the primitives you need to calculate the aggregates jchris suggested in that commit message.  In my testing the internal _stats was about 12x faster than the version in Futon's reduce.js test.

If you're looking to hack on more of these, open up couch_query_servers.erl and search for builtin.  Best,

Adam
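
For reference, using the built-in reducers is just a matter of putting the string in the design document in place of a JavaScript function. Below is a sketch of a _stats view (the view name and map body are illustrative); it only works on a CouchDB that already has _stats, i.e. trunk at the time of this thread:

{
  "views": {
    "author_stats": {
      "map": "function (doc) { if (doc.author) { for (var i = 0; i < doc.author.length; i++) { emit(doc.author[i], 1); } } }",
      "reduce": "_stats"
    }
  }
}

A grouped row then carries an object rather than a bare number, roughly {"key":"Albert Kapr","value":{"sum":3,"count":3,"min":1,"max":1,"sumsqr":3}}.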


Re: View output slow

Posted by Jan Lehnardt <ja...@apache.org>.
On 12 Mar 2010, at 11:56, Julian Stahnke wrote:

> Am 12.03.2010 um 17:24 schrieb J Chris Anderson:
> 
>> 
>> On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:
>> 
>>> Hello!
>>> 
>>> I have a problem with a view being slow, even though it’s indexed and cached and so on. I have a database of books (about 120,000 documents) and a map/reduce function that counts how many books there are per author. I’m then calling the view with ?group=true to get the list. I’m neither emitting nor outputting any actual documents, only the counts. This results in an output of about 78,000 key/value pairs that look like the following: {"key":"Albert Kapr","value":3}.
>>> 
>>> Now, even when the view is indexed and cached, it still takes 60 seconds to receive the output, using PHP’s cURL functions, the browser, whatever I’ve tried. Getting the same output served from a static file takes only a fraction of a second.
>>> 
>>> When I set limit=100, it’s basically instantaneous. I want to sort the output by value though, so I can’t really limit it or use ranges. Trying it with about 7,000 books, the request takes about 5 seconds, so it seems to be linear to the number of lines being output?
>> 
>> For each line of output in the group reduce view, CouchDB must calculate 1 final reduction (even when the intermediate reductions are already cached in the btree). This is because the btree nodes might not have the exact same boundaries as your group keys.
>> 
>> There is a remedy. You can replace your simple summing reduce with the text "_sum" (without quotes). This triggers the same function, but implemented in Erlang by CouchDB. Most of your slowness is probably due to IO between CouchDB and serverside JavaScript. Using the _sum function will help with this.
>> 
>> There will still be a calculation per group reduce row, but the cost is much lower.
>> 
>> Let us know how much faster this is!
>> 
>> Chris
> 
> Oh wow, thanks! It’s now taking about 4 seconds instead of a minute!
> 
> Is this function documented somewhere? I didn’t come across it anywhere, so I added it to the Performance page in the wiki: http://wiki.apache.org/couchdb/Performance I hope that is okay.

Thanks for adding it :)

Cheers
Jan
--

Re: View output slow

Posted by Julian Stahnke <ma...@julianstahnke.com>.
Am 12.03.2010 um 17:24 schrieb J Chris Anderson:

> 
> On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:
> 
>> Hello!
>> 
>> I have a problem with a view being slow, even though it’s indexed and cached and so on. I have a database of books (about 120,000 documents) and a map/reduce function that counts how many books there are per author. I’m then calling the view with ?group=true to get the list. I’m neither emitting nor outputting any actual documents, only the counts. This results in an output of about 78,000 key/value pairs that look like the following: {"key":"Albert Kapr","value":3}.
>> 
>> Now, even when the view is indexed and cached, it still takes 60 seconds to receive the output, using PHP’s cURL functions, the browser, whatever I’ve tried. Getting the same output served from a static file takes only a fraction of a second.
>> 
>> When I set limit=100, it’s basically instantaneous. I want to sort the output by value though, so I can’t really limit it or use ranges. Trying it with about 7,000 books, the request takes about 5 seconds, so it seems to be linear to the number of lines being output?
> 
> For each line of output in the group reduce view, CouchDB must calculate 1 final reduction (even when the intermediate reductions are already cached in the btree). This is because the btree nodes might not have the exact same boundaries as your group keys.
> 
> There is a remedy. You can replace your simple summing reduce with the text "_sum" (without quotes). This triggers the same function, but implemented in Erlang by CouchDB. Most of your slowness is probably due to IO between CouchDB and serverside JavaScript. Using the _sum function will help with this.
> 
> There will still be a calculation per group reduce row, but the cost is much lower.
> 
> Let us know how much faster this is!
> 
> Chris

Oh wow, thanks! It’s now taking about 4 seconds instead of a minute!

Is this function documented somewhere? I didn’t come across it anywhere, so I added it to the Performance page in the wiki: http://wiki.apache.org/couchdb/Performance I hope that is okay. I also found a commit message[1] which said that one could implement more of these functions, though I didn’t quite get how. This seems like it could be very helpful in some cases. Maybe it should be documented properly somewhere by somebody who actually knows about it?

Thanks a lot,
Julian

[1] http://svn.apache.org/viewvc?view=revision&revision=774101

> 
>> 
>> I’m using CouchDB 0.10.1 (the one that’s in homebrew) on a 2006 MacBook Pro.
>> 
>> Am I doing anything wrong, or should this really take so long? I wasn’t able to find any information about this—only about indexing being slow, but that doesn’t seem to be my problem.
>> 
>> Maybe I should also mention that I’m an interaction design student who used to be a front-end dev, but not a ‘real’ programmer.
>> 
>> Thanks for any help!
>> 
>> Best,
>> Julian
>> 
>> 
>> For reference, the map function:
>> 
>> function (doc)
>> {
>>   if (doc.author) {
>> 		for (i = 0; i < doc.author.length; i++) {
>> 			emit(doc.author[i], 1);
>> 		}
>>   } else {
>>       emit(null, 1);        
>>   }
>> }
>> 
>> The reduce function: 
>> 
>> function (keys, values, rereduce)
>> {
>>   return sum(values);
>> }
>> 
>> Some sample output:
>> 
>> {"rows":[
>> {"key":null,"value":1542},
>> {"key":"... Hans Arp ... /Konzept: Hans Christian Tavel .../","value":1},
>> ---more rows---
>> {"key":"Zwi Erich Kurzweil","value":1}
>> ]}
> 


Re: View output slow

Posted by J Chris Anderson <jc...@gmail.com>.
On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:

> Hello!
> 
> I have a problem with a view being slow, even though it’s indexed and cached and so on. I have a database of books (about 120,000 documents) and a map/reduce function that counts how many books there are per author. I’m then calling the view with ?group=true to get the list. I’m neither emitting nor outputting any actual documents, only the counts. This results in an output of about 78,000 key/value pairs that look like the following: {"key":"Albert Kapr","value":3}.
> 
> Now, even when the view is indexed and cached, it still takes 60 seconds to receive the output, using PHP’s cURL functions, the browser, whatever I’ve tried. Getting the same output served from a static file takes only a fraction of a second.
> 
> When I set limit=100, it’s basically instantaneous. I want to sort the output by value though, so I can’t really limit it or use ranges. Trying it with about 7,000 books, the request takes about 5 seconds, so it seems to be linear to the number of lines being output?

For each line of output in the group reduce view, CouchDB must calculate 1 final reduction (even when the intermediate reductions are already cached in the btree). This is because the btree nodes might not have the exact same boundaries as your group keys.

There is a remedy. You can replace your simple summing reduce with the text "_sum" (without quotes). This triggers the same function, but implemented in Erlang by CouchDB. Most of your slowness is probably due to IO between CouchDB and serverside JavaScript. Using the _sum function will help with this.

There will still be a calculation per group reduce row, but the cost is much lower.

Let us know how much faster this is!

Chris
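
Concretely, that means the view’s "reduce" member in the design document becomes the bare string "_sum" while the map stays as it is. A sketch (the design document and view names here are illustrative):

{
  "_id": "_design/authors",
  "views": {
    "by_author": {
      "map": "function (doc) { if (doc.author) { for (var i = 0; i < doc.author.length; i++) { emit(doc.author[i], 1); } } else { emit(null, 1); } }",
      "reduce": "_sum"
    }
  }
}

The grouped output keeps the same shape as before, e.g. {"key":"Albert Kapr","value":3}; only the summing moves out of the JavaScript view server and into CouchDB itself.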


> 
> I’m using CouchDB 0.10.1 (the one that’s in homebrew) on a 2006 MacBook Pro.
> 
> Am I doing anything wrong, or should this really take so long? I wasn’t able to find any information about this—only about indexing being slow, but that doesn’t seem to be my problem.
> 
> Maybe I should also mention that I’m an interaction design student who used to be a front-end dev, but not a ‘real’ programmer.
> 
> Thanks for any help!
> 
> Best,
> Julian
> 
> 
> For reference, the map function:
> 
> function (doc)
> {
>    if (doc.author) {
> 		for (i = 0; i < doc.author.length; i++) {
> 			emit(doc.author[i], 1);
> 		}
>    } else {
>        emit(null, 1);        
>    }
> }
> 
> The reduce function: 
> 
> function (keys, values, rereduce)
> {
>    return sum(values);
> }
> 
> Some sample output:
> 
> {"rows":[
> {"key":null,"value":1542},
> {"key":"... Hans Arp ... /Konzept: Hans Christian Tavel .../","value":1},
> ---more rows---
> {"key":"Zwi Erich Kurzweil","value":1}
> ]}


Re: View output slow

Posted by Julian Stahnke <ma...@julianstahnke.com>.
Am 12.03.2010 um 20:12 schrieb Bruno Ronchetti:

> Hi Julian,
> 
> I also notice that your map function emits once for every *letter* in the author's name.
> 
> For instance having just one entry in the db:
> 
> {
>   "_id": "01943fed255df913c33cab5c27b3bc7e",
>   "_rev": "1-8288a511edad170e8e806281d0188033",
>   "author": "Karl Marx",
>   "title": "Das Kapital"
> }
> 
> 
> yields following results:
> 
> {"rows":[
> {"key":null,"value":9}
> ]}
> 
> and, using the group=true clause for increased clarity, yields:
> 
> {"rows":[
> {"key":" ","value":1},
> {"key":"a","value":2},
> {"key":"K","value":1},
> {"key":"l","value":1},
> {"key":"M","value":1},
> {"key":"r","value":2},
> {"key":"x","value":1}
> ]}
> This could be another factor contributing to the slowness of your view.
> 
> Cheers. Bruno.


Oh, doc.author is an array. I should’ve mentioned that or simplified the version I’ve posted. Sorry!

Cheers,
Julian

> 
> 
> 
> 
> On 12/mar/2010, at 14.10, Julian Stahnke wrote:
> 
>> Hello!
>> 
>> I have a problem with a view being slow, even though it’s indexed and cached and so on. I have a database of books (about 120,000 documents) and a map/reduce function that counts how many books there are per author. I’m then calling the view with ?group=true to get the list. I’m neither emitting nor outputting any actual documents, only the counts. This results in an output of about 78,000 key/value pairs that look like the following: {"key":"Albert Kapr","value":3}.
>> 
>> Now, even when the view is indexed and cached, it still takes 60 seconds to receive the output, using PHP’s cURL functions, the browser, whatever I’ve tried. Getting the same output served from a static file takes only a fraction of a second.
>> 
>> When I set limit=100, it’s basically instantaneous. I want to sort the output by value though, so I can’t really limit it or use ranges. Trying it with about 7,000 books, the request takes about 5 seconds, so it seems to be linear to the number of lines being output?
>> 
>> I’m using CouchDB 0.10.1 (the one that’s in homebrew) on a 2006 MacBook Pro.
>> 
>> Am I doing anything wrong, or should this really take so long? I wasn’t able to find any information about this—only about indexing being slow, but that doesn’t seem to be my problem.
>> 
>> Maybe I should also mention that I’m an interaction design student who used to be a front-end dev, but not a ‘real’ programmer.
>> 
>> Thanks for any help!
>> 
>> Best,
>> Julian
>> 
>> 
>> For reference, the map function:
>> 
>> function (doc)
>> {
>>   if (doc.author) {
>> 		for (i = 0; i < doc.author.length; i++) {
>> 			emit(doc.author[i], 1);
>> 		}
>>   } else {
>>       emit(null, 1);        
>>   }
>> }
>> 
>> The reduce function: 
>> 
>> function (keys, values, rereduce)
>> {
>>   return sum(values);
>> }
>> 
>> Some sample output:
>> 
>> {"rows":[
>> {"key":null,"value":1542},
>> {"key":"... Hans Arp ... /Konzept: Hans Christian Tavel .../","value":1},
>> ---more rows---
>> {"key":"Zwi Erich Kurzweil","value":1}
>> ]}
> 


Re: View output slow

Posted by Bruno Ronchetti <br...@mac.com>.
Hi Julian,

I also notice that your map function emits once for every *letter* in the author's name.

For instance having just one entry in the db:

{
   "_id": "01943fed255df913c33cab5c27b3bc7e",
   "_rev": "1-8288a511edad170e8e806281d0188033",
   "author": "Karl Marx",
   "title": "Das Kapital"
}


yields following results:

{"rows":[
{"key":null,"value":9}
]}

and, using the group=true clause for increased clarity, yields:

{"rows":[
{"key":" ","value":1},
{"key":"a","value":2},
{"key":"K","value":1},
{"key":"l","value":1},
{"key":"M","value":1},
{"key":"r","value":2},
{"key":"x","value":1}
]}
This could be another factor contributing to the slowness of your view.

Cheers. Bruno.
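
Since doc.author is in fact an array (as Julian clarifies elsewhere in the thread), the per-character emits only happen when a document carries a plain string. A defensive map that handles both forms might look like this sketch:

function (doc) {
    // Treat a plain string author as a one-element array so the loop never
    // iterates over individual characters.
    var authors = doc.author;
    if (!authors) {
        emit(null, 1);
        return;
    }
    if (typeof authors === 'string') {
        authors = [authors];
    }
    for (var i = 0; i < authors.length; i++) {
        emit(authors[i], 1);
    }
}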




On 12/mar/2010, at 14.10, Julian Stahnke wrote:

> Hello!
> 
> I have a problem with a view being slow, even though it’s indexed and cached and so on. I have a database of books (about 120,000 documents) and a map/reduce function that counts how many books there are per author. I’m then calling the view with ?group=true to get the list. I’m neither emitting nor outputting any actual documents, only the counts. This results in an output of about 78,000 key/value pairs that look like the following: {"key":"Albert Kapr","value":3}.
> 
> Now, even when the view is indexed and cached, it still takes 60 seconds to receive the output, using PHP’s cURL functions, the browser, whatever I’ve tried. Getting the same output served from a static file takes only a fraction of a second.
> 
> When I set limit=100, it’s basically instantaneous. I want to sort the output by value though, so I can’t really limit it or use ranges. Trying it with about 7,000 books, the request takes about 5 seconds, so it seems to be linear to the number of lines being output?
> 
> I’m using CouchDB 0.10.1 (the one that’s in homebrew) on a 2006 MacBook Pro.
> 
> Am I doing anything wrong, or should this really take so long? I wasn’t able to find any information about this—only about indexing being slow, but that doesn’t seem to be my problem.
> 
> Maybe I should also mention that I’m an interaction design student who used to be a front-end dev, but not a ‘real’ programmer.
> 
> Thanks for any help!
> 
> Best,
> Julian
> 
> 
> For reference, the map function:
> 
> function (doc)
> {
>    if (doc.author) {
> 		for (i = 0; i < doc.author.length; i++) {
> 			emit(doc.author[i], 1);
> 		}
>    } else {
>        emit(null, 1);        
>    }
> }
> 
> The reduce function: 
> 
> function (keys, values, rereduce)
> {
>    return sum(values);
> }
> 
> Some sample output:
> 
> {"rows":[
> {"key":null,"value":1542},
> {"key":"... Hans Arp ... /Konzept: Hans Christian Tavel .../","value":1},
> ---more rows---
> {"key":"Zwi Erich Kurzweil","value":1}
> ]}