Posted to user@couchdb.apache.org by maddiin <ma...@googlemail.com> on 2008/11/22 21:25:46 UTC

Map/Reduce takes lots of time every request

Hi,

I have 5000 docs. Each has a tags field, as described in 
http://wiki.apache.org/couchdb/Tags_inside_documents, and I am using 
the same map/reduce functions from that page.

There are around 8000 unique tags and the map/reduce returns 
{"rows":[{"key":null,"value":27913}]} without group=true.
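
For reference, here is a minimal Python sketch of what a tag-counting view computes. It simulates the map and reduce steps in memory and is not the wiki's actual JavaScript. The sample documents and the names `map_tags`, `reduce_all` and `reduce_grouped` are assumptions made up for illustration. It shows why the ungrouped query returns a single row with key null, while group=true returns one row per unique tag.

```python
from collections import Counter

# Hypothetical sample documents; real ones would come from the database.
DOCS = [
    {"_id": "a", "tags": ["couchdb", "erlang"]},
    {"_id": "b", "tags": ["couchdb", "views"]},
    {"_id": "c", "tags": ["erlang"]},
    {"_id": "d"},  # no tags field, so the map step emits nothing
]


def map_tags(doc):
    """Emit one (tag, 1) pair per tag, similar to emit(tag, 1) in a view."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_all(rows):
    """Without group=true: a single row holding the grand total."""
    return {"rows": [{"key": None, "value": sum(v for _, v in rows)}]}


def reduce_grouped(rows):
    """With group=true: one row per unique key (plain string order here)."""
    counts = Counter()
    for key, value in rows:
        counts[key] += value
    return {"rows": [{"key": k, "value": counts[k]} for k in sorted(counts)]}


ROWS = [row for doc in DOCS for row in map_tags(doc)]
print(reduce_all(ROWS))
print(reduce_grouped(ROWS))
```

Read this way, the 27913 above would be the total number of emitted tags across all docs, and the roughly 8000 unique tags only show up as separate rows with group=true.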

If I query with ?count=1000&group=true it takes around 8 seconds to 
fetch the results; without a count it takes more than a minute.

It does so on every request: if I run a group=true query to completion 
and then hit the same URL again, it still takes the same amount of time.

I'm on the latest trunk, if that matters.

Do you have any advice on what I am doing wrong and how I could speed this up?

Re: Map/Reduce takes lots of time every request

Posted by maddiin <ma...@googlemail.com>.

I am using a permanent view in my app. To make sure, I also tried with 
curl -X GET "http://localhost:5984/testdb/_view/posts/all_tags?group=true" 
and got the same results.
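
A rough Python sketch for timing the same request from code rather than curl. The base URL is the one from the curl command and `count` is the parameter used earlier in this thread. The helper names `view_url` and `timed` are made up for this example. As far as I know, CouchDB expects lowercase JSON booleans in the query string, so the sketch JSON-encodes the values, which turns Python's `True` into `true`.

```python
import json
import time
from urllib.parse import urlencode

# Base URL taken from the curl command above.
VIEW_URL = "http://localhost:5984/testdb/_view/posts/all_tags"


def view_url(base, **params):
    """Build a view URL, JSON-encoding each value (True becomes 'true')."""
    query = urlencode({k: json.dumps(v) for k, v in params.items()})
    return f"{base}?{query}" if query else base


def timed(fetch, url):
    """Call fetch(url) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fetch(url)
    return result, time.perf_counter() - start


# Only builds and prints the URL; no request is sent here.
print(view_url(VIEW_URL, group=True, count=1000))
```

Passing something like `lambda u: urlopen(u).read()` (with `urlopen` from `urllib.request`) as `fetch` would time a real request against a running server.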

I recently installed otp_src_R12B-5; maybe it is misconfigured?
Erlang (BEAM) emulator version 5.6.5 [source] [async-threads:0] [hipe] [kernel-poll:false]


Damien Katz wrote:
> It sounds like either you are using a temp view, or an older version of 
> Erlang (R11 I think) that has performance problems.
>
> -Damien

Re: Map/Reduce takes lots of time every request

Posted by Damien Katz <da...@apache.org>.

It sounds like either you are using a temp view, or an older version of 
Erlang (R11 I think) that has performance problems.

-Damien

On Nov 22, 2008, at 4:20 PM, maddiin wrote:

> > It sounds like you are building a tag-cloud.
>
> That is right.
>
>
> > I'm curious how long it takes with reduce=false
>
> With reduce=False it takes around 50 seconds to load all tags.
> 1000 tags are loading in 2-3 seconds with reduce=False.
>
> With reduce=True it takes around 80 seconds to load all tags.
> 1000 tags are loading in 8-9 seconds with reduce=True.
>
>
> > The smart money would be on caching the results of that operation,
> > which is standard practice with SQL based tag clouds as well.
>
> I guess I will have to unless..
>
> > Also, I'm not sure, but perhaps it would be possible for CouchDB
> > to cache final reduce values in the btree as well, so that
> > group=true queries can save the cost of the final rereduce (and make
> > subsequent queries fast...).
>
> That'd be nice.
>
>
> Thanks for the response,
> maddiin

Re: Map/Reduce takes lots of time every request

Posted by maddiin <ma...@googlemail.com>.

 > It sounds like you are building a tag-cloud.

That is right.


 > I'm curious how long it takes with reduce=false

With reduce=False it takes around 50 seconds to load all tags.
1000 tags are loading in 2-3 seconds with reduce=False.

With reduce=True it takes around 80 seconds to load all tags.
1000 tags are loading in 8-9 seconds with reduce=True.


 > The smart money would be on caching the results of that operation,
 > which is standard practice with SQL based tag clouds as well.

I guess I will have to unless..

 > Also, I'm not sure, but perhaps it would be possible for CouchDB to
 > cache final reduce values in the btree as well, so that group=true
 > queries can save the cost of the final rereduce (and make subsequent
 > queries fast...).

That'd be nice.


Thanks for the response,
maddiin


Chris Anderson wrote:
> On Sat, Nov 22, 2008 at 12:25 PM, maddiin <ma...@googlemail.com> wrote:
>   
>> Do you have any advice what I am doing wrong and how I could speed this up?
>>     
>
>
> I'm curious how long it takes with reduce=false (should be limited
> basically by IO).
>
> I'm almost certain (please correct me if I'm wrong) that reduce
> requests must call the JavaScript interpreter at least once per
> request, to rereduce the btree inner-nodes that fit in that request
> range. This means for group=true requests, the rereduce function must
> run once per unique key (at minimum). That would be the source of your
> slowness. It sounds like you are building a tag-cloud. The smart money
> would be on caching the results of that operation, which is standard
> practice with SQL based tag clouds as well.
>
> If you're not doing a tag cloud, maybe there's a way you can get the
> needed results using map only?
>
> Also, I'm not sure, but perhaps it would be possible for CouchDB to
> cache final reduce values in the btree as well, so that group=true
> queries can save the cost of the final rereduce (and make subsequent
> queries fast...)
>
> Chris
>
>
>

Re: Map/Reduce takes lots of time every request

Posted by Nuno Job <nu...@gmail.com>.

Slightly off-topic: has anyone seen Simon Peyton Jones talking about making
something generic and MapReduce-like in Haskell?  http://tinyurl.com/537apv

On Sat, Nov 22, 2008 at 3:53 PM, Chris Anderson <jc...@apache.org> wrote:

> On Sat, Nov 22, 2008 at 12:25 PM, maddiin <ma...@googlemail.com> wrote:
> >
> > Do you have any advice what I am doing wrong and how I could speed this up?
>
>
> I'm curious how long it takes with reduce=false (should be limited
> basically by IO).
>
> I'm almost certain (please correct me if I'm wrong) that reduce
> requests must call the JavaScript interpreter at least once per
> request, to rereduce the btree inner-nodes that fit in that request
> range. This means for group=true requests, the rereduce function must
> run once per unique key (at minimum). That would be the source of your
> slowness. It sounds like you are building a tag-cloud. The smart money
> would be on caching the results of that operation, which is standard
> practice with SQL based tag clouds as well.
>
> If you're not doing a tag cloud, maybe there's a way you can get the
> needed results using map only?
>
> Also, I'm not sure, but perhaps it would be possible for CouchDB to
> cache final reduce values in the btree as well, so that group=true
> queries can save the cost of the final rereduce (and make subsequent
> queries fast...)
>
> Chris
>
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>

Re: Map/Reduce takes lots of time every request

Posted by Chris Anderson <jc...@apache.org>.

On Sat, Nov 22, 2008 at 12:25 PM, maddiin <ma...@googlemail.com> wrote:
>
> Do you have any advice what I am doing wrong and how I could speed this up?

I'm curious how long it takes with reduce=false (should be limited
basically by IO).

I'm almost certain (please correct me if I'm wrong) that reduce
requests must call the JavaScript interpreter at least once per
request, to rereduce the btree inner-nodes that fit in that request
range. This means for group=true requests, the rereduce function must
run once per unique key (at minimum). That would be the source of your
slowness. It sounds like you are building a tag-cloud. The smart money
would be on caching the results of that operation, which is standard
practice with SQL based tag clouds as well.
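
One possible shape for that kind of application-side caching, as a minimal Python sketch. The class name `CachedView`, the default five-minute lifetime, and the stand-in `slow_tag_cloud` function are all assumptions for illustration; a real app would plug in whatever function runs the slow group=true query.

```python
import time


class CachedView:
    """Cache the result of an expensive fetch for ttl seconds."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch    # callable that runs the slow query
        self._ttl = ttl        # how long a cached result stays fresh
        self._clock = clock    # injectable so expiry can be tested
        self._value = None
        self._expires = None   # None means nothing is cached yet

    def get(self):
        """Return the cached value, refetching only once it has expired."""
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._fetch()
            self._expires = now + self._ttl
        return self._value


calls = []


def slow_tag_cloud():
    """Stand-in for the slow view query; records each invocation."""
    calls.append(1)
    return {"rows": [{"key": "couchdb", "value": 2}]}


cloud = CachedView(slow_tag_cloud, ttl=60)
cloud.get()
cloud.get()
print(len(calls))  # number of times the slow query actually ran
```

The trade-off is staleness: a newly added tag would not appear in the cloud until the cached result expires, which is usually acceptable for a tag cloud.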

If you're not doing a tag cloud, maybe there's a way you can get the
needed results using map only?

Also, I'm not sure, but perhaps it would be possible for CouchDB to
cache final reduce values in the btree as well, so that group=true
queries can save the cost of the final rereduce (and make subsequent
queries fast...)
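
The idea can be illustrated with a toy model in Python. This is not CouchDB's btree code, and whether CouchDB could cache values this way is exactly the open question above. `build_partials`, `GroupedView` and the per-key layout of partial results are simplifying assumptions made up for this sketch (in a real btree, inner-node reductions span key ranges rather than single keys). The model stores partial sums at index time, runs one rereduce per unique key at query time, and remembers the final value so a repeated query does no further rereduce work.

```python
from itertools import groupby


def build_partials(sorted_rows, leaf_size=2):
    """Index time: store one partial sum per leaf-sized chunk of each key."""
    partials = {}
    for key, group in groupby(sorted_rows, key=lambda row: row[0]):
        values = [value for _, value in group]
        partials[key] = [
            sum(values[i:i + leaf_size])
            for i in range(0, len(values), leaf_size)
        ]
    return partials


class GroupedView:
    """Toy group=true query with a cache of final per-key values."""

    def __init__(self, partials):
        self.partials = partials
        self.final = {}            # hypothetical cache of final values
        self.rereduce_calls = 0    # counts query-time rereduce invocations

    def query(self):
        rows = []
        for key in sorted(self.partials):
            if key not in self.final:
                # One rereduce over the stored partials for this key.
                self.final[key] = sum(self.partials[key])
                self.rereduce_calls += 1
            rows.append((key, self.final[key]))
        return rows


rows = [("couchdb", 1), ("couchdb", 1), ("couchdb", 1),
        ("erlang", 1), ("views", 1)]
view = GroupedView(build_partials(rows))
view.query()
view.query()
print(view.rereduce_calls)  # rereduce calls made across both queries
```

In this model the first query costs one rereduce per unique key and the second costs none. A real implementation would also have to invalidate cached values whenever the underlying docs change, which the sketch ignores.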

Chris

-- 
Chris Anderson
http://jchris.mfdz.com