Posted to user@couchdb.apache.org by Chris Van Pelt <va...@gmail.com> on 2009/01/07 23:49:46 UTC

Slooooow views

I'm pretty new to couch, and I'm wondering if there is a way to
improve the performance of my views.  My views are very slow during
generation, but also during simple queries.

A request to one of my views after it is fully updated takes 2-4
seconds.  The view is rather complex, using compound keys and an
involved reduce step, but I assumed that didn't matter once the update
step was complete.  I have 10000 documents in my DB.

The update step, after adding 11 documents, takes 23 seconds.

Because I have a web server talking directly to couch, I can't afford
requests taking longer than a few seconds.  Ideally my requests would
be in the tenths of seconds.  How do people deal with this in the real
world?  I understand I can fire off view generation manually, but then
all other requests hang while the view is being generated.

I've also been looking for a way to edit my views without taking down
my entire site for the 15 minutes it takes to regenerate the entire
index.  It seems plausible to keep the old index around for queries
while the new index is being created, no?

Chris

Re: Slooooow views

Posted by Chris Van Pelt <va...@gmail.com>.
I chose couch because I needed a way to take arbitrary hashes and
combine them, performing various operations on dynamic key/value
pairs.  Seeing that couch would eventually be able to do this in a
distributed manner seemed like a great fit.

My impression was that the reduce step was incremental once the
functions were defined...  Given the referential transparency of my
reduce function, I don't understand the performance impact incurred by
the large dynamic hash output from my reduce function.  Can you think
of a better fit for my needs in another solution?

Chris

On Wed, Jan 7, 2009 at 4:00 PM, Damien Katz <da...@apache.org> wrote:
> In Couchdb, your reductions must compute to smallish, fixed sized data. The
> problem is your reduce function: it builds up and returns a map of values,
> and as it computes the index, it will actually compute the reduction of
> every value in the view. Every time the index is updated, it does this.
>
> -Damien
>
>
> On Jan 7, 2009, at 6:38 PM, Chris Van Pelt wrote:
>
>> Ok, so I created a gist with the map, reduce, and a document:
>> http://gist.github.com/44497
>>
>> The purpose of this view is to combine multiple judgments (the data
>> attribute of the doc) for a single unit_id.  The "fields"
>> attribute tells couch how to aggregate the data (averaging numbers,
>> choosing the most common item, etc.).
>>
>> I do use group=true, along with skip and count when querying this
>> view.  I understand that skip can slow things down, but the request is
>> still slow when skip is 0.
>>
>> Another strange thing is that even when I query one of my "count"
>> views (a simple sum() reduce step) I experience the same lag.  Could
>> this be because my count views are a part of the same design document?
>>
>> Also are there better ways to debug this?  I've set my log level to
>> debug, but it doesn't give me details about where the time spent
>> processing is going, and I can only gauge response times to the
>> second...
>>
>> Chris
>>
>> On Wed, Jan 7, 2009 at 3:12 PM, Chris Anderson <jc...@gmail.com> wrote:
>>>
>>> On Wed, Jan 7, 2009 at 3:07 PM, Jeremy Wall <jw...@google.com> wrote:
>>>>
>>>> Maybe someone else could chime in on when you get the hit for reduction?
>>>>
>>>
>>> Based on my use of log() in the reduce function, it looks like for
>>> each reduce query, the reduce function is run once, to obtain the
>>> final reduce value.
>>>
>>> When you run a group=true, or group_level reduce query, which returns
>>> values for many keys, you'll end up running the final reduction once
>>> per returned value. I think this could be optimized to avoid running
>>> final reduces if they've already been run for those key-ranges. I'm
>>> not sure how much work that would be.
>>>
>>> --
>>> Chris Anderson
>>> http://jchris.mfdz.com
>>>
>
>

Re: Slooooow views

Posted by Damien Katz <da...@apache.org>.
In CouchDB, your reductions must compute to smallish, fixed-size
data. The problem is your reduce function: it builds up and returns
a map of values, and as it computes the index, it will actually
compute the reduction of every value in the view. Every time the index
is updated, it does this.

-Damien
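
As an illustration of "smallish, fixed-size" reduce output, here is a minimal sketch. It is not the function from the gist. It assumes each mapped value is a plain number, and the name reduceStats is hypothetical. The (keys, values, rereduce) signature is the one CouchDB passes to JavaScript reduce functions. The output is always a two-field object, however many rows feed into it.

```javascript
// Sketch only: a reduce whose output size does not grow with the input.
// Assumption: the map function emits one number per judgment.
function reduceStats(keys, values, rereduce) {
  var out = { sum: 0, count: 0 };
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values are earlier outputs of this same function
      out.sum += values[i].sum;
      out.count += values[i].count;
    } else {
      // values are the raw numbers emitted by the map function
      out.sum += values[i];
      out.count += 1;
    }
  }
  return out;
}

// Simulate two partial reductions being combined by a rereduce step.
var left = reduceStats(null, [4, 5], false);
var right = reduceStats(null, [3], false);
var total = reduceStats(null, [left, right], true);
var average = total.sum / total.count;
```

The average is computed by the caller from sum and count, so the stored reduction stays the same size at every level of the index.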


On Jan 7, 2009, at 6:38 PM, Chris Van Pelt wrote:

> Ok, so I created a gist with the map, reduce, and a document:
> http://gist.github.com/44497
>
> The purpose of this view is to combine multiple judgments (the data
> attribute of the doc) for a single unit_id.  The "fields"
> attribute tells couch how to aggregate the data (averaging numbers,
> choosing the most common item, etc.).
>
> I do use group=true, along with skip and count when querying this
> view.  I understand that skip can slow things down, but the request is
> still slow when skip is 0.
>
> Another strange thing is that even when I query one of my "count"
> views (a simple sum() reduce step) I experience the same lag.  Could
> this be because my count views are a part of the same design document?
>
> Also are there better ways to debug this?  I've set my log level to
> debug, but it doesn't give me details about where the time spent
> processing is going, and I can only gauge response times to the
> second...
>
> Chris
>
> On Wed, Jan 7, 2009 at 3:12 PM, Chris Anderson <jc...@gmail.com>  
> wrote:
>> On Wed, Jan 7, 2009 at 3:07 PM, Jeremy Wall <jw...@google.com> wrote:
>>>
>>> Maybe someone else could chime in on when you get the hit for  
>>> reduction?
>>>
>>
>> Based on my use of log() in the reduce function, it looks like for
>> each reduce query, the reduce function is run once, to obtain the
>> final reduce value.
>>
>> When you run a group=true, or group_level reduce query, which returns
>> values for many keys, you'll end up running the final reduction once
>> per returned value. I think this could be optimized to avoid running
>> final reduces if they've already been run for those key-ranges. I'm
>> not sure how much work that would be.
>>
>> --
>> Chris Anderson
>> http://jchris.mfdz.com
>>


Re: Slooooow views

Posted by Chris Van Pelt <va...@gmail.com>.
Ok, so I created a gist with the map, reduce, and a document:
http://gist.github.com/44497

The purpose of this view is to combine multiple judgments (the data
attribute of the doc) for a single unit_id.  The "fields"
attribute tells couch how to aggregate the data (averaging numbers,
choosing the most common item, etc.).

I do use group=true, along with skip and count when querying this
view.  I understand that skip can slow things down, but the request is
still slow when skip is 0.

Another strange thing is that even when I query one of my "count"
views (a simple sum() reduce step) I experience the same lag.  Could
this be because my count views are a part of the same design document?

Also are there better ways to debug this?  I've set my log level to
debug, but it doesn't give me details about where the time spent
processing is going, and I can only gauge response times to the
second...

Chris

On Wed, Jan 7, 2009 at 3:12 PM, Chris Anderson <jc...@gmail.com> wrote:
> On Wed, Jan 7, 2009 at 3:07 PM, Jeremy Wall <jw...@google.com> wrote:
>>
>> Maybe someone else could chime in on when you get the hit for reduction?
>>
>
> Based on my use of log() in the reduce function, it looks like for
> each reduce query, the reduce function is run once, to obtain the
> final reduce value.
>
> When you run a group=true, or group_level reduce query, which returns
> values for many keys, you'll end up running the final reduction once
> per returned value. I think this could be optimized to avoid running
> final reduces if they've already been run for those key-ranges. I'm
> not sure how much work that would be.
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>

Re: Slooooow views

Posted by Dan Reverri <re...@gmail.com>.
Would it be possible to query the view with update=false from the web
app and run a background job to update the view?

I am not sure if update=false applies to reduced views or if the old
view is available if the view is updating from another request.
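
A sketch of what the two kinds of request might look like follows. The host, database, design doc, and view names are made up, and the URL layout is an assumption, since it has changed between CouchDB releases. The update=false parameter is taken from the message above. Some releases spell this option differently (for example stale=ok), so check the documentation for the version you run. The helper only builds URL strings and sends nothing.

```javascript
// Sketch only: build view URLs for the web app and for a background job.
// All names and the path layout are illustrative assumptions.
function viewUrl(base, db, designDoc, view, params) {
  var query = Object.keys(params).map(function (name) {
    return encodeURIComponent(name) + "=" + encodeURIComponent(String(params[name]));
  }).join("&");
  var path = [
    base,
    encodeURIComponent(db),
    "_design",
    encodeURIComponent(designDoc),
    "_view",
    encodeURIComponent(view)
  ].join("/");
  return query ? path + "?" + query : path;
}

// Web app: ask for whatever is already indexed, without waiting for an update.
var webUrl = viewUrl("http://localhost:5984", "judgments", "agg", "by_unit",
  { group: true, update: false });

// Background job: an ordinary request, which triggers the index update.
var refreshUrl = viewUrl("http://localhost:5984", "judgments", "agg", "by_unit", {});
```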


On Wed, Jan 7, 2009 at 3:12 PM, Chris Anderson <jc...@gmail.com> wrote:
> On Wed, Jan 7, 2009 at 3:07 PM, Jeremy Wall <jw...@google.com> wrote:
>>
>> Maybe someone else could chime in on when you get the hit for reduction?
>>
>
> Based on my use of log() in the reduce function, it looks like for
> each reduce query, the reduce function is run once, to obtain the
> final reduce value.
>
> When you run a group=true, or group_level reduce query, which returns
> values for many keys, you'll end up running the final reduction once
> per returned value. I think this could be optimized to avoid running
> final reduces if they've already been run for those key-ranges. I'm
> not sure how much work that would be.
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>

Re: Slooooow views

Posted by Chris Anderson <jc...@gmail.com>.
On Wed, Jan 7, 2009 at 3:07 PM, Jeremy Wall <jw...@google.com> wrote:
>
> Maybe someone else could chime in on when you get the hit for reduction?
>

Based on my use of log() in the reduce function, it looks like for
each reduce query, the reduce function is run once, to obtain the
final reduce value.

When you run a group=true, or group_level reduce query, which returns
values for many keys, you'll end up running the final reduction once
per returned value. I think this could be optimized to avoid running
final reduces if they've already been run for those key-ranges. I'm
not sure how much work that would be.

-- 
Chris Anderson
http://jchris.mfdz.com
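
The difference between the two query styles can be shown with a toy model. This is not CouchDB's implementation: it ignores the stored partial reductions and only counts how often the reduce function is called. The names sumReduce and groupedQuery are hypothetical.

```javascript
// Toy model only: count reduce invocations for grouped and ungrouped queries.
var reduceCalls = 0;

function sumReduce(keys, values, rereduce) {
  reduceCalls += 1;
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// rows must already be sorted by key, as view rows are.
function groupedQuery(rows, reduceFn) {
  var results = [];
  var i = 0;
  while (i < rows.length) {
    var key = JSON.stringify(rows[i].key);
    var values = [];
    while (i < rows.length && JSON.stringify(rows[i].key) === key) {
      values.push(rows[i].value);
      i++;
    }
    // One final reduction per distinct key returned.
    results.push({ key: JSON.parse(key), value: reduceFn(null, values, false) });
  }
  return results;
}

var rows = [
  { key: "unit1", value: 1 },
  { key: "unit1", value: 2 },
  { key: "unit2", value: 5 }
];

reduceCalls = 0;
var grouped = groupedQuery(rows, sumReduce);
var groupedCalls = reduceCalls;

reduceCalls = 0;
var overall = sumReduce(null, rows.map(function (r) { return r.value; }), false);
var ungroupedCalls = reduceCalls;
```

In this model the grouped query costs one reduce call per returned key, so an expensive reduce function is paid for once per row of the response.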

Re: Slooooow views

Posted by Jeremy Wall <jw...@google.com>.
I'm not sure but from recent discussion on the list I think that reduces
happen at query time so complex reduces will slow down queries. Complex Maps
should only be an update time hit so those don't figure into a query when
the index is updated.

Maybe someone else could chime in on when you get the hit for reduction?

On Wed, Jan 7, 2009 at 4:49 PM, Chris Van Pelt <va...@gmail.com> wrote:

> I'm pretty new to couch, and I'm wondering if there is a way to
> improve the performance of my views.  My views are very slow during
> generation, but also during simple queries.
>
> A request to one of my views after it is fully updated takes 2-4
> seconds.  The view is rather complex, using compound keys and an
> involved reduce step, but I assumed that didn't matter once the update
> step was complete.  I have 10000 documents in my DB.
>
> The update step, after adding 11 documents, takes 23 seconds.
>
> Because I have a web server talking directly to couch, I can't afford
> requests taking longer than a few seconds.  Ideally my requests would
> be in the tenths of seconds.  How do people deal with this in the real
> world?  I understand I can fire off view generation manually, but then
> all other requests hang while the view is being generated.
>
> I've also been looking for a way to edit my views without taking down
> my entire site for the 15 minutes it takes to regenerate the entire
> index.  It seems plausible to keep the old index around for queries
> while the new index is being created, no?
>
> Chris
>

Re: Slooooow views

Posted by Chris Van Pelt <va...@gmail.com>.
I was using R11B...  I'm installing R12B-5 now.  Should I expect an
amazing performance increase?

Filled with anticipation :)

On Wed, Jan 7, 2009 at 3:37 PM, Jan Lehnardt <ja...@apache.org> wrote:
> Can you make sure you are running on Erlang R12B-4 or newer?
>
> Cheers
> Jan
> --
> On 7 Jan 2009, at 23:49, Chris Van Pelt wrote:
>
>> I'm pretty new to couch, and I'm wondering if there is a way to
>> improve the performance of my views.  My views are very slow during
>> generation, but also during simple queries.
>>
>> A request to one of my views after it is fully updated takes 2-4
>> seconds.  The view is rather complex, using compound keys and an
>> involved reduce step, but I assumed that didn't matter once the update
>> step was complete.  I have 10000 documents in my DB.
>>
>> The update step, after adding 11 documents, takes 23 seconds.
>>
>> Because I have a web server talking directly to couch, I can't afford
>> requests taking longer than a few seconds.  Ideally my requests would
>> be in the tenths of seconds.  How do people deal with this in the real
>> world?  I understand I can fire off view generation manually, but then
>> all other requests hang while the view is being generated.
>>
>> I've also been looking for a way to edit my views without taking down
>> my entire site for the 15 minutes it takes to regenerate the entire
>> index.  It seems plausible to keep the old index around for queries
>> while the new index is being created, no?
>>
>> Chris
>>
>
>

Re: Slooooow views

Posted by Jan Lehnardt <ja...@apache.org>.
Can you make sure you are running on Erlang R12B-4 or newer?

Cheers
Jan
--
On 7 Jan 2009, at 23:49, Chris Van Pelt wrote:

> I'm pretty new to couch, and I'm wondering if there is a way to
> improve the performance of my views.  My views are very slow during
> generation, but also during simple queries.
>
> A request to one of my views after it is fully updated takes 2-4
> seconds.  The view is rather complex, using compound keys and an
> involved reduce step, but I assumed that didn't matter once the update
> step was complete.  I have 10000 documents in my DB.
>
> The update step, after adding 11 documents, takes 23 seconds.
>
> Because I have a web server talking directly to couch, I can't afford
> requests taking longer than a few seconds.  Ideally my requests would
> be in the tenths of seconds.  How do people deal with this in the real
> world?  I understand I can fire off view generation manually, but then
> all other requests hang while the view is being generated.
>
> I've also been looking for a way to edit my views without taking down
> my entire site for the 15 minutes it takes to regenerate the entire
> index.  It seems plausible to keep the old index around for queries
> while the new index is being created, no?
>
> Chris
>


Re: Slooooow views

Posted by Chris Anderson <jc...@gmail.com>.
On Wed, Jan 7, 2009 at 2:49 PM, Chris Van Pelt <va...@gmail.com> wrote:
> A request to one of my views after it is fully updated takes 2-4
> seconds.  The view is rather complex, using compound keys and an
> involved reduce step, but I assumed that didn't matter once the update
> step was complete.  I have 10000 documents in my DB.
>
> The update step, after adding 11 documents, takes 23 seconds.

2 seconds per document sounds like rather a lot. If you could post
your view and example document that might help us find the problem.


>
> I've also been looking for a way to edit my views without taking down
> my entire site for the 15 minutes it takes to regenerate the entire
> index.  It seems plausible to keep the old index around for queries
> while the new index is being created, no?
>

I think this is still an open question... the answer will be to leave
the old design doc alone, while generating the new one under a new
name. The more interesting part is how to upgrade users to newer
versions of your application.

In the simplest case for this, "users" are your app-servers, in which
case you can just change the query running code to query the new
design doc. If you push the new design doc to production, and kick off
view generation, but wait until generation is complete to deploy new
application code, you should avoid slow end-user performance.
CouchRest (my Ruby adapter) handles this by naming design docs after a
hash of their contents. This way you never end up overwriting one, and
new code talks to new design docs automatically.
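
The naming idea can be sketched as follows. This is not CouchRest's actual scheme. The function name, the choice of SHA-1, and the 12-character suffix are assumptions made for illustration, using Node's built-in crypto module.

```javascript
// Sketch only: derive a design doc id from a hash of its view definitions,
// so that changed views always land in a new design doc.
var crypto = require("crypto");

function designDocId(name, views) {
  // Assumption: JSON.stringify output is stable enough here. Key order
  // follows insertion order, so build the views object consistently.
  var digest = crypto
    .createHash("sha1")
    .update(JSON.stringify(views))
    .digest("hex");
  return "_design/" + name + "-" + digest.slice(0, 12);
}

var viewsV1 = { by_unit: { map: "function(doc) { emit(doc.unit_id, 1); }" } };
var viewsV2 = { by_unit: { map: "function(doc) { emit(doc.unit_id, doc.data); }" } };

var idV1 = designDocId("agg", viewsV1);
var idV1Again = designDocId("agg", viewsV1);
var idV2 = designDocId("agg", viewsV2);
```

Old application code keeps querying idV1 while the index for idV2 builds, and new code computes idV2 from its own view definitions.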

It's a little harder to manage the upgrades for Ajax apps that are
querying your views directly from the browser. In this case you
probably don't want the URLs of the static HTML pages, at least, to
change. If they are contained in design docs themselves, then you've at
least got
to change the html and javascript attachments on the old design doc,
to point to the views in the new design doc. But you don't want to
make that change until after the views have generated.

So you can see it's possible, but not very pretty. I think we should
work to make the upgrade path here a little simpler.

-- 
Chris Anderson
http://jchris.mfdz.com