Posted to user@couchdb.apache.org by cdr53x <cd...@free.fr> on 2010/11/04 11:41:28 UTC

Re: How to speedup view generation?

On 10/30/2010 03:52 PM, Anand Chitipothu wrote:
> I'm trying to setup a couchdb database with 14M documents. The view
> generation is taking too long. It is running at the rate of 22
> docs/sec right now. At this rate it will take 7days to build the view,
> which is too slow and I expect the speed to go down further as the
> view file size increase.
>
>    

Hi,

What is the size of the design document files on the drive?

I noticed that large views use quite large files ;).

I also noticed that the view group indexers take a large amount of time
to finish the last 30% of the task: at least twice as long as the first
70%.

In my case I have a 'small' database containing 400K docs. I also have
a design doc that indexes 80% of the docs with 8 views. The map functions
only emit a single property per doc and a null value, so the views should
be compact.

The overall size of this design doc's .view file on disk is 17G ;).

I don't know how couchdb handles the update of such large files, but
maybe there is something with updating large files ...

Concerning performance, I use the standard javascript interpreter and get
a rate of ~60 changes/sec at the beginning of the process.

Then it drops to 15 c/s after 70%.

After 85% I'm at about 6 c/s.

The first 70% took 52 minutes and the whole process ran for 3h21m on a
small standalone dedicated server.
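
As a rough check on the timings above, the arithmetic can be sketched in
Python. The function name and its parameters are illustrative only; the
inputs are the 52 minutes and 3h21m reported in this message.

```python
def split_timing(first_part_min, total_min, first_fraction=0.70):
    """Return (minutes spent on the remainder, slowdown per unit of progress)."""
    rest_min = total_min - first_part_min
    # Pace = minutes needed per unit of progress in each phase.
    first_pace = first_part_min / first_fraction
    rest_pace = rest_min / (1.0 - first_fraction)
    return rest_min, rest_pace / first_pace

total_min = 3 * 60 + 21  # 3h21m
rest_min, slowdown = split_timing(52, total_min)
print(f"last 30%: {rest_min} min, {slowdown:.1f}x slower per unit of progress")
# prints: last 30%: 149 min, 6.7x slower per unit of progress
```

So the last 30% took 149 minutes against 52 for the first 70%, which is
in line with the drop from ~60 c/s to 6-15 c/s.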

So I get the feeling that it is not an issue with the view "calculation"
algorithm, but probably something related to disk I/O.

I have no erlang knowledge, and I might be quite wrong about this feeling,
but if you guys know a little about this part of the couch code, maybe
there is something that could be checked that would improve the overall
design doc refresh performance?

Regards,

cdrx




Re: How to speedup view generation?

Posted by cdr53x <cd...@free.fr>.
On 11/04/2010 05:24 PM, Anand Chitipothu wrote:
> Yes, it is due to IO.
It would be interesting to check whether it's the system IO that gets stuck
or something in the way couch performs its IO.

> I decided to generate the view by feeding the data directly to the map
> function and it took about an hour to generate the view for entire 14M
> docs. I sorted it, ran reduce and saved the results in another couchdb
> database. That was much faster. I could finish the whole process in
> less than 10 hours.
>
What do you mean by "feeding the data to the map function"?

Regards,

cdrx


Re: How to speedup view generation?

Posted by Anand Chitipothu <an...@gmail.com>.
2010/11/4 cdr53x <cd...@free.fr>:
> On 10/30/2010 03:52 PM, Anand Chitipothu wrote:
>>
>> I'm trying to setup a couchdb database with 14M documents. The view
>> generation is taking too long. It is running at the rate of 22
>> docs/sec right now. At this rate it will take 7days to build the view,
>> which is too slow and I expect the speed to go down further as the
>> view file size increase.
>>
>>
>
> Hi,
>
> What is the size of the design document files on the drive?
>
> I noticed that large views use quite large files ;).
>
> I also noticed that the view group indexers take a large amount of time to
> finish the last 30% of the task: at least twice as long as the first
> 70%.
>
> In my case I have a 'small' database containing 400K docs. I also have a
> design doc that indexes 80% of the docs with 8 views. The map functions only
> emit a single property per doc and a null value, so the views should be
> compact.
>
> The overall size of this design doc's .view file on disk is 17G ;).
>
> I don't know how couchdb handles the update of such large files, but maybe
> there is something with updating large files ...
>
> Concerning performance, I use the standard javascript interpreter and get a
> rate of ~60 changes/sec at the beginning of the process.
>
> Then it drops to 15 c/s after 70%.
>
> After 85% I'm at about 6 c/s.
>
> The first 70% took 52 minutes and the whole process ran for 3h21m on a
> small standalone dedicated server.
>
> So I get the feeling that it is not an issue with the view "calculation"
> algorithm, but probably something related to disk I/O.
>
> I have no erlang knowledge, and I might be quite wrong about this feeling,
> but if you guys know a little about this part of the couch code, maybe
> there is something that could be checked that would improve the overall
> design doc refresh performance?

Yes, it is due to IO. In my case it started at a speed of 200
docs/sec and dropped to almost 3 docs/sec, and the view file size was
about 60GB after processing around 6-7M docs.

I noticed that the IO wait had increased to about 15 and that
beam.smp and couchjs together weren't using even 50% of one core. I
tried running compaction, and it looked like the size of the view would
be reduced to 1/6 after compaction, but it was still not progressing well
because of IO wait. Having an SSD might have helped, but I don't have
one.

So I thought it might be faster to run compaction after loading and
waiting for view generation to complete. I tried it and it still looked
like it was not going to finish in one week. Even compaction is very,
very slow.

I decided to generate the view by feeding the data directly to the map
function, and it took about an hour to generate the view for the entire
14M docs. I sorted it, ran reduce and saved the results in another
couchdb database. That was much faster. I could finish the whole process
in less than 10 hours.
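
The message does not say which tools were used, so the following is only
a minimal sketch of the map, sort, reduce steps described above, done
outside CouchDB. The map and reduce functions, the "type" field and the
sample documents are all hypothetical.

```python
import json
from itertools import groupby
from operator import itemgetter

def map_doc(doc):
    """Hypothetical map function: emit (key, value) pairs for one document."""
    if "type" in doc:
        yield doc["type"], 1

def reduce_values(values):
    """Hypothetical reduce function: count the emitted rows per key."""
    return sum(values)

def build_view(docs):
    # Map: run every document through the map function, outside CouchDB.
    rows = [row for doc in docs for row in map_doc(doc)]
    # Sort: order the rows by key, as a view index would.
    rows.sort(key=itemgetter(0))
    # Reduce: fold each run of equal keys into a single result row.
    return [
        {"key": key, "value": reduce_values(v for _, v in group)}
        for key, group in groupby(rows, key=itemgetter(0))
    ]

docs = [json.loads(line) for line in [
    '{"_id": "a", "type": "book"}',
    '{"_id": "b", "type": "author"}',
    '{"_id": "c", "type": "book"}',
]]
print(build_view(docs))
# prints: [{'key': 'author', 'value': 1}, {'key': 'book', 'value': 2}]
```

For millions of documents the mapped rows would more likely be written
to a file and sorted on disk rather than in memory; the resulting rows
could then be loaded into a second database, for example through the
_bulk_docs endpoint.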

The downside is that I need to take on the pain of making sure the view
is up to date with the original database. I think that is a good
compromise.

Anand

Re: How to speedup view generation?

Posted by Randall Leeds <ra...@gmail.com>.
Ah. My mistake.

On Thu, Nov 4, 2010 at 16:32, Adam Kocoloski <ko...@apache.org> wrote:
> The view generation status updates ignore old MVCC versions completely.  We're a bit inconsistent in that department - the replicator reports the current sequence number, but the view updater reports the number of documents that have been loaded into memory so far.  As a result, I wouldn't expect the number of obsolete updates to affect the perceived view updater progress much at all.
>
> Best,
>
> Adam
>
> On Nov 4, 2010, at 5:27 PM, Randall Leeds wrote:
>
>> I'm more interested to know if compacting the database before building
>> the view helps.
>>
>> If you have many document updates it could be that the density of new
>> changes is higher at the end of the database file. In other words, the
>> I/O bottleneck isn't caused by a large file but maybe by reading more
>> information. Much of the early parts of the by-sequence btree may have
>> been overwritten by new changes and therefore ignored in view
>> generation.
>>
>> For example, if you have a database of 1 document with 1 million
>> changes, you would expect the view generation to roll through the
>> first 999k changes very quickly.
>>
>> -Randall
>>
>> On Thu, Nov 4, 2010 at 04:07, cdr53x <cd...@free.fr> wrote:
>>> On 11/04/2010 11:55 AM, Nils Breunese wrote:
>>>>
>>>> Also after compaction?
>>>>
>>>
>>> Yes after compaction. ;)

Re: How to speedup view generation?

Posted by Adam Kocoloski <ko...@apache.org>.
The view generation status updates ignore old MVCC versions completely.  We're a bit inconsistent in that department - the replicator reports the current sequence number, but the view updater reports the number of documents that have been loaded into memory so far.  As a result, I wouldn't expect the number of obsolete updates to affect the perceived view updater progress much at all.

Best,

Adam

On Nov 4, 2010, at 5:27 PM, Randall Leeds wrote:

> I'm more interested to know if compacting the database before building
> the view helps.
> 
> If you have many document updates it could be that the density of new
> changes is higher at the end of the database file. In other words, the
> I/O bottleneck isn't caused by a large file but maybe by reading more
> information. Much of the early parts of the by-sequence btree may have
> been overwritten by new changes and therefore ignored in view
> generation.
> 
> For example, if you have a database of 1 document with 1 million
> changes, you would expect the view generation to roll through the
> first 999k changes very quickly.
> 
> -Randall
> 
> On Thu, Nov 4, 2010 at 04:07, cdr53x <cd...@free.fr> wrote:
>> On 11/04/2010 11:55 AM, Nils Breunese wrote:
>>> 
>>> Also after compaction?
>>> 
>> 
>> Yes after compaction. ;)


Re: How to speedup view generation?

Posted by Randall Leeds <ra...@gmail.com>.
I'm more interested to know if compacting the database before building
the view helps.

If you have many document updates it could be that the density of new
changes is higher at the end of the database file. In other words, the
I/O bottleneck isn't caused by a large file but maybe by reading more
information. Much of the early parts of the by-sequence btree may have
been overwritten by new changes and therefore ignored in view
generation.

For example, if you have a database of 1 document with 1 million
changes, you would expect the view generation to roll through the
first 999k changes very quickly.
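
The example above can be illustrated with a toy model in Python. It is
not CouchDB's actual by-sequence btree code, only a sketch of the idea
that each document keeps a single entry at its latest update sequence;
the function name and the sample document id are made up.

```python
def by_sequence_index(updates):
    """Map each doc id to the sequence number of its latest update."""
    latest = {}
    for seq, doc_id in enumerate(updates, start=1):
        latest[doc_id] = seq  # a newer update supersedes the older entry
    # Only the surviving (latest) entries would be fed to the map function.
    return sorted((seq, doc_id) for doc_id, seq in latest.items())

updates = ["doc1"] * 1_000_000  # 1 document, 1 million changes
live = by_sequence_index(updates)
print(len(updates), "updates ->", len(live), "entry to index:", live)
# prints: 1000000 updates -> 1 entry to index: [(1000000, 'doc1')]
```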

-Randall

On Thu, Nov 4, 2010 at 04:07, cdr53x <cd...@free.fr> wrote:
> On 11/04/2010 11:55 AM, Nils Breunese wrote:
>>
>> Also after compaction?
>>
>
> Yes after compaction. ;)

Re: How to speedup view generation?

Posted by cdr53x <cd...@free.fr>.
On 11/04/2010 11:55 AM, Nils Breunese wrote:
> Also after compaction?
>    
Yes after compaction. ;)






Re: How to speedup view generation?

Posted by cdr53x <cd...@free.fr>.
On 11/04/2010 11:55 AM, Nils Breunese wrote:
> cdr53x wrote:
>
> The overall size of this design doc's .view file on disk is 17G ;).
Oops, sorry, the file is not 17G; that is the overall size of the
.design directory.

The design doc's .view file is "only" 9.2G. And yes, this is the size
after compaction.

I'm sorry for the wrong number, but it does not change my feeling about
writing large files.

Regards,

cdrx





Re: How to speedup view generation?

Posted by Nils Breunese <N....@vpro.nl>.
cdr53x wrote:

> What is the size of the design document files on the drive?
>
> I noticed that large views use quite large files ;).
>
> I also noticed that the view group indexers take a large amount of time
> to finish the last 30% of the task: at least twice as long as the first
> 70%.
>
> In my case I have a 'small' database containing 400K docs. I also have
> a design doc that indexes 80% of the docs with 8 views. The map functions
> only emit a single property per doc and a null value, so the views should
> be compact.
>
> The overall size of this design doc's .view file on disk is 17G ;).

Also after compaction?

http://wiki.apache.org/couchdb/Compaction

Nils.
------------------------------------------------------------------------
 VPRO
 phone:  +31(0)356712911
 e-mail: info@vpro.nl
 web:    www.vpro.nl
------------------------------------------------------------------------
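
For reference, view compaction for one design document is started with
a POST to /{db}/_compact/{design-doc-name} with a JSON content type. A
minimal Python sketch that builds the request follows; the host, the
database name "mydb" and the design doc name "mydesign" are placeholders,
not values from this thread.

```python
import urllib.request

def view_compaction_request(base_url, db, ddoc):
    """Build (but do not send) the POST request that starts view compaction."""
    url = f"{base_url}/{db}/_compact/{ddoc}"
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )

req = view_compaction_request("http://localhost:5984", "mydb", "mydesign")
print(req.get_method(), req.full_url)
# prints: POST http://localhost:5984/mydb/_compact/mydesign
# To actually trigger compaction: urllib.request.urlopen(req)
```

The design doc name is given without the "_design/" prefix. Dropping the
last path segment (POST to /{db}/_compact) compacts the database file
itself instead of the view index.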