Posted to user@couchdb.apache.org by Anand Chitipothu <an...@gmail.com> on 2009/03/20 07:34:24 UTC

using multiple cores for view computation

I have created a new couchdb database on a quad-core machine, loaded
the database with 30 million documents and created a view.

I accessed the view for the first time, to start the view computation.
From whatever I can see from the output of top command, beam.smp is
taking around 70% of cpu and couchjs around 30%.

Shouldn't the view computation process use all the 4 cores for doing
the computation?

Thanks,
Anand

Re: using multiple cores for view computation

Posted by Adam Kocoloski <ko...@apache.org>.
On Mar 20, 2009, at 7:35 AM, Anand Chitipothu wrote:

>> I don't see anything out of the ordinary there. If you're not on trunk
>> then you'll definitely want to upgrade.
>
> my working copy is just 3 days old.
>
> Revision: 755487
> Last Changed Date: 2009-03-17 11:52:57 +0000 (Tue, 17 Mar 2009)
>
> I have about 30M documents and size of database is 26G.
>
> One thing that is worrying me is couchjs is not taking much cpu, most
> of the time cpu is taken by beam.smp. And it looks like both of them
> are sharing single core. At least, these 2 processes can take one core
> each. Any idea why it is not so?

Hi Anand, this matches my experience.  I believe the reason they  
appear to share a core is that the Erlang process waits on couchjs to  
send back results before continuing on with the indexing.  The  
workflow looks like

1) Pull docs into memory until we reach some threshold

2) map over docs:
     - convert doc to JSON
     - send JSON to view server
     - receive result from view server

3) Update the btrees with the results from the view server

4) Repeat

As others have said, serious attempts to optimize this workflow  
haven't really begun yet.  Best, Adam
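The four steps above can be sketched in a few lines of Python. This is only an illustration of why the two processes take turns rather than run in parallel, not CouchDB's actual code: `view_server`, `map_doc`, `build_index` and `batch_size` are hypothetical names, the view server is an in-process function instead of a separate couchjs process, and a sorted list stands in for the btree.

```python
import json
from bisect import insort


def map_doc(doc):
    """Stand-in for the JavaScript map function (hypothetical example view)."""
    if "type" in doc:
        yield doc["type"], 1


def view_server(request):
    """Hypothetical view server: one JSON document in, JSON rows out."""
    doc = json.loads(request)
    return json.dumps([[key, value] for key, value in map_doc(doc)])


def index_batch(batch, index):
    results = []
    for doc in batch:                      # 2) map over docs
        request = json.dumps(doc)          #    convert doc to JSON
        reply = view_server(request)       #    send, then block until the reply arrives
        results.append((doc["_id"], json.loads(reply)))
    for doc_id, rows in results:           # 3) update the "btree" with the results
        for key, value in rows:
            insort(index, (key, doc_id, value))


def build_index(docs, batch_size=2):
    index, batch = [], []
    for doc in docs:
        batch.append(doc)                  # 1) pull docs into memory up to a threshold
        if len(batch) >= batch_size:
            index_batch(batch, index)
            batch = []                     # 4) repeat
    if batch:
        index_batch(batch, index)
    return index
```

Because the caller does nothing while it waits for each reply, at most one of the two sides is busy at any moment, which fits the roughly 70%/30% split seen in top.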

Re: using multiple cores for view computation

Posted by Chris Anderson <jc...@apache.org>.
On Sat, Mar 21, 2009 at 6:55 AM, Brenton Alker <br...@tekerson.com> wrote:
> Anand Chitipothu wrote:
>>> All the views in a design document are computed in one process.
>>
>> If one view in a design document is modified, will it compute only the
>> modified view or all the views in the design document?
>
> I believe only the view that is modified, and only the next time it is
> requested.
>

Each design document is computed as a unit. If you want to add new
views while keeping old ones available it's best to do it in a new
design document.

> --
>
> Brenton Alker
> PHP Developer - Brisbane, Australia
>
> http://blog.tekerson.com/
>
>



-- 
Chris Anderson
http://jchris.mfdz.com

Re: using multiple cores for view computation

Posted by Anand Chitipothu <an...@gmail.com>.
It took 29 hours to compute the view.
I tested the performance and stopped the couchdb server with Ctrl-C.
When I restarted couchdb and tried to access the view, it started
computing the view again.
Am I doing something wrong, or is this a bug?

Thanks,
Anand

Re: using multiple cores for view computation

Posted by Brenton Alker <br...@tekerson.com>.
Anand Chitipothu wrote:
>> All the views in a design document are computed in one process.
> 
> If one view in a design document is modified, will it compute only the
> modified view or all the views in the design document?

I believe only the view that is modified, and only the next time it is
requested.

-- 

Brenton Alker
PHP Developer - Brisbane, Australia

http://blog.tekerson.com/


Re: using multiple cores for view computation

Posted by Anand Chitipothu <an...@gmail.com>.
> All the views in a design document are computed in one process.

If one view in a design document is modified, will it compute only the
modified view or all the views in the design document?

Re: using multiple cores for view computation

Posted by Adam Kocoloski <ko...@apache.org>.
On Mar 20, 2009, at 7:59 AM, Anand Chitipothu wrote:

>> If you started building another view
>> it should bring up a new JS process and use a second core as expected.
>
> Does it compute only the requested view or all the views in the design
> document?

All the views in a design document are computed in one process.  Paul was
talking about a view from another design doc.  Best,

Adam



Re: using multiple cores for view computation

Posted by Anand Chitipothu <an...@gmail.com>.
> If they required more than a single core they would take different
> cores. But beam.smp is going to be waiting on blocking io while
> couchjs is working and vice versa so your kernel might be keeping them
> on the same core for efficiency.

Looks like there is some room for improvement here. The beam.smp can
prefetch the docs instead of waiting for couchjs to finish the task.

> If you started building another view
> it should bring up a new JS process and use a second core as expected.

Does it compute only the requested view or all the views in the design document?

> You should be able to estimate the amount of time it'll take from the
> status page in Futon.

Thanks, I didn't know about this.
It has finished just 37%, which means it is going to take about 10 more hours.
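The estimate above is a linear extrapolation from the progress figure. A minimal sketch of the arithmetic, assuming the indexing rate stays constant and that about 6 hours had elapsed at this point (the thread only says "more than 5 hours" earlier, so that number is an assumption; `remaining_hours` is a hypothetical helper):

```python
def remaining_hours(elapsed_hours, fraction_done):
    """Linear estimate of the time left, assuming a constant indexing rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours * (1 - fraction_done) / fraction_done


# Assumed inputs: roughly 6 hours elapsed, 37% complete.
print(round(remaining_hours(6, 0.37), 1))  # 10.2
```

If the rate drops as the index grows, the real figure will be higher than this estimate.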

Re: using multiple cores for view computation

Posted by Paul Davis <pa...@gmail.com>.
On Fri, Mar 20, 2009 at 7:35 AM, Anand Chitipothu <an...@gmail.com> wrote:
>> I don't see anything out of the ordinary there. If you're not on trunk
>> then you'll definitely want to upgrade.
>
> my working copy is just 3 days old.
>
> Revision: 755487
> Last Changed Date: 2009-03-17 11:52:57 +0000 (Tue, 17 Mar 2009)
>
> I have about 30M documents and size of database is 26G.
>
> One thing that is worrying me is couchjs is not taking much cpu, most
> of the time cpu is taken by beam.smp. And it looks like both of them
> are sharing single core. At least, these 2 processes can take one core
> each. Any idea why it is not so?
>

If they required more than a single core they would take different
cores. But beam.smp is going to be waiting on blocking io while
couchjs is working and vice versa so your kernel might be keeping them
on the same core for efficiency. If you started building another view
it should bring up a new JS process and use a second core as expected.
As Jan said, we obviously haven't done anything at all to optimize any
of this. I wouldn't pay so much attention to the process table. You
should be able to estimate the amount of time it'll take from the
status page in Futon.

HTH,
Paul Davis

Re: using multiple cores for view computation

Posted by Anand Chitipothu <an...@gmail.com>.
> I don't see anything out of the ordinary there. If you're not on trunk
> then you'll definitely want to upgrade.

my working copy is just 3 days old.

Revision: 755487
Last Changed Date: 2009-03-17 11:52:57 +0000 (Tue, 17 Mar 2009)

I have about 30M documents and size of database is 26G.

One thing that is worrying me is couchjs is not taking much cpu, most
of the time cpu is taken by beam.smp. And it looks like both of them
are sharing single core. At least, these 2 processes can take one core
each. Any idea why it is not so?

Re: using multiple cores for view computation

Posted by Paul Davis <pa...@gmail.com>.
On Fri, Mar 20, 2009 at 7:17 AM, Anand Chitipothu <an...@gmail.com> wrote:
>> There are also tricks to make views compute faster, can you show your
>> m/r function and how large your documents are?
>
> view is here: http://gist.github.com/82322
> Most of the documents are between 0.5K to 1K in size.
>

I don't see anything out of the ordinary there. If you're not on trunk
then you'll definitely want to upgrade.

Re: using multiple cores for view computation

Posted by Anand Chitipothu <an...@gmail.com>.
> There are also tricks to make views compute faster, can you show your
> m/r function and how large your documents are?

view is here: http://gist.github.com/82322
Most of the documents are between 0.5K to 1K in size.

Re: using multiple cores for view computation

Posted by Jan Lehnardt <ja...@apache.org>.
On 20 Mar 2009, at 10:41, Anand Chitipothu wrote:

>> Can you check iostat to see how fast data gets passed to the disk?
>> If we're saturating your write speed already, there's no need to go
>> multi process.
>>
>> (I expect that we don't saturate the disk just yet, so yeah there's room
>> for improvement, but I don't think disk is fast enough for 4 cores spitting
>> out data).
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda1              1.43        33.72        60.85  129287651  233314088
>
> It is surely not saturated by I/O. %iowait is less than 1.0.
>
> The view computation has been running for more than 5 hours and I would
> like to see the view server use all the available cores. It would be
> even better if it were possible to add more nodes to assist the view
> computation.

If you look into the archives, there are a bunch of things that are still
open for speeding this up. We're not there yet.

If you want to have more nodes helping, you need the data on these
nodes as well. All speedup might be eaten by data-transfer time.

There are also tricks to make views compute faster, can you show your
m/r function and how large your documents are?

Cheers
Jan
--


Re: using multiple cores for view computation

Posted by Anand Chitipothu <an...@gmail.com>.
> Patches welcome. ;)

I would love to, but I don't speak erlang yet.

Re: using multiple cores for view computation

Posted by Paul Davis <pa...@gmail.com>.
On Fri, Mar 20, 2009 at 6:41 AM, Anand Chitipothu <an...@gmail.com> wrote:
>> Can you check iostat to see how fast data gets passed to the disk?
>> If we're saturating your write speed already, there's no need to go
>> multi process.
>>
>> (I expect that we don't saturate the disk just yet, so yeah there's room
>> for improvement, but I don't think disk is fast enough for 4 cores spitting
>> out data).
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda1              1.43        33.72        60.85  129287651  233314088
>
> It is surely not saturated by I/O. %iowait is less than 1.0.
>
> The view computation has been running for more than 5 hours and I would
> like to see the view server use all the available cores. It would be
> even better if it were possible to add more nodes to assist the view
> computation.
>

Patches welcome. ;)

Re: using multiple cores for view computation

Posted by Anand Chitipothu <an...@gmail.com>.
> Can you check iostat to see how fast data gets passed to the disk?
> If we're saturating your write speed already, there's no need to go
> multi process.
>
> (I expect that we don't saturate the disk just yet, so yeah there's room
> for improvement, but I don't think disk is fast enough for 4 cores spitting
> out data).

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda1              1.43        33.72        60.85  129287651  233314088

It is surely not saturated by I/O. %iowait is less than 1.0.

The view computation has been running for more than 5 hours and I would
like to see the view server use all the available cores. It would be
even better if it were possible to add more nodes to assist the view
computation.
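To put the iostat figures above into more familiar units, the per-second block counts can be converted to KiB/s. This is a small sketch that assumes the 512-byte block size sysstat's iostat uses on Linux; `blocks_to_kib_per_s` is a hypothetical helper. If this was the first report printed by iostat, the numbers are averages since boot rather than the current rate.

```python
BLOCK_SIZE_BYTES = 512  # assumption: iostat blocks are 512-byte sectors


def blocks_to_kib_per_s(blocks_per_s, block_size=BLOCK_SIZE_BYTES):
    """Convert an iostat Blk_read/s or Blk_wrtn/s value to KiB per second."""
    return blocks_per_s * block_size / 1024


# Values taken from the sda1 line above.
print(f"read:  {blocks_to_kib_per_s(33.72):.1f} KiB/s")
print(f"write: {blocks_to_kib_per_s(60.85):.1f} KiB/s")
```

Under that assumption the write rate is about 30 KiB/s, far below what a single disk can sustain.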

Re: using multiple cores for view computation

Posted by Jan Lehnardt <ja...@apache.org>.
On 20 Mar 2009, at 06:34, Anand Chitipothu wrote:

> I have created a new couchdb database on a quad-core machine, loaded
> the database with 30 million documents and created a view.
>
> I accessed the view for the first time, to start the view computation.
> From whatever I can see from the output of top command, beam.smp is
> taking around 70% of cpu and couchjs around 30%.
>
> Shouldn't the view computation process use all the 4 cores for doing
> the computation?

Can you check iostat to see how fast data gets passed to the disk?
If we're saturating your write speed already, there's no need to go
multi process.

(I expect that we don't saturate the disk just yet, so yeah there's room
for improvement, but I don't think disk is fast enough for 4 cores spitting
out data).

Cheers
Jan
--


Re: using multiple cores for view computation

Posted by Paul Davis <pa...@gmail.com>.
On Fri, Mar 20, 2009 at 5:03 AM, Cedric Vivier <ce...@neonux.com> wrote:
> On Fri, Mar 20, 2009 at 4:49 PM, Paul Davis <pa...@gmail.com> wrote:
>
>> A slight clarification, JS itself is multi-threaded. Couchjs is just
>> not implemented as a threaded view server. I've contemplated giving
>> view servers the ability to execute multiple jobs simultaneously but
>> there are a couple of things that make it not a priority.
>
>
> I also think keeping view servers as simple as possible makes sense.
> OTOH, is it possible to make CouchDB evenly distribute jobs to several
> spawned couchjs view server instances?

Not yet. The idea has been floated, but the disk IO caveat lurks there
like a menacing monster.

> (and the OS scheduler would make sure that the different view server
> processes end up running on different cores I guess... though of course the
> 'bottleneck' might be the I/O anyways)
>
> Regards,
>

Re: using multiple cores for view computation

Posted by Cedric Vivier <ce...@neonux.com>.
On Fri, Mar 20, 2009 at 4:49 PM, Paul Davis <pa...@gmail.com> wrote:

> A slight clarification, JS itself is multi-threaded. Couchjs is just
> not implemented as a threaded view server. I've contemplated giving
> view servers the ability to execute multiple jobs simultaneously but
> there are a couple of things that make it not a priority.


I also think keeping view servers as simple as possible makes sense.
OTOH, is it possible to make CouchDB evenly distribute jobs to several
spawned couchjs view server instances?
(and the OS scheduler would make sure that the different view server
processes end up running on different cores I guess... though of course the
'bottleneck' might be the I/O anyways)

Regards,

Re: using multiple cores for view computation

Posted by Paul Davis <pa...@gmail.com>.
A slight clarification, JS itself is multi-threaded. Couchjs is just
not implemented as a threaded view server. I've contemplated giving
view servers the ability to execute multiple jobs simultaneously but
there are a couple of things that make it not a priority.

Firstly, view computation is generally reported to be rate limited by
disk write throughput. Secondly, why move all that complicated
threading logic into the view server when Erlang can just spawn four
OS processes? There's definitely room for improvement in OS process
handling but I would place any major parallelization type changes post
1.0.

HTH,
Paul Davis
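The "spawn four OS processes" idea can be illustrated with a toy sketch in Python. This is not how CouchDB works (the thread says the feature does not exist yet), and in CouchDB the spawning side would be Erlang talking to couchjs. Here `WORKER_SOURCE`, `run_worker` and `map_with_workers` are hypothetical names, each worker is a small Python child process reading one JSON document per line, and documents are dealt out round-robin.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Toy single-threaded "view server": one JSON doc per input line,
# one JSON list of [key, value] rows per output line.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    doc = json.loads(line)
    rows = [[doc["type"], 1]] if "type" in doc else []
    sys.stdout.write(json.dumps(rows) + "\n")
"""


def run_worker(share):
    """Run one worker OS process over its share of the documents."""
    payload = "".join(json.dumps(doc) + "\n" for doc in share)
    done = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload, capture_output=True, text=True, check=True,
    )
    return [json.loads(line) for line in done.stdout.splitlines()]


def map_with_workers(docs, workers=4):
    """Map docs using several worker processes, keeping the input order."""
    shares = [docs[i::workers] for i in range(workers)]   # round-robin split
    # Threads only wait on the child processes; the map work runs in the children.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_worker = list(pool.map(run_worker, shares))
    results = [None] * len(docs)
    for i, rows in enumerate(per_worker):
        results[i::workers] = rows                        # undo the round-robin split
    return results
```

The workers stay simple and single-threaded, and the operating system is free to schedule each one on its own core. Whether that helps in practice still depends on the disk write throughput mentioned above.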

On Fri, Mar 20, 2009 at 3:05 AM, Daniel Friesen
<li...@danielfriesen.name> wrote:
> JavaScript is single-threaded, and views are written in JavaScript, so a view
> can only make use of a single core.
>
> On the positive side, it does mean that while doing a view, you still have
> the cpu power to handle two other views, and handle couch requests with
> dedicated cores.
>
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire)
>
> Anand Chitipothu wrote:
>>
>> I have created a new couchdb database on a quad-core machine, loaded
>> the database with 30 million documents and created a view.
>>
>> I accessed the view for the first time, to start the view computation.
>> From whatever I can see from the output of top command, beam.smp is
>> taking around 70% of cpu and couchjs around 30%.
>>
>> Shouldn't the view computation process use all the 4 cores for doing
>> the computation?
>>
>> Thanks,
>> Anand
>>
>
>

Re: using multiple cores for view computation

Posted by Daniel Friesen <li...@danielfriesen.name>.
JavaScript is single-threaded, and views are written in JavaScript, so a
view can only make use of a single core.

On the positive side, it does mean that while doing a view, you still 
have the cpu power to handle two other views, and handle couch requests 
with dedicated cores.

~Daniel Friesen (Dantman, Nadir-Seen-Fire)

Anand Chitipothu wrote:
> I have created a new couchdb database on a quad-core machine, loaded
> the database with 30 million documents and created a view.
>
> I accessed the view for the first time, to start the view computation.
> From whatever I can see from the output of top command, beam.smp is
> taking around 70% of cpu and couchjs around 30%.
>
> Shouldn't the view computation process use all the 4 cores for doing
> the computation?
>
> Thanks,
> Anand
>