
Posted to user@couchdb.apache.org by "Andrew Stuart (SuperCoders)" <an...@supercoders.com.au> on 2011/06/30 02:55:54 UTC

Has Erlang's promise of parallelism been realised in CouchDB?

One of the primary supposed advantages of Erlang is its ability to  
parallelise.

Has this promise been realised as a performance and scalability
benefit in CouchDB?  Or has the promise turned out to be too
impractical to realise in any major way?

as

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Randall Leeds <ra...@gmail.com>.

On Thu, Jun 30, 2011 at 01:06, Dirkjan Ochtman <di...@ochtman.nl> wrote:
> On Thu, Jun 30, 2011 at 09:53, Randall Leeds <ra...@gmail.com> wrote:
>> But beyond the parallelization of request handling there's concurrency
>> in the more general sense. The neat thing about Erlang, and why it has
>> its reputation, is that the CouchDB code can be liberal about its use
>> of concurrency at the code level without suffering from deadlocks or
>> other headaches that often plague programmers of complex,
>> multi-threaded shared memory systems. The Erlang team has taken care
>> of all the hard parts about sharing data in a concurrent environment.
>> As I understand it, the Erlang runtime's use of chipsets with many
>> hardware threads is only improving, and those benefits will be
>> automatically conferred upon CouchDB.
>
> One thing that, as far as I know, has not been parallelized is the
> view indexer. While it should be possible to execute at least the map
> part of map/reduce concurrently, CouchDB doesn't do that yet. IIRC
> there were reasons for that? But at least it's good to be aware of.

This is true. The reason is that the protocol between CouchDB and the
view server is synchronous. Just as you need multiple connections to a
web server to make parallel HTTP requests, additional view server
processes could be run with some added complexity in the CouchDB code.
The better solution is to make this protocol asynchronous in the
future, but such a change will only happen with a major version bump
(2.0, for example) since it will break all the view servers currently
written for other languages.
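
To illustrate what "synchronous" means here, the following is a minimal Python sketch of a line-oriented request/reply loop. It is not CouchDB's view server: the command names (reset, add_fun, map_doc) are loosely modelled on the view server protocol, while MAP_FUNS, handle and serve are hypothetical names, and map functions are looked up by name instead of being compiled from source.

```python
import json

# Hypothetical registry: a real view server compiles function source sent by CouchDB.
MAP_FUNS = {
    "by_type": lambda doc: [[doc.get("type"), 1]],
}

def handle(state, line):
    """Process one JSON command and return exactly one JSON reply."""
    cmd, *args = json.loads(line)
    if cmd == "reset":
        state.clear()
        return json.dumps(True)
    if cmd == "add_fun":
        state.append(MAP_FUNS[args[0]])
        return json.dumps(True)
    if cmd == "map_doc":
        # One list of [key, value] rows per registered map function.
        return json.dumps([fun(args[0]) for fun in state])
    return json.dumps({"error": "unknown_command", "reason": cmd})

def serve(lines):
    """Strictly one request in, one reply out: requests never overlap."""
    state = []
    return [handle(state, line) for line in lines]
```

Because each reply must come back before the next command is sent, a single such process can only work on one document at a time. Getting more throughput means either running several of these processes side by side or changing the protocol so that replies can arrive out of order.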

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Robert Dionne <di...@dionne-associates.com>.

The reductions are stored as part of the btree nodes. The btree is append-only, so a new branch from the leaves to the root is created
when writes occur. The reduce and rereduce functions are called along the way to compute the reductions, which are stored with
the nodes.

CouchDB uses a fixed chunk size to break up the list of nodes to write into chunks, and this determines the branching. So the nodes
in the btree have no capped or fixed size; they vary with the size of the keys and the reductions. In general one wants a high branching
factor, making for shallower trees.
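
A rough Python sketch of the idea, under simplifying assumptions: nodes are chunked by item count instead of by size on disk, the tree is built in one pass instead of being appended to, and the names chunked, build and depth are hypothetical. It only shows how each non-leaf node carries the rereduction of its children and how the chunk size drives the depth.

```python
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build(kvs, chunk=3, reduce=sum, rereduce=sum):
    """Build a tree bottom-up; every node carries the reduction of its subtree."""
    if not kvs:
        return {"keys": [], "red": reduce([]), "kids": None}
    # Leaf nodes hold keys plus the reduction of their values.
    level = [
        {"keys": [k for k, _ in part], "red": reduce(v for _, v in part), "kids": None}
        for part in chunked(sorted(kvs), chunk)
    ]
    # Internal (non-leaf) nodes store the rereduction of their children's reductions.
    while len(level) > 1:
        level = [
            {"keys": [n["keys"][-1] for n in part],
             "red": rereduce(n["red"] for n in part),
             "kids": part}
            for part in chunked(level, chunk)
        ]
    return level[0]

def depth(node):
    """Number of levels from this node down to a leaf."""
    return 1 if node["kids"] is None else 1 + depth(node["kids"][0])
```

With ten rows and chunk=3 the tree is three levels deep; with chunk=10 everything fits in a single node. In both cases the root's stored reduction answers a full-range reduce query without visiting the leaves, which is the benefit that the extra write-time work pays for.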

By internal I mean non-leaf nodes.

On Jun 30, 2011, at 7:10 AM, Dirkjan Ochtman wrote:

> On Thu, Jun 30, 2011 at 12:38, Robert Dionne
> <di...@dionne-associates.com> wrote:
>>  I think one thing that really impacts performance with the view indexer is the storing of the reductions
>> in the internal nodes of the btree.
> 
> What do you mean by internal, here? Do you mean that they shouldn't be
> stored at all, or that they should be stored separately somehow?
> 
> (Alternatively: this sounds interesting, please explain what you mean.)
> 
> Cheers,
> 
> Dirkjan

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Dirkjan Ochtman <di...@ochtman.nl>.

On Thu, Jun 30, 2011 at 12:38, Robert Dionne
<di...@dionne-associates.com> wrote:
>  I think one thing that really impacts performance with the view indexer is the storing of the reductions
> in the internal nodes of the btree.

What do you mean by internal, here? Do you mean that they shouldn't be
stored at all, or that they should be stored separately somehow?

(Alternatively: this sounds interesting, please explain what you mean.)

Cheers,

Dirkjan

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Robert Newson <rn...@apache.org>.

Yes, CouchDB makes good use of Erlang's concurrency and will certainly
use more than one core, with the caveat that you need concurrent access
to multiple databases/views.

B.

On 30 June 2011 11:38, Robert Dionne <di...@dionne-associates.com> wrote:
> Yes, that sounds possible.
>
>  I think one thing that really impacts performance with the view indexer is the storing of the reductions
> in the internal nodes of the btree.
>
>
>
> On Jun 30, 2011, at 5:26 AM, Dirkjan Ochtman wrote:
>
>> On Thu, Jun 30, 2011 at 11:21, Robert Newson <rn...@apache.org> wrote:
>>> Individual view building is sequential and it's hard to see how it
>>> could be otherwise, given the append-only nature of view files today.
>>
>> IIRC JSON encoding/decoding and the process of running the actual view
>> functions is a non-trivial part of view building, which could be
>> somewhat trivially parallelized at least for the map functions. I.e.,
>> if there are 10000 new documents to index, why not start 4 view
>> servers and let each of them process 25% of the updated documents? The
>> writes will be serialized again, of course, but I didn't think the
>> disk writes were the bottleneck for the view indexer?
>>
>> Cheers,
>>
>> Dirkjan
>
>

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Robert Dionne <di...@dionne-associates.com>.

Yes, that sounds possible.

 I think one thing that really impacts performance with the view indexer is the storing of the reductions
in the internal nodes of the btree.



On Jun 30, 2011, at 5:26 AM, Dirkjan Ochtman wrote:

> On Thu, Jun 30, 2011 at 11:21, Robert Newson <rn...@apache.org> wrote:
>> Individual view building is sequential and it's hard to see how it
>> could be otherwise, given the append-only nature of view files today.
> 
> IIRC JSON encoding/decoding and the process of running the actual view
> functions is a non-trivial part of view building, which could be
> somewhat trivially parallelized at least for the map functions. I.e.,
> if there are 10000 new documents to index, why not start 4 view
> servers and let each of them process 25% of the updated documents? The
> writes will be serialized again, of course, but I didn't think the
> disk writes were the bottleneck for the view indexer?
> 
> Cheers,
> 
> Dirkjan

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Dirkjan Ochtman <di...@ochtman.nl>.

On Thu, Jun 30, 2011 at 11:21, Robert Newson <rn...@apache.org> wrote:
> Individual view building is sequential and it's hard to see how it
> could be otherwise, given the append-only nature of view files today.

IIRC JSON encoding/decoding and the process of running the actual view
functions is a non-trivial part of view building, which could be
somewhat trivially parallelized at least for the map functions. I.e.,
if there are 10000 new documents to index, why not start 4 view
servers and let each of them process 25% of the updated documents? The
writes will be serialized again, of course, but I didn't think the
disk writes were the bottleneck for the view indexer?
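
The proposal above can be sketched in Python as follows. This is an illustration only, not how CouchDB's indexer works: map_doc, map_chunk and index are hypothetical names, and threads stand in for separate view server processes to keep the example self-contained (in CPython, threads would not actually speed up CPU-bound map functions).

```python
from concurrent.futures import ThreadPoolExecutor

def map_doc(doc):
    """Hypothetical map function: emit one (key, value) row per document."""
    return [(doc["type"], doc["_id"])]

def map_chunk(docs):
    """Stand-in for one view server working through its share of documents."""
    return [row for doc in docs for row in map_doc(doc)]

def index(docs, workers=4):
    # Split the updated documents into one contiguous slice per worker.
    size = max(1, -(-len(docs) // workers))  # ceiling division, at least 1
    slices = [docs[i:i + size] for i in range(0, len(docs), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, slices))  # results keep slice order
    # The write side stays serialized: a single writer appends rows in order.
    view = []
    for rows in mapped:
        view.extend(rows)
    return view
```

Since pool.map returns results in the order the slices were submitted, the single writer sees rows in the same order a sequential indexer would have produced them.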

Cheers,

Dirkjan

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Robert Newson <rn...@apache.org>.

Individual view building is sequential and it's hard to see how it
could be otherwise, given the append-only nature of view files today.

To be honest, this constraint is only of note in a single database,
single view scenario, which I don't think is likely, or even
interesting, and certainly not worth optimizing for.

I don't believe reads are parallelized either, since the file
descriptor is held by a gen_server and all calls to a gen_server are
serialized.
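
As a rough analogy in Python (not Erlang, and not CouchDB's actual file module), a single thread that owns the data and takes requests from a queue behaves much like a gen_server holding the file descriptor. FileServer and its read method are hypothetical names; the busy counters exist only to show that calls never overlap.

```python
import queue
import threading

class FileServer:
    """Rough analogue of a gen_server that owns a file: one call at a time."""

    def __init__(self, data):
        self._data = data            # stands in for the open file descriptor
        self._inbox = queue.Queue()  # plays the role of the process mailbox
        self.busy = 0                # number of calls currently being handled
        self.max_busy = 0            # highest value `busy` ever reached
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        # Only this thread touches the data, so requests are handled in turn.
        while True:
            offset, length, reply = self._inbox.get()
            self.busy += 1
            self.max_busy = max(self.max_busy, self.busy)
            reply.put(self._data[offset:offset + length])
            self.busy -= 1

    def read(self, offset, length):
        """Synchronous call: enqueue the request, then block for the reply."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((offset, length, reply))
        return reply.get()
```

However many callers invoke read at once, max_busy never exceeds 1: the callers run concurrently, but the reads themselves queue up behind one another.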

Where Erlang shines is in its ability to have many requests processed
sanely at the same time; in CouchDB those requests just happen to need
to be to different databases and views. The reason reads seem fast, and
why you might think they are parallel, is disk caching effects.

B.

On 30 June 2011 10:16, Andrew Stuart (SuperCoders)
<an...@supercoders.com.au> wrote:
> I imagine that if anything needs the performance it is the view indexer?
>
> On 30/06/2011, at 6:06 PM, Dirkjan Ochtman wrote:
>
> On Thu, Jun 30, 2011 at 09:53, Randall Leeds <ra...@gmail.com>
> wrote:
>>
>> But beyond the parallelization of request handling there's concurrency
>> in the more general sense. The neat thing about Erlang, and why it has
>> its reputation, is that the CouchDB code can be liberal about its use
>> of concurrency at the code level without suffering from deadlocks or
>> other headaches that often plague programmers of complex,
>> multi-threaded shared memory systems. The Erlang team has taken care
>> of all the hard parts about sharing data in a concurrent environment.
>> As I understand it, the Erlang runtime's use of chipsets with many
>> hardware threads is only improving, and those benefits will be
>> automatically conferred upon CouchDB.
>
> One thing that, as far as I know, has not been parallelized is the
> view indexer. While it should be possible to execute at least the map
> part of map/reduce concurrently, CouchDB doesn't do that yet. IIRC
> there were reasons for that? But at least it's good to be aware of.
>
> Cheers,
>
> Dirkjan
> --
> Message  protected by MailGuard: e-mail anti-virus, anti-spam and content
> filtering.http://www.mailguard.com.au/mg
> Click here to report this message as spam:
> https://login.mailguard.com.au/report/1CCaMoXsmg/2UfOi5VJGXiukwyBDiHnm2/1
>

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by "Andrew Stuart (SuperCoders)" <an...@supercoders.com.au>.

I imagine that if anything needs the performance it is the view indexer?

On 30/06/2011, at 6:06 PM, Dirkjan Ochtman wrote:

On Thu, Jun 30, 2011 at 09:53, Randall Leeds <ra...@gmail.com>  
wrote:
> But beyond the parallelization of request handling there's concurrency
> in the more general sense. The neat thing about Erlang, and why it has
> its reputation, is that the CouchDB code can be liberal about its use
> of concurrency at the code level without suffering from deadlocks or
> other headaches that often plague programmers of complex,
> multi-threaded shared memory systems. The Erlang team has taken care
> of all the hard parts about sharing data in a concurrent environment.
> As I understand it, the Erlang runtime's use of chipsets with many
> hardware threads is only improving, and those benefits will be
> automatically conferred upon CouchDB.

One thing that, as far as I know, has not been parallelized is the
view indexer. While it should be possible to execute at least the map
part of map/reduce concurrently, CouchDB doesn't do that yet. IIRC
there were reasons for that? But at least it's good to be aware of.

Cheers,

Dirkjan

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Dirkjan Ochtman <di...@ochtman.nl>.

On Thu, Jun 30, 2011 at 09:53, Randall Leeds <ra...@gmail.com> wrote:
> But beyond the parallelization of request handling there's concurrency
> in the more general sense. The neat thing about Erlang, and why it has
> its reputation, is that the CouchDB code can be liberal about its use
> of concurrency at the code level without suffering from deadlocks or
> other headaches that often plague programmers of complex,
> multi-threaded shared memory systems. The Erlang team has taken care
> of all the hard parts about sharing data in a concurrent environment.
> As I understand it, the Erlang runtime's use of chipsets with many
> hardware threads is only improving, and those benefits will be
> automatically conferred upon CouchDB.

One thing that, as far as I know, has not been parallelized is the
view indexer. While it should be possible to execute at least the map
part of map/reduce concurrently, CouchDB doesn't do that yet. IIRC
there were reasons for that? But at least it's good to be aware of.

Cheers,

Dirkjan

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by "Andrew Stuart (SuperCoders)" <an...@supercoders.com.au>.

So it sounds like the answer is yes!  CouchDB takes good advantage of  
parallelism.

Does this mean that if I run CouchDB on a multicore processor then  
overall performance will go up with each core (within reason)?   
CouchDB would make good use of an eight or sixteen core CPU?

as

On 30/06/2011, at 5:53 PM, Randall Leeds wrote:

On Wed, Jun 29, 2011 at 17:55, Andrew Stuart (SuperCoders)
<an...@supercoders.com.au> wrote:
> One of the primary supposed advantages of Erlang is its ability to
> parallelise.
>
> Has this promise been realised as a performance and scalability benefit in
> CouchDB?  Or has the promise turned out to be too impractical to realise in
> any major way.

Absolutely. It works out fabulously. There are some serialization
choke points along the write path, but reads are mostly free to be
executed entirely in parallel. With sufficiently fast disk(s) or
reading from hot, cached file pages I've seen good CPU utilization on
high-end multi-core and multi-cpu machines.

But beyond the parallelization of request handling there's concurrency
in the more general sense. The neat thing about Erlang, and why it has
its reputation, is that the CouchDB code can be liberal about its use
of concurrency at the code level without suffering from deadlocks or
other headaches that often plague programmers of complex,
multi-threaded shared memory systems. The Erlang team has taken care
of all the hard parts about sharing data in a concurrent environment.
As I understand it, the Erlang runtime's use of chipsets with many
hardware threads is only improving, and those benefits will be
automatically conferred upon CouchDB.

-Randall

Re: Has Erlang's promise of parallelism been realised in CouchDB?

Posted by Randall Leeds <ra...@gmail.com>.

On Wed, Jun 29, 2011 at 17:55, Andrew Stuart (SuperCoders)
<an...@supercoders.com.au> wrote:
> One of the primary supposed advantages of Erlang is its ability to
> parallelise.
>
> Has this promise been realised as a performance and scalability benefit in
> CouchDB?  Or has the promise turned out to be too impractical to realise in
> any major way.

Absolutely. It works out fabulously. There are some serialization
choke points along the write path, but reads are mostly free to be
executed entirely in parallel. With sufficiently fast disk(s) or
reading from hot, cached file pages I've seen good CPU utilization on
high-end multi-core and multi-cpu machines.

But beyond the parallelization of request handling there's concurrency
in the more general sense. The neat thing about Erlang, and why it has
its reputation, is that the CouchDB code can be liberal about its use
of concurrency at the code level without suffering from deadlocks or
other headaches that often plague programmers of complex,
multi-threaded shared memory systems. The Erlang team has taken care
of all the hard parts about sharing data in a concurrent environment.
As I understand it, the Erlang runtime's use of chipsets with many
hardware threads is only improving, and those benefits will be
automatically conferred upon CouchDB.

-Randall