Posted to user@couchdb.apache.org by Martin Hedenfalk <ma...@bzero.se> on 2009/09/03 08:33:58 UTC

comfy couch?

Hello couchdb users,

We're currently evaluating CouchDB for a project and would very much
appreciate any thoughts and insights regarding the suitability of CouchDB
for this.

We want to use CouchDB on several nodes (say around 12 nodes initially,
but this may grow over time). Each node runs our application, which handles
external requests by updating the local CouchDB database. There is no load
balancer. Continuous replication between all nodes makes sure all databases
are (eventually) consistent. The same document can be updated via different
nodes, so conflicts are expected to happen.

Any node can fail and should be synchronized when it is back online.

We expect the system to handle around 1000 transactions per second. While
load is not necessarily distributed evenly between all nodes, let's say
around 100 tps per node.

What delay can we expect before a document is replicated to any of the
other nodes? The CouchDB book mentions a "complex algorithm determining the
ideal moment to replicate". My initial tests (although with only two nodes
on the same network segment) indicate that replication happens nearly
instantly when inserting some hundred documents per second.

Would a full mesh of continuous replication between many nodes pose a
problem? So far I've only tested replication between two nodes. I'll add
more nodes to the testbed as I get my hands on more machines.
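
To put a number on the full-mesh question, here is a minimal sketch that
counts the one-way continuous replications such a mesh would need. It
assumes one replication per ordered (source, target) pair; the node names
are hypothetical and nothing here talks to CouchDB.

```python
from itertools import permutations

# Hypothetical node names; the post only says "around 12 nodes".
nodes = [f"node{i:02d}" for i in range(1, 13)]

# A full mesh needs one one-way replication per ordered (source, target) pair.
links = list(permutations(nodes, 2))

# Outgoing replications that a single node would be the source of.
outgoing_from_first = sum(1 for source, _ in links if source == nodes[0])

print(len(links))           # n * (n - 1) = 132 for 12 nodes
print(outgoing_from_first)  # n - 1 = 11
```

The count grows quadratically with the number of nodes, which is the main
thing to watch if the cluster grows beyond the initial 12.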

Is there any difference between push and pull mode replication?

There was, however, one troublesome issue with replication. There were 116
documents missing in one database. Restarting both couches and trying any
combination of replication didn't help. One node was always running at 100%
CPU while the other machine was completely idle. Once I continued with the
test insertions and set up continuous replication again, it resolved itself.
Has anyone experienced the same thing?

I'm running CouchDB 0.10.0a810462 (I checked out the 0.10.x branch) on
FreeBSD
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:4:4] [rq:4]
[async-threads:0] [hipe] [kernel-poll:false]

thanks
         -martin

Re: comfy couch?

Posted by Jan Lehnardt <ja...@googlemail.com>.

On 5 Sep 2009, at 19:34, Chris Anderson wrote:

>> Would a full mesh of continuous replication between many nodes pose a
>> problem?
>> So far I've only tested replication between two nodes. I'll add more
>> nodes to the testbed as I get my hands on more machines.
>>
>> Is there any difference between push and pull mode replication?
>
> Pull replication is a little more efficient than push, due to http
> pipelining etc.

Adam corrected me on that the other day: push replication can now
use bulk doc updates, so it is more efficient again :)

Cheers
Jan
--
http://couch.io

Re: comfy couch?

Posted by Chris Anderson <jc...@apache.org>.

On Wed, Sep 2, 2009 at 11:33 PM, Martin Hedenfalk <ma...@bzero.se> wrote:
> Hello couchdb users,
>
> We're currently evaluating CouchDB for a project and would very much
> appreciate any thoughts and insights regarding the suitability of CouchDB
> for this.
>
> We want to use CouchDB on several nodes (say around 12 nodes initially,
> but this may grow over time). Each node runs our application, which
> handles external requests by updating the local CouchDB database. There is
> no load balancer. Continuous replication between all nodes makes sure all
> databases are (eventually) consistent. The same document can be updated
> via different nodes, so conflicts are expected to happen.
>
> Any node can fail and should be synchronized when it is back online.
>
> We expect the system to handle around 1000 transactions per second. While
> load is not necessarily distributed evenly between all nodes, let's say
> around 100 tps per node.

Everything you've said above sounds like a sane deployment.

100-1000 tps per node should be possible on normal hardware.

Remember that since you aren't partitioning (all nodes eventually see
all updates) you won't get the pattern where each node gets only a
portion of the write load. All writes are eventually performed
everywhere. The multi-master configuration you describe will still
absorb load spikes. Replication is a more efficient writer than most
clients, so you should be able to load the cluster harder than a
single node.
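
As a rough illustration of that point, using the numbers from the question:
the sketch below assumes every client write eventually has to be applied
once on every node, and it ignores conflicts and any batching done by the
replicator. The function name is made up for this example.

```python
def per_node_write_load(cluster_tps: int, local_client_tps: int) -> dict:
    """Split the writes one node must apply into client and replicated writes.

    Assumes no partitioning: every node eventually applies every write.
    """
    return {
        "total": cluster_tps,                          # all writes land here eventually
        "local": local_client_tps,                     # written directly by clients
        "replicated": cluster_tps - local_client_tps,  # arriving via replication
    }

# About 1000 tps across the cluster, about 100 tps from clients on this node.
load = per_node_write_load(1000, 100)
print(load)  # {'total': 1000, 'local': 100, 'replicated': 900}
```

So each node has to sustain roughly the full cluster write rate, not just
its own share of client traffic.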

>
> What delay can we expect before a document is replicated to any of the
> other nodes? The CouchDB book mentions a "complex algorithm determining
> the ideal moment to replicate". My initial tests (although with only two
> nodes on the same network segment) indicate that replication happens
> nearly instantly when inserting some hundred documents per second.
>
> Would a full mesh of continuous replication between many nodes pose a
> problem? So far I've only tested replication between two nodes. I'll add
> more nodes to the testbed as I get my hands on more machines.
>
> Is there any difference between push and pull mode replication?

Pull replication is a little more efficient than push, due to http
pipelining etc.
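
For reference, the two modes differ only in which side of the request is
the remote URL. The sketch below builds the JSON bodies for a continuous
replication started via `_replicate` on the local node. The host names and
the database name "app" are made up for this example, 5984 is the default
CouchDB port, and no HTTP request is actually sent.

```python
import json

def replication_request(local_node: str, remote_node: str, db: str, mode: str):
    """Return (url, body) for a continuous replication started on local_node."""
    remote_db = f"http://{remote_node}:5984/{db}"
    if mode == "pull":
        # Pull: the local node reads from the remote database.
        body = {"source": remote_db, "target": db}
    elif mode == "push":
        # Push: the local node writes to the remote database.
        body = {"source": db, "target": remote_db}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    body["continuous"] = True
    return f"http://{local_node}:5984/_replicate", body

for mode in ("pull", "push"):
    url, body = replication_request("node01", "node02", "app", mode)
    print("POST", url)
    print(json.dumps(body))
```

Either body would be sent as a POST to the `_replicate` URL of the node
that should run the replication.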

>
> There was, however, one troublesome issue with replication. There were 116
> documents missing in one database. Restarting both couches and trying any
> combination of replication didn't help. One node was always running at
> 100% CPU while the other machine was completely idle. Once I continued
> with the test insertions and set up continuous replication again, it
> resolved itself. Has anyone experienced the same thing?

I'm not sure what could cause this. If you are able to make it
reproducible, we can make sure it goes away.

Chris

>
> I'm running CouchDB 0.10.0a810462 (I checked out the 0.10.x branch) on
> FreeBSD
> Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:4:4] [rq:4]
> [async-threads:0] [hipe] [kernel-poll:false]
>
> thanks
>        -martin

-- 
Chris Anderson
http://jchrisa.net
http://couch.io