Posted to user@couchdb.apache.org by david rene comba lareu <sh...@gmail.com> on 2014/08/26 22:09:23 UTC

best practices on replication? recommended rev_limit value? network config?

Hi,

I'm a new CouchDB user. My company is developing a SaaS app that
relies entirely on JSON manifests, so CouchDB looked perfect for
the task. We expect a heavy load (100K users), so replication is a
very important feature for us, and since CouchDB is promoted as making
replication easy, we decided to use it.

Before subscribing to the mailing list, I assumed that master-master
replication was a good option, since it removes the single point of
failure of having only one write master at a time, but I just read that
"leaf" revisions can exist where the data is not consistent between
masters.

So I have a few questions about this:

1) What is the best setup to ensure consistency? Write-only masters
replicating to read-only slaves, as in common node setups? Even though
performance is really important, we need consistency to take priority
over everything else.
2) We don't need revisions at all; all changes are final. Does reducing
the rev_limit value have a positive impact on performance? If so, what
is the recommended value?
3) Since the wiki says that SSL is not supported correctly by Erlang,
we set up an HAProxy in front that forwards requests to CouchDB over
HTTP. As this is the first time we have worked with a DB system that
has an HTTP frontend (rather than a permanent connection like MySQL or
Redis), what is the recommended network setup (timeouts, keep-alive
options, etc.)? Any documentation about this would be useful.

Any advice on this is highly appreciated!

Regards.

Re: best practices on replication? recommended rev_limit value? network config?

Posted by david rene comba lareu <sh...@gmail.com>.
Thanks for the answers, Hector!

1) Understood. However, with 100K users and around 1M-3M documents, the
manual approach is not feasible. Is there any way to make a second call
to CouchDB to tell it that the revision made by the origin server
is the one that should prevail? Or a daemon?

2) I will leave it at 1000 for now, then, and see how it goes in production.

3) Me too :P
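
For readers wondering what such a "second call" could look like, here is a minimal sketch rather than a definitive recipe. It relies on two things CouchDB does provide: fetching a document with `?conflicts=true` lists the losing leaf revisions under `_conflicts`, and posting `_deleted` stubs to `_bulk_docs` removes the leaves you do not want. Everything else is an assumption made for illustration: the function names, the `db_url` and `doc_id` values, and above all the rule for deciding which revision is "the one from the origin server", which the thread does not define and which is left to the caller as `keep_rev`.

```python
import json
import urllib.parse
import urllib.request


def build_resolution(doc, keep_rev):
    """Build a _bulk_docs payload that deletes every leaf except keep_rev.

    `doc` is the JSON returned by GET /{db}/{id}?conflicts=true.
    """
    leaves = [doc["_rev"]] + doc.get("_conflicts", [])
    if keep_rev not in leaves:
        raise ValueError("keep_rev is not a leaf revision of this document")
    stubs = [
        {"_id": doc["_id"], "_rev": rev, "_deleted": True}
        for rev in leaves
        if rev != keep_rev
    ]
    return {"docs": stubs}


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def post_json(url, payload):
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def resolve(db_url, doc_id, keep_rev):
    """Resolve conflicts on one document; db_url and doc_id are placeholders."""
    quoted_id = urllib.parse.quote(doc_id, safe="")
    doc = fetch_json(f"{db_url}/{quoted_id}?conflicts=true")
    if not doc.get("_conflicts"):
        return None  # nothing to resolve
    return post_json(f"{db_url}/_bulk_docs", build_resolution(doc, keep_rev))
```

A daemon would call something like `resolve("http://localhost:5984/mydb", "some-doc-id", chosen_rev)` for each conflicted document. Finding those documents is a separate step, commonly done with a view that emits only when `doc._conflicts` is present, and authentication and error handling are omitted here.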

2014-08-26 18:18 GMT-03:00 Sanjuan, Hector <he...@here.com>:
> Hi,
>
> You got it wrong. There is consistency (eventual consistency). Every node agrees that one of the leaf revisions is the "active" one, and they all (eventually) return the same result when queried. So,
>
> 1) Eventual consistency is ensured by CouchDB's design. You just have to take care of resolving conflicts manually (possibly correcting CouchDB's choice of which document version prevails).
>
> 2) You don't want low values, because that would mean assuming that your nodes replicate nicely all the time, without network interruptions or limitations. If your rev_limit is low and, for whatever reason, you fail to replicate a number of revisions higher than the limit, you'll run into trouble. If you are constantly writing to the same document, the rev_limit should be high. If your write operations are spread across a large number of documents, the revision numbers won't grow as quickly, so you could lower it. I'm not sure this will save anything other than disk space.
>
> 3) I'm not sure about that... and I'd like to hear the answer :)
>
> Hector

RE: best practices on replication? recommended rev_limit value? network config?

Posted by "Sanjuan, Hector" <he...@here.com>.
Hi,

You got it wrong. There is consistency (eventual consistency). Every node agrees that one of the leaf revisions is the "active" one, and they all (eventually) return the same result when queried. So,

1) Eventual consistency is ensured by CouchDB's design. You just have to take care of resolving conflicts manually (possibly correcting CouchDB's choice of which document version prevails).

2) You don't want low values, because that would mean assuming that your nodes replicate nicely all the time, without network interruptions or limitations. If your rev_limit is low and, for whatever reason, you fail to replicate a number of revisions higher than the limit, you'll run into trouble. If you are constantly writing to the same document, the rev_limit should be high. If your write operations are spread across a large number of documents, the revision numbers won't grow as quickly, so you could lower it. I'm not sure this will save anything other than disk space.

3) I'm not sure about that... and I'd like to hear the answer :)

Hector
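
As a practical footnote to point 2, the setting called rev_limit in this thread is exposed per database through the `_revs_limit` endpoint: a GET returns the current value (1000 by default) and a PUT with a bare integer body changes it. The sketch below only builds and sends those requests. The helper names and the `db_url` value are assumptions for illustration, and the PUT typically needs admin credentials, which are not shown.

```python
import json
import urllib.request


def revs_limit_request(db_url, limit=None):
    """Build a request that reads (limit=None) or sets a database's _revs_limit."""
    url = db_url.rstrip("/") + "/_revs_limit"
    if limit is None:
        return urllib.request.Request(url, method="GET")
    if type(limit) is not int or limit < 1:
        raise ValueError("limit must be a positive integer")
    return urllib.request.Request(
        url,
        data=json.dumps(limit).encode("ascii"),  # body is a bare JSON integer
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(req):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send(revs_limit_request("http://localhost:5984/mydb"))` would read the current limit and `send(revs_limit_request("http://localhost:5984/mydb", 1000))` would set it, assuming a server is reachable at that placeholder address.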