Posted to user@couchdb.apache.org by Steve Koppelman <st...@gmail.com> on 2012/10/04 17:04:54 UTC

Simple load-balancing replication best practices

Assuming a hubless (i.e. not master-slave) set of 4 CouchDB 1.2.0
servers behind a load balancer, is there a recommended best-practice
for setting up the replication relationships? I'm most interested in:

* Assuming the _replicator document is on one of the two nodes in a
relationship, is there a preference for push vs. pull replication
relationships? I seem to recall pull as being regarded as more
reliable than push through 1.1.1.

* The new docs highlight replication of the _replicator database as a
way to establish many-to-many replication. This raises two questions.

  1. Is there harm, in this sort of cluster, in having all members pull
from one another, i.e., all of
A->B
A->C
B->A
B->C
C->A
C->B

  2. Is there harm in full replication of _replicator if it results in
documents that point a node to itself? That is, suppose I have a document
that specifies a source of "localhost" and a destination of "node B". If
it is replicated to node B, that copy of the _replicator doc would set up
node B to replicate to itself, which doesn't sound good. Is it important
to do filtered replication of _replicator when taking this approach?

Rgds, etc.

-sk

Re: Simple load-balancing replication best practices

Posted by Martin Hewitt <ma...@thenoi.se>.

We arrange our replication with every server replicating to every other one, which costs more in network traffic but is more reliable in the face of sudden failure.
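
To put a number on the network cost: a full mesh of n nodes needs n*(n-1) one-way replications, so 12 for the four servers in the original question. A minimal Python sketch that enumerates them, assuming four hypothetical node names (A to D):

```python
from itertools import permutations

# Hypothetical node names; substitute your own hosts.
NODES = ["A", "B", "C", "D"]

def full_mesh(nodes):
    """Return every ordered (source, target) pair with source != target."""
    return list(permutations(nodes, 2))

pairs = full_mesh(NODES)
for source, target in pairs:
    print(f"{source}->{target}")
print(len(pairs), "one-way replications")
```

Each pair corresponds to one replication document, so the count grows quadratically with the number of nodes.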

In my experience, if you create a replication from a database to itself, it won't cause any problems; it'll simply move to "completed". I haven't tried that recently, though.

I do know that replication jobs are idempotent, i.e. if you have a replication job running and try to create another between the same databases, it won't create a second job; it'll return the _id of the existing one.

Martin 

On Thursday, 4 October 2012 at 16:04, Steve Koppelman wrote:

> Assuming a hubless (i.e. not master-slave) set of 4 couchdb 1.2.0
> servers behind a load balancer, is there a recommended best-practice
> for setting up the replication relationships? I'm most interested in:
> 
> * Assuming the _replicator document is on one of the two nodes in a
> relationship, is there a preference for push vs. pull replication
> relationships? I seem to recall pull as being regarded as more
> reliable than push through 1.1.1.
> 
> * The new docs highlight replication of the _replicator database as a
> way to establish many-to-many replication. This raises two questions.
> 
> 1. Is there harm in this sort of cluster to have all members to pull
> from one another, i.e., all of
> A->B
> A->C
> B->A
> B->C
> C ->A
> 
> 2. Is there harm in full replication of _replicator if it results in
> documents that point a node to itself? That is, if I have a document
> that specifies a source of "localhost" and a destination as "node B",
> if this is replicated to node B this particular instance of the
> _replicator doc would set up an instance to replicate to itself, which
> doesn't sound good. Is it important to do filtered replication of
> _replicator when taking this approach?
> 
> Rgds, etc.
> 
> -sk

Re: Simple load-balancing replication best practices

Posted by Bob Dionne <di...@dionne-associates.com>.

[1] https://github.com/cloudant/couch_replicator/commit/7feec1bd998264dd8

Sorry, I should wait until I've had coffee :)

On Oct 7, 2012, at 8:32 AM, Octavian Damiean <ma...@gmail.com> wrote:

> I'd be interested to read that if you insert the URL too Bob. :)
> 
> On Sun, Oct 7, 2012 at 2:22 PM, Bob Dionne <di...@dionne-associates.com>wrote:
> 
>> There used to be an advantage to using PULL but that's no longer the case.
>> 
>> However PULL replications are a bit more stable when attachments are
>> involved, so I'd recommend them over PUSH. I've described the problem
>> here[1] in BigCouch if you're interested in the details.
>> 
>> On Oct 7, 2012, at 5:04 AM, Nick North <no...@gmail.com> wrote:
>> 
>>> I'm also interested in whether there is a preference for push or pull
>> with
>>> CouchDb 1.2. I have a full-mesh replication setup using pull replication,
>>> but have no idea whether push might be better in some way. Is there a
>>> replication guru out there who could enlighten us?
>>> Nick
>>> On 4 October 2012 17:48, Dave Cottlehuber <dc...@jsonified.com> wrote:
>>> 
>>>> On 4 October 2012 17:04, Steve Koppelman <st...@gmail.com>
>>>> wrote:
>>>>> Assuming a hubless (i.e. not master-slave) set of 4 couchdb 1.2.0
>>>>> servers behind a load balancer, is there a recommended best-practice
>>>>> for setting up the replication relationships? I'm most interested in:
>>>>> 
>>>>> * Assuming the _replicator document is on one of the two nodes in a
>>>>> relationship, is there a preference for push vs. pull replication
>>>>> relationships? I seem to recall pull as being regarded as more
>>>>> reliable than push through 1.1.1.
>>>> 
>>>> Hope somebody else comments on this, I'm interested to know if this
>>>> still makes a difference.
>>>> 
>>>> 
>>>> 
>> 
>>

Re: Simple load-balancing replication best practices

Posted by Octavian Damiean <ma...@gmail.com>.

I'd be interested to read that if you insert the URL too Bob. :)

On Sun, Oct 7, 2012 at 2:22 PM, Bob Dionne <di...@dionne-associates.com> wrote:

> There used to be an advantage to using PULL but that's no longer the case.
>
> However PULL replications are a bit more stable when attachments are
> involved, so I'd recommend them over PUSH. I've described the problem
> here[1] in BigCouch if you're interested in the details.
>
> On Oct 7, 2012, at 5:04 AM, Nick North <no...@gmail.com> wrote:
>
> > I'm also interested in whether there is a preference for push or pull
> with
> > CouchDb 1.2. I have a full-mesh replication setup using pull replication,
> > but have no idea whether push might be better in some way. Is there a
> > replication guru out there who could enlighten us?
> > Nick
> > On 4 October 2012 17:48, Dave Cottlehuber <dc...@jsonified.com> wrote:
> >
> >> On 4 October 2012 17:04, Steve Koppelman <st...@gmail.com>
> >> wrote:
> >>> Assuming a hubless (i.e. not master-slave) set of 4 couchdb 1.2.0
> >>> servers behind a load balancer, is there a recommended best-practice
> >>> for setting up the replication relationships? I'm most interested in:
> >>>
> >>> * Assuming the _replicator document is on one of the two nodes in a
> >>> relationship, is there a preference for push vs. pull replication
> >>> relationships? I seem to recall pull as being regarded as more
> >>> reliable than push through 1.1.1.
> >>
> >> Hope somebody else comments on this, I'm interested to know if this
> >> still makes a difference.
> >>
> >>
> >>
>
>

Re: Simple load-balancing replication best practices

Posted by Bob Dionne <di...@dionne-associates.com>.

There used to be a general advantage to using PULL, but that's no longer the case.

However, PULL replications are still a bit more stable when attachments are involved, so I'd recommend them over PUSH. I've described the problem here[1] in BigCouch if you're interested in the details.
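
For reference, in a _replicator document the difference between pull and push is only which side is remote: a pull has a remote source and a local target, a push the reverse. A rough Python sketch that builds either flavour as a plain dict. The host name, database name, port, and the _id naming scheme are made up for illustration; `source`, `target`, and `continuous` are the standard replication fields, and the bare database name for the local side is how CouchDB 1.x documents are usually written (check your version's docs before relying on it).

```python
import json

def replication_doc(peer, db, pull=True, port=5984):
    """Build a _replicator document for one peer (hypothetical naming scheme)."""
    remote = f"http://{peer}:{port}/{db}"
    if pull:
        source, target = remote, db   # remote -> local
    else:
        source, target = db, remote   # local -> remote
    direction = "pull" if pull else "push"
    return {
        "_id": f"{direction}-{db}-{peer}",
        "source": source,
        "target": target,
        "continuous": True,
    }

doc = replication_doc("node-b.example.com", "mydb", pull=True)
print(json.dumps(doc, indent=2))
```

The resulting JSON would be saved into the _replicator database of the node that should run the replication.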

On Oct 7, 2012, at 5:04 AM, Nick North <no...@gmail.com> wrote:

> I'm also interested in whether there is a preference for push or pull with
> CouchDb 1.2. I have a full-mesh replication setup using pull replication,
> but have no idea whether push might be better in some way. Is there a
> replication guru out there who could enlighten us?
> Nick
> On 4 October 2012 17:48, Dave Cottlehuber <dc...@jsonified.com> wrote:
> 
>> On 4 October 2012 17:04, Steve Koppelman <st...@gmail.com>
>> wrote:
>>> Assuming a hubless (i.e. not master-slave) set of 4 couchdb 1.2.0
>>> servers behind a load balancer, is there a recommended best-practice
>>> for setting up the replication relationships? I'm most interested in:
>>> 
>>> * Assuming the _replicator document is on one of the two nodes in a
>>> relationship, is there a preference for push vs. pull replication
>>> relationships? I seem to recall pull as being regarded as more
>>> reliable than push through 1.1.1.
>> 
>> Hope somebody else comments on this, I'm interested to know if this
>> still makes a difference.
>> 
>> 
>>

Re: Simple load-balancing replication best practices

Posted by Nick North <no...@gmail.com>.

I'm also interested in whether there is a preference for push or pull with
CouchDB 1.2. I have a full-mesh replication setup using pull replication,
but have no idea whether push might be better in some way. Is there a
replication guru out there who could enlighten us?

Nick

On 4 October 2012 17:48, Dave Cottlehuber <dc...@jsonified.com> wrote:

> On 4 October 2012 17:04, Steve Koppelman <st...@gmail.com>
> wrote:
> > Assuming a hubless (i.e. not master-slave) set of 4 couchdb 1.2.0
> > servers behind a load balancer, is there a recommended best-practice
> > for setting up the replication relationships? I'm most interested in:
> >
> > * Assuming the _replicator document is on one of the two nodes in a
> > relationship, is there a preference for push vs. pull replication
> > relationships? I seem to recall pull as being regarded as more
> > reliable than push through 1.1.1.
>
> Hope somebody else comments on this, I'm interested to know if this
> still makes a difference.
>
>
>

Re: Simple load-balancing replication best practices

Posted by Dave Cottlehuber <dc...@jsonified.com>.

On 4 October 2012 23:47, Steve Koppelman <st...@gmail.com> wrote:
> On Thu, Oct 4, 2012 at 3:52 PM, stephen bartell <sn...@gmail.com> wrote:
>
>> >>  1. Is there harm in this sort of cluster to have all members to pull
>> >> from one another, i.e., all of
>> >> A->B
>> >> A->C
>> >> B->A
>> >> B->C
>> >> C ->A
>> >
>> > Multimaster Meshed Magic :-)
>>
>> … only if _id's are unique.
>> AFAIK, you need to address conflict resolution if _ids are similar.
>>
>
> Good point there, since some of the databases have app-generated _ids,
> which is fine on a single master, not necessarily so fine when moving to a
> setup with distributed writes.

Then prefix each _id with something like the first 5 chars of the MD5 of
the hostname, or similar. Or use BigCouch.
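
A minimal Python sketch of that idea, assuming the application builds its own _id strings. The function names, the "-" separator, and the example host names are hypothetical; only the "first 5 chars of the MD5 of the hostname" part comes from the suggestion above.

```python
import hashlib
import socket

def host_prefix(hostname, length=5):
    """First `length` hex characters of the MD5 digest of the hostname."""
    return hashlib.md5(hostname.encode("utf-8")).hexdigest()[:length]

def make_id(app_id, hostname=None):
    """Prefix an application-generated id with a per-host tag."""
    hostname = hostname or socket.gethostname()
    return f"{host_prefix(hostname)}-{app_id}"

# The same application id written on two nodes gets two different _ids,
# as long as the two host prefixes differ.
print(make_id("order-1001", hostname="couch1.example.com"))
print(make_id("order-1001", hostname="couch2.example.com"))
```

Five hex characters give about a million possible prefixes, so a clash between a handful of hosts is unlikely but not impossible; it is worth checking the prefixes once when the nodes are set up.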

A+
Dave

Re: Simple load-balancing replication best practices

Posted by Steve Koppelman <st...@gmail.com>.

On Thu, Oct 4, 2012 at 3:52 PM, stephen bartell <sn...@gmail.com> wrote:

> >>  1. Is there harm in this sort of cluster to have all members to pull
> >> from one another, i.e., all of
> >> A->B
> >> A->C
> >> B->A
> >> B->C
> >> C ->A
> >
> > Multimaster Meshed Magic :-)
>
> … only if _id's are unique.
> AFAIK, you need to address conflict resolution if _ids are similar.
>

Good point there, since some of the databases have app-generated _ids,
which is fine on a single master, not necessarily so fine when moving to a
setup with distributed writes.

Re: Simple load-balancing replication best practices

Posted by stephen bartell <sn...@gmail.com>.

On Oct 4, 2012, at 9:48 AM, Dave Cottlehuber wrote:

> On 4 October 2012 17:04, Steve Koppelman <st...@gmail.com> wrote:
>> Assuming a hubless (i.e. not master-slave) set of 4 couchdb 1.2.0
>> servers behind a load balancer, is there a recommended best-practice
>> for setting up the replication relationships? I'm most interested in:
>> 
>> * Assuming the _replicator document is on one of the two nodes in a
>> relationship, is there a preference for push vs. pull replication
>> relationships? I seem to recall pull as being regarded as more
>> reliable than push through 1.1.1.
> 
> Hope somebody else comments on this, I'm interested to know if this
> still makes a difference.
> 
>> * The new docs highlight replication of the _replicator database as a
>> way to establish many-to-many replication. This raises two questions.
>> 
>>  1. Is there harm in this sort of cluster to have all members to pull
>> from one another, i.e., all of
>> A->B
>> A->C
>> B->A
>> B->C
>> C ->A
> 
> Multimaster Meshed Magic :-)

… only if _ids are unique.
AFAIK, you need to address conflict resolution if the same _id can be
created on more than one node.

> 
>>  2. Is there harm in full replication of _replicator if it results in
>> documents that point a node to itself?  That is, if I have a document
>> that specifies a source of "localhost" and a destination as "node B",
>> if this is replicated to node B this particular instance of the
>> _replicator doc would set up an instance to replicate to itself, which
>> doesn't sound good. Is it important to do filtered replication of
>> _replicator when taking this approach?
> 
> Should be fine.
> 
>> Rgds, etc.
>> 
>> -sk
> 
> You might want to look at BigCouch which handles a lot of this sort of
> stuff for you, as well as sharded views. But the feature set isn't
> quite parity yet.
> 
> A+
> Dave


Stephen Bartell

"The significant problems we face cannot be solved at the same level of thinking we were at when we created them." -Einstein

Re: Simple load-balancing replication best practices

Posted by Dave Cottlehuber <dc...@jsonified.com>.

On 4 October 2012 17:04, Steve Koppelman <st...@gmail.com> wrote:
> Assuming a hubless (i.e. not master-slave) set of 4 couchdb 1.2.0
> servers behind a load balancer, is there a recommended best-practice
> for setting up the replication relationships? I'm most interested in:
>
> * Assuming the _replicator document is on one of the two nodes in a
> relationship, is there a preference for push vs. pull replication
> relationships? I seem to recall pull as being regarded as more
> reliable than push through 1.1.1.

Hope somebody else comments on this; I'm interested to know whether this
still makes a difference.

> * The new docs highlight replication of the _replicator database as a
> way to establish many-to-many replication. This raises two questions.
>
>   1. Is there harm in this sort of cluster to have all members to pull
> from one another, i.e., all of
> A->B
> A->C
> B->A
> B->C
> C ->A

Multimaster Meshed Magic :-)

>   2. Is there harm in full replication of _replicator if it results in
> documents that point a node to itself?  That is, if I have a document
> that specifies a source of "localhost" and a destination as "node B",
> if this is replicated to node B this particular instance of the
> _replicator doc would set up an instance to replicate to itself, which
> doesn't sound good. Is it important to do filtered replication of
> _replicator when taking this approach?

Should be fine.
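
If you would rather not rely on that, one option is to have whatever script writes the _replicator documents skip the ones that would loop back to the same node. A minimal Python sketch, assuming `source` and `target` are plain strings (not the object form) and that a bare name means a database on the node running the replication. The function name and host names are hypothetical, and this is a client-side check, not a CouchDB feature.

```python
from urllib.parse import urlparse

def is_self_replication(doc, local_host):
    """True if both ends of the doc resolve to the same database on local_host."""
    def endpoint(value):
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https"):
            host = parsed.hostname
            if host in ("localhost", "127.0.0.1"):
                host = local_host
            return host, parsed.path.strip("/")
        # A bare name is taken to be a database on the local node.
        return local_host, value.strip("/")
    return endpoint(doc["source"]) == endpoint(doc["target"])

doc = {"source": "mydb", "target": "http://node-b.example.com:5984/mydb"}
print(is_self_replication(doc, "node-a.example.com"))
print(is_self_replication(doc, "node-b.example.com"))
```

The same document is a normal push when evaluated on node A and a self-replication when evaluated on node B, which is the situation described in question 2.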

> Rgds, etc.
>
> -sk

You might want to look at BigCouch, which handles a lot of this sort of
stuff for you, as well as sharded views. But its feature set isn't
quite at parity with CouchDB yet.

A+
Dave