Posted to user@couchdb.apache.org by Zdravko Gligic <zg...@gmail.com> on 2011/04/05 19:36:31 UTC

Peer-to-Peer Replication

Hi Folks,

Are there any large implementations of CouchDB peer-to-peer
replications or even smaller open source samples?  Actually, the piece
that I am mostly interested in is at the application/design end of how
to go about implementing the "traffic cop" for a use case where
everyone is eventually synchronized with everyone else.

Given a large number of peers that one could replicate to/from, is
there anything within CouchDB that can be "posted centrally" to know
how up to date anyone is, so that badly out of date peers
replicate to/from the more up to date ones, instead of to/from each
other?  What else should I ask, if I knew better ;?)

Thanks,
Zdravko

Re: Peer-to-Peer Replication

Posted by Christian Polzer <ch...@hai-fai.de>.
I will stop answering right now because I am also a beginner with CouchDB,
but finding a way of implementing a Foaf handling for replication would be nice. 
Kind of a trusted net like Gpg...

Regards,
Chris



On 06.04.2011, at 22:44, Zdravko Gligic wrote:

>>> You can configure the replication for continuous replication, they will find each other when
> available. Also, this is a key feature to CouchDB. That's why CouchDB
> for example is working
> so well with mobile devices: It's replicating when it's
> online/connected. Link:
> http://www.couchbase.com/products-and-services/mobile-couchbase
> It has to be remembered, that  (until version 1.2, I think i remember)
> these settings are
> lost upon server restart.
> 
> Does this mean that numerous replications can be set up for a single
> local CouchDB instance.  If so then given a community of 100,000's of
> peers, would then a logical solution be one where each peer was
> grouped into a subset of all of the peers, by some sort of most common
> attribute - such as replicating to/from one's friends - where
> hopefully through the"friends of friends" effect, eventually everyone
> eventually gets updated?  If this is even remotely the case, then what
> would be an optimal number of replications that any one local CouchDB
> should be configured with - 10's, 100's or 1000's of "friend" peers?
> 
>>> CouchDB is all about local data, especially with replication (MVCC).
> 
>>> There are many nice features with CouchDB replication, I would really recommend reading the
> replication section in the CouchDB book.
> 
>>> It is explained very understandable there.
> 
> What I get out of that documentation is that CouchDB is quite
> sophisticated in making replication happen, once you tell it with whom
> it should to/from replicate.  However, I can not find anything that
> expands much on how one would set it up to replicate to/from a large
> pool of potential peers - hence my above questions.
> 
> Thanks again.


Re: Peer-to-Peer Replication

Posted by Owen Marshall <om...@facilityone.com>.
On 04/07/2011 01:30 PM, Zdravko Gligic wrote:

> If a single local CouchDB is set up to replicated from 100 friends and
> after say some prolonged quiet period a message comes through from 1
> of the friends (who has the latest copy with all documents in it) will
> all of the subsequent 99 cause the same volume of network traffic or
> will they end up being more like "hand shakes" in which it is
> determined that none of them have anything new and that no actual data
> should be sent?

Replication looks at the current update sequence number on the database.
If the update sequence number is the same, no further traffic will occur.

See http://wiki.apache.org/couchdb/HTTP_database_API, especially the
section titled "Database Information".
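
As a rough illustration of the "hand shake" check described above, here is a minimal Python sketch. It assumes the database information document (what GET /database returns) has already been fetched and parsed, and that update_seq is a plain integer, as it was in CouchDB 1.x; newer releases return an opaque string, so this comparison would not apply there. The helper name and the sample values are made up for the example.

```python
def has_new_changes(db_info, last_seen_seq):
    """Return True if the source reports changes beyond our last checkpoint.

    db_info is the parsed JSON from GET /<database>; update_seq is assumed
    to be an integer (CouchDB 1.x behaviour).
    """
    return db_info["update_seq"] > last_seen_seq


# Hypothetical sample of a parsed "Database Information" response.
friend_info = {"db_name": "database", "doc_count": 120, "update_seq": 457}

print(has_new_changes(friend_info, 450))  # something new to pull
print(has_new_changes(friend_info, 457))  # already up to date
```

CouchDB's replicator keeps its own checkpoints, so an application would normally not need this; the sketch only shows why an up-to-date peer costs little traffic.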

-- 
Owen Marshall
FacilityONE
omarshall@facilityone.com | (502) 805-2126


Re: Peer-to-Peer Replication

Posted by Zdravko Gligic <zg...@gmail.com>.
One final question ...

> A higher amount of peers will require much more configuration and would
> increase network traffic, but would likely decrease the delay for
> getting the data replicated to anyone.

If a single local CouchDB is set up to replicate from 100 friends and,
after say some prolonged quiet period, a message comes through from 1
of the friends (who has the latest copy with all documents in it), will
all of the subsequent 99 cause the same volume of network traffic, or
will they end up being more like "handshakes" in which it is
determined that none of them have anything new and that no actual data
should be sent?

With that, I thank you all not only for giving me valuable insight but
also for helping me to better formulate my questions. ;)

Re: Peer-to-Peer Replication

Posted by Owen Marshall <om...@facilityone.com>.
On 04/06/2011 04:44 PM, Zdravko Gligic wrote:

> Does this mean that numerous replications can be set up for a single
> local CouchDB instance.  

Absolutely yes.

> If so then given a community of 100,000's of
> peers, would then a logical solution be one where each peer was
> grouped into a subset of all of the peers, by some sort of most common
> attribute - such as replicating to/from one's friends - where
> hopefully through the"friends of friends" effect, eventually everyone
> eventually gets updated?  

That seems logical, yes. Note that you will have to write
application-level routing that tries to decide how to tell CouchDB to sync.

Perhaps your app will loop over _all_dbs
(http://wiki.apache.org/couchdb/HTTP_database_API) and do a push
synchronization to one/more peers. Perhaps it decides on some subset.
Either way, it's up to your app to make this happen.
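
The "loop over _all_dbs and push" idea could look roughly like the Python sketch below. It only plans the replication jobs; nothing is sent. The function name, the peer URL and the rule of skipping databases whose names start with an underscore are assumptions made for the example, not anything CouchDB prescribes.

```python
def plan_push_replications(local_dbs, peer_urls):
    """Build one push-replication request body per (database, peer) pair.

    local_dbs is the list returned by GET /_all_dbs; peer_urls are the base
    URLs of the peers this node has decided to sync with.
    """
    jobs = []
    for db in local_dbs:
        if db.startswith("_"):
            continue  # assumption: leave system databases such as _users alone
        for peer in peer_urls:
            jobs.append({"source": db, "target": peer.rstrip("/") + "/" + db})
    return jobs


# Hypothetical database names and peer address.
jobs = plan_push_replications(
    ["_users", "music", "movies"],
    ["http://peer-a.example:5984/"],
)
for job in jobs:
    print(job)
```

Each resulting dictionary is a body that could be POSTed to /_replicate on the local node. Deciding which peers go into peer_urls is the application-level routing mentioned above.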

> If this is even remotely the case, then what
> would be an optimal number of replications that any one local CouchDB
> should be configured with - 10's, 100's or 1000's of "friend" peers?

Again, that depends on your needs.

A lower number of peers would mean less configuration and network
traffic, but could impose large delays before some peers synchronize
some databases.

A higher number of peers will require much more configuration and would
increase network traffic, but would likely decrease the delay for
getting the data replicated to anyone.

You could try to find an intermediate position, such as trying to always
replicate to a "supernode", but that again depends on your needs.

Note that we are now heading into network theory land...

> However, I can not find anything that
> expands much on how one would set it up to replicate to/from a large
> pool of potential peers - hence my above questions.

This is application specific, and for good reason.

Your application-specific needs are radically different from mine.
CouchDB gives us both excellent replication, but it also makes both of
us need to fully understand *how* we want that replication to occur.

-- 
Owen Marshall
FacilityONE
omarshall@facilityone.com | (502) 805-2126


Re: Peer-to-Peer Replication

Posted by Zdravko Gligic <zg...@gmail.com>.
>> You can configure the replication for continuous replication, they will find each other when
available. Also, this is a key feature to CouchDB. That's why CouchDB
for example is working
so well with mobile devices: It's replicating when it's
online/connected. Link:
http://www.couchbase.com/products-and-services/mobile-couchbase
It has to be remembered, that  (until version 1.2, I think i remember)
these settings are
lost upon server restart.

Does this mean that numerous replications can be set up for a single
local CouchDB instance?  If so, then given a community of 100,000's of
peers, would a logical solution be one where each peer was
grouped into a subset of all of the peers, by some sort of most common
attribute - such as replicating to/from one's friends - where
hopefully through the "friends of friends" effect, everyone
eventually gets updated?  If this is even remotely the case, then what
would be an optimal number of replications that any one local CouchDB
should be configured with - 10's, 100's or 1000's of "friend" peers?

>> CouchDB is all about local data, especially with replication (MVCC).

>> There are many nice features with CouchDB replication, I would really recommend reading the
replication section in the CouchDB book.

>> It is explained very understandable there.

What I get out of that documentation is that CouchDB is quite
sophisticated in making replication happen, once you tell it with whom
it should replicate to/from.  However, I cannot find anything that
expands much on how one would set it up to replicate to/from a large
pool of potential peers - hence my above questions.

Thanks again.

Re: Peer-to-Peer Replication

Posted by Christian Polzer <ch...@hai-fai.de>.
I would suggest reading this:

http://guide.couchdb.org/draft/replication.html

1.) It is a key feature of CouchDB not to rely on centralized (master-slave) replication, but you could build it with CouchDB.
     Peer-to-peer by definition excludes centralized servers (well, I think).
2.) I think if two CouchDBs cover each other, replication should work as well, as there is just the same port in use for everything.

Regards,
Chris



On 06.04.2011, at 20:13, Zdravko Gligic wrote:

> OK, lets start with some basic questions ;)
> 
> (1) How would two CouchDB go about discovering and meeting each other?
> Would this not require a central server (similar to IRC) and are
> there any hosted solutions?
> 
> (2) Once at least two CouchDB's discover and meet each other, is it
> just their URLs (domain name or IP based) that are needed? What about
> routers and/or firewalls?
> 
> Thanks again.
> 
> 
> On Tue, Apr 5, 2011 at 1:36 PM, Zdravko Gligic <zg...@gmail.com> wrote:
>> Hi Folks,
>> 
>> Are there any large implementations of CouchDB peer-to-peer
>> replications or even smaller open source samples?  Actually, the piece
>> that I am mostly interested in is at the application/design end of how
>> to go about implementing the "traffic cop" for a use case where
>> everyone is eventually synchronized with everyone else.
>> 
>> Given a large number of peers that one could replicate to/from, is
>> there anything within CouchDB that can be "posted centrally" to know
>> how up to date anyone is, so that badly out of date peers are
>> replicate to/from the more up to date ones, instead to/from each
>> other?  What else should I ask, if I knew better ;?)
>> 
>> Thanks,
>> Zdravko
>> 


Re: Peer-to-Peer Replication

Posted by Christian Polzer <ch...@hai-fai.de>.


On 06.04.2011, at 20:50, Zdravko Gligic wrote:

>>> (1) How would two CouchDB go about discovering and meeting each other?
>>>  Would this not require a central server (similar to IRC) and are
>>> there any hosted solutions?
>> 
>> Two instances don't discover -- they depend on *you* telling them to
>> replicate.
> 
> I realize that keeping track of "who is currently online" is something
> that one needs to develop as part of one's application and was in fact
> curious if there were already built solutions for this, either as
> software or even as hosted services ?

You can configure replication to be continuous; the instances will find each other when available. This is also a key feature of CouchDB. That's why CouchDB, for example, works so well with mobile devices: it replicates when it's online/connected. Link: http://www.couchbase.com/products-and-services/mobile-couchbase
It has to be remembered that (until version 1.2, I think I remember) these settings are lost upon server restart.

> 
>> POST /_replicate HTTP/1.1
>> {"source":"http://foo.bar/database","target":"database"}
>> 
>> Once you've triggered a replication, CouchDB takes over and figures out
>> the differences between the DB revisions on both instances.
>> 
>> Once it is done, it is done -- unless you pass continuous: true. That
>> sets up continuous replication, where the target will watch the source's
>> _changes API and replicate new docs over as needed.
> 
> However, for a bunch of peers, it seems that there would need to be a
> form of a ripple effect, where two peers then go on to discover other
> peers with more up to date CouchDB instances.

CouchDB is all about local data, especially with replication (MVCC).

There are many nice features in CouchDB replication; I would really recommend reading the replication section in the CouchDB book.

It is explained very clearly there.



> 
>> For more, see http://guide.couchdb.org/draft/replication.html. It's a
>> great guide.
> 
> That contains an excellent explanation of what happens between two
> peers. However, it seems that any serverless p2p application would
> require lots of sophistication just to determine who needs to connect
> to who, in order for everyone to get updated in most or even
> relatively efficient manner.
> 
>>> (2) Once at least two CouchDB's discover and meet each other, is it
>>> just their URLs (domain name or IP based) that are needed? What about
>>> routers and/or firewalls?
>> 
>> Yep. As long as the instances can reach each other, everything should
>> just work (TM).
> 
> How would 2 CouchDBs communicate to each other if both are in
> different homes, where each one is behind a router with multiple
> computers as CouchDB peers?
> 
>> If you want to look at some other offerings, BigCouch
>> (https://github.com/cloudant/bigcouch) is one that I quite like. But IMO
>> you should get comfortable with what CouchDB provides before you go
>> elsewhere.
> 
> In terms of this p2p replication, what specifically does BigCouch
> offer that is beyond regular CouchDB?
> 
> Thanks again.


Re: Peer-to-Peer Replication

Posted by Zdravko Gligic <zg...@gmail.com>.
>> (1) How would two CouchDB go about discovering and meeting each other?
>>  Would this not require a central server (similar to IRC) and are
>> there any hosted solutions?
>
> Two instances don't discover -- they depend on *you* telling them to
> replicate.

I realize that keeping track of "who is currently online" is something
that one needs to develop as part of one's application and was in fact
curious if there were already built solutions for this, either as
software or even as hosted services ?

> POST /_replicate HTTP/1.1
> {"source":"http://foo.bar/database","target":"database"}
>
> Once you've triggered a replication, CouchDB takes over and figures out
> the differences between the DB revisions on both instances.
>
> Once it is done, it is done -- unless you pass continuous: true. That
> sets up continuous replication, where the target will watch the source's
> _changes API and replicate new docs over as needed.

However, for a bunch of peers, it seems that there would need to be a
form of a ripple effect, where two peers then go on to discover other
peers with more up to date CouchDB instances.

> For more, see http://guide.couchdb.org/draft/replication.html. It's a
> great guide.

That contains an excellent explanation of what happens between two
peers. However, it seems that any serverless p2p application would
require lots of sophistication just to determine who needs to connect
to whom, in order for everyone to get updated in the most or even a
relatively efficient manner.

>> (2) Once at least two CouchDB's discover and meet each other, is it
>> just their URLs (domain name or IP based) that are needed? What about
>> routers and/or firewalls?
>
> Yep. As long as the instances can reach each other, everything should
> just work (TM).

How would 2 CouchDBs communicate with each other if both are in
different homes, where each one is behind a router with multiple
computers as CouchDB peers?

> If you want to look at some other offerings, BigCouch
> (https://github.com/cloudant/bigcouch) is one that I quite like. But IMO
> you should get comfortable with what CouchDB provides before you go
> elsewhere.

In terms of this p2p replication, what specifically does BigCouch
offer that is beyond regular CouchDB?

Thanks again.

Re: Peer-to-Peer Replication

Posted by Owen Marshall <om...@facilityone.com>.
On 04/06/2011 02:13 PM, Zdravko Gligic wrote:

> (1) How would two CouchDB go about discovering and meeting each other?
>  Would this not require a central server (similar to IRC) and are
> there any hosted solutions?

Two instances don't discover -- they depend on *you* telling them to
replicate.

To accomplish this you post a message to /_replicate with a source and a
target:

POST /_replicate HTTP/1.1
{"source":"http://foo.bar/database","target":"database"}

Once you've triggered a replication, CouchDB takes over and figures out
the differences between the DB revisions on both instances.

Once it is done, it is done -- unless you pass continuous: true. That
sets up continuous replication, where the target will watch the source's
_changes API and replicate new docs over as needed.
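
For anyone who prefers a script over raw HTTP, the same request can be sketched in Python using only the standard library. The request is built but not sent, so it runs without a CouchDB instance. The local address http://localhost:5984 and the helper name are assumptions for the example; the source and target values are the ones from above.

```python
import json
import urllib.request


def build_replication_request(couch_url, source, target, continuous=False):
    """Build (but do not send) a POST /_replicate request."""
    body = {"source": source, "target": target}
    if continuous:
        # Ask CouchDB to keep following the source's changes after the first pass.
        body["continuous"] = True
    return urllib.request.Request(
        url=couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_replication_request(
    "http://localhost:5984",  # assumed address of the local CouchDB
    "http://foo.bar/database",
    "database",
    continuous=True,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually trigger the replication: urllib.request.urlopen(req)
```

This is a pull replication: the local instance fetches from the remote source into its own "database".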

For more, see http://guide.couchdb.org/draft/replication.html. It's a
great guide.

> (2) Once at least two CouchDB's discover and meet each other, is it
> just their URLs (domain name or IP based) that are needed? What about
> routers and/or firewalls?

Yep. As long as the instances can reach each other, everything should
just work (TM).

If you want to look at some other offerings, BigCouch
(https://github.com/cloudant/bigcouch) is one that I quite like. But IMO
you should get comfortable with what CouchDB provides before you go
elsewhere.

Best,

-- 
Owen Marshall
FacilityONE
omarshall@facilityone.com | (502) 805-2126


Re: Peer-to-Peer Replication

Posted by Zdravko Gligic <zg...@gmail.com>.
OK, let's start with some basic questions ;)

(1) How would two CouchDB instances go about discovering and meeting
each other? Would this not require a central server (similar to IRC)
and are there any hosted solutions?

(2) Once at least two CouchDB instances discover and meet each other,
is it just their URLs (domain name or IP based) that are needed? What
about routers and/or firewalls?

Thanks again.


On Tue, Apr 5, 2011 at 1:36 PM, Zdravko Gligic <zg...@gmail.com> wrote:
> Hi Folks,
>
> Are there any large implementations of CouchDB peer-to-peer
> replications or even smaller open source samples?  Actually, the piece
> that I am mostly interested in is at the application/design end of how
> to go about implementing the "traffic cop" for a use case where
> everyone is eventually synchronized with everyone else.
>
> Given a large number of peers that one could replicate to/from, is
> there anything within CouchDB that can be "posted centrally" to know
> how up to date anyone is, so that badly out of date peers are
> replicate to/from the more up to date ones, instead to/from each
> other?  What else should I ask, if I knew better ;?)
>
> Thanks,
> Zdravko
>

Re: Peer-to-Peer Replication

Posted by Nebu Pookins <ne...@gmail.com>.
The replication model is such that for every connected graph of peers, all peers in that graph will update to the same up-to-date state. This is what they call "eventual consistency".

It's like in BitTorrent: you don't have to worry about the clients with full copies of the file somehow losing data; the replication data is versioned, so your peers will only replicate "forward" in time, not backwards.

On 2011-04-05, at 1:36 PM, Zdravko Gligic <zg...@gmail.com> wrote:

> Hi Folks,
> 
> Are there any large implementations of CouchDB peer-to-peer
> replications or even smaller open source samples?  Actually, the piece
> that I am mostly interested in is at the application/design end of how
> to go about implementing the "traffic cop" for a use case where
> everyone is eventually synchronized with everyone else.
> 
> Given a large number of peers that one could replicate to/from, is
> there anything within CouchDB that can be "posted centrally" to know
> how up to date anyone is, so that badly out of date peers are
> replicate to/from the more up to date ones, instead to/from each
> other?  What else should I ask, if I knew better ;?)
> 
> Thanks,
> Zdravko

Re: Peer-to-Peer Replication

Posted by Ian Hobson <ia...@ianhobson.co.uk>.
On 05/04/2011 18:36, Zdravko Gligic wrote:
> Hi Folks,
>
> Are there any large implementations of CouchDB peer-to-peer
> replications or even smaller open source samples?  Actually, the piece
> that I am mostly interested in is at the application/design end of how
> to go about implementing the "traffic cop" for a use case where
> everyone is eventually synchronized with everyone else.
>
> Given a large number of peers that one could replicate to/from, is
> there anything within CouchDB that can be "posted centrally" to know
> how up to date anyone is, so that badly out of date peers are
> replicate to/from the more up to date ones, instead to/from each
> other?  What else should I ask, if I knew better ;?)
>
Hi,

I've been thinking about this and had an idea. IIRC you mention in one
post that you may have 100,000 users, each with a couch on their kit.

The idea is this (and it's not peer to peer :) ).

You publish a set of couches, each of which will accept HTTP (port 80)
replications from your users. You have enough of these scattered about
so your users can find one close that is responsive. It is up to the
user to initiate these replications and they are free to use any
published node they like.

By using port 80, and triggering it from the user's end, you will avoid
most problems with routers and port forwarding. (I think - needs
confirmation.)

You also have a central (private) set of nodes. Perhaps 4. The published
nodes replicate with these central machines in a round robin fashion, so
that each published node replicates with the center 4 times an hour, but
with a different machine each time. Each uses the same sequence, but the
times are staggered. If the 4 central nodes are A, B, C and D, then A
may see published node X at 3 minutes past, B will see X at 15+3 = 18
minutes past, C will be replicated with by X at 33 minutes past, and D
will receive the call at 48 minutes past. Published node Y might call in
22 minutes after X - and it would be 22 minutes later for every central
node.
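
The staggered timetable is easy to get wrong by hand, so here is a small Python sketch of the arithmetic. It assumes an hourly cycle split evenly across the central nodes; the function name is invented for the example, and the node names and offsets are the ones used in the paragraph above.

```python
def call_in_minutes(first_minute, central_nodes, cycle_minutes=60):
    """Return the minute past the hour at which a published node calls each central node.

    The published node visits the central nodes in order, one every
    cycle_minutes / len(central_nodes) minutes, starting at first_minute.
    """
    step = cycle_minutes // len(central_nodes)
    return {
        node: (first_minute + index * step) % cycle_minutes
        for index, node in enumerate(central_nodes)
    }


central = ["A", "B", "C", "D"]
x_schedule = call_in_minutes(3, central)       # published node X
y_schedule = call_in_minutes(3 + 22, central)  # published node Y, 22 minutes after X

print(x_schedule)  # {'A': 3, 'B': 18, 'C': 33, 'D': 48}
print(y_schedule)  # {'A': 25, 'B': 40, 'C': 55, 'D': 10}
```

With 4 central nodes, any published node reaches some central node at least once every 15 minutes, which is where the latency figures in this design come from.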

These central machines replicate continuously with each other, in a ring.

In normal use, a message dropped at any published node will get to a
central node within 15 minutes (average 8), and from there to all the
central nodes in a few moments. From there, it will propagate out to all
other nodes within 15 minutes, average 8 minutes. So updates cross the
network in an average of about 16 minutes, and a maximum of just over
half an hour.

If a published node goes down, the users will switch to another until
you bring it up again. Shortly after restarting, it will catch up by
replication.

If a single central node goes down, replication still happens, but
messages that can't be dropped off or picked up from the lost node will
be delayed by an extra 15 minutes (possibly twice). When it returns to
life, it will catch up.

You can have more or fewer than 4 central nodes. So long as there is a
central machine running, then replication will happen - even if the
central group is split and cannot replicate within itself.

Although I have specified an hourly cycle, you could use any time scale
you like - or you could change it by database.

Note - There are no direct users of published or central nodes, so those
nodes do not have to spend time building indexes.

Regards

Ian





Re: Peer-to-Peer Replication

Posted by Ryan Ramage <ry...@gmail.com>.
> For a local network, there are lots of service discovery protocols

One other thing, each couch may have different dbs on it. So an extra
layer of complexity is not just which couches to wire together, but
what dbs. For a lighthearted example, I would not want my music db
syncing with another person's movies db.

Re: Peer-to-Peer Replication

Posted by Ryan Ramage <ry...@gmail.com>.
I think we are missing the issue. We do all agree that couch is great
at replicating when it has been wired up with src and dest urls.

The issue is more around creating a distributed graph management to
handle nodes (couches) in a peer to peer manner. I don't think this
space has been really explored.

For a local network, there are lots of service discovery protocols
that you could use, something like
http://en.wikipedia.org/wiki/Zero_configuration_networking or
http://en.wikipedia.org/wiki/Universal_Plug_and_Play
but of course this would be outside of what couch does. I think
BigCouch may have something like that built in, but someone from
Cloudant would have to confirm.

For a more p2p system, this is a much harder problem. For one, you
mentioned the network ports. If you are imagining average people using
this, then you would have to deal with managing port forwarding using
Internet Gateway Device Protocol. At least one couch on the end of a
replication would have to be able to be accessed by http through a
router. You would want a user friendly way to do this.

Next is managing the graph. This is hard. No help from couch again.
Nodes going up and down, etc.

It would be fun to see some work done on this. For extra points it
would be cool if it were done in Erlang and could be contributed into
the couch core :)

Ryan




On Thu, Apr 7, 2011 at 9:17 AM, Nebu Pookins <ne...@gmail.com> wrote:
> On Wed, Apr 6, 2011 at 5:47 PM, Zdravko Gligic <zg...@gmail.com> wrote:
>>
>> In it's simplest form, consider a large community of members or peers,
>> in which each member subscribes to one or more of a dozen CouchDB
>> databases.
>
> Each peer would have their own CouchDB. So if you have a thousand
> peers, there are a thousand CouchDB instances, each with their local
> copy of the database.
>
>>
>> Within each database, community members could post documents, comment
>> and/or take other actions on any one doc.  However, each of these
>> actions would not be an update to the original docs but would rather
>> be creations of new docs.
>>
>> The end result should be a situation in which each subscriber
>> eventually ends up with all of the documents in their corresponding
>> local CouchDB copy.
>
> This is pretty much the default behaviour you get from CouchDB out of
> the box. The only issue is you need to make each instance aware of at
> least one other instance. And these connections should form a
> connected graph if you want everyone to see everyone else's changes.
> To "make aware" a particular CouchDB, you simply instruct it to
> replicate against a specific other instance (by providing a URL to the
> other instance).
>
> - Nebu
>

Re: Peer-to-Peer Replication

Posted by Nebu Pookins <ne...@gmail.com>.
On Wed, Apr 6, 2011 at 5:47 PM, Zdravko Gligic <zg...@gmail.com> wrote:
>
> In it's simplest form, consider a large community of members or peers,
> in which each member subscribes to one or more of a dozen CouchDB
> databases.

Each peer would have their own CouchDB. So if you have a thousand
peers, there are a thousand CouchDB instances, each with their local
copy of the database.

>
> Within each database, community members could post documents, comment
> and/or take other actions on any one doc.  However, each of these
> actions would not be an update to the original docs but would rather
> be creations of new docs.
>
> The end result should be a situation in which each subscriber
> eventually ends up with all of the documents in their corresponding
> local CouchDB copy.

This is pretty much the default behaviour you get from CouchDB out of
the box. The only issue is you need to make each instance aware of at
least one other instance. And these connections should form a
connected graph if you want everyone to see everyone else's changes.
To "make aware" a particular CouchDB, you simply instruct it to
replicate against a specific other instance (by providing a URL to the
other instance).
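
Whether a set of replication links really forms a connected graph can be checked with a short breadth-first search. The Python sketch below assumes every link is replicated in both directions, so the graph is treated as undirected; the peer names and the function name are made up for the example.

```python
from collections import deque


def is_connected(links):
    """Return True if every peer can be reached from every other peer.

    links maps a peer name to the peers it replicates with. Links are
    treated as bidirectional (an assumption of this sketch).
    """
    adjacency = {}
    for peer, others in links.items():
        adjacency.setdefault(peer, set())
        for other in others:
            adjacency[peer].add(other)
            adjacency.setdefault(other, set()).add(peer)
    if not adjacency:
        return True

    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adjacency)


chain = {"alice": ["bob"], "bob": ["carol"], "carol": []}
islands = {"alice": ["bob"], "carol": ["dave"]}
print(is_connected(chain))    # True: changes can ripple alice <-> bob <-> carol
print(is_connected(islands))  # False: two groups that never see each other
```

If only one-way replications are configured, reachability has to be checked per direction instead, which this sketch does not do.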

- Nebu

Re: Peer-to-Peer Replication

Posted by Zdravko Gligic <zg...@gmail.com>.
> What exactly are you wanting to accomplish here?

In its simplest form, consider a large community of members or peers,
in which each member subscribes to one or more of a dozen CouchDB
databases.

Within each database, community members could post documents, comment
and/or take other actions on any one doc.  However, each of these
actions would not be an update to the original docs but would rather
be creations of new docs.

The end result should be a situation in which each subscriber
eventually ends up with all of the documents in their corresponding
local CouchDB copy.

> Whichever document has the largest _rev count is the most recent.

For simplicity, let's take this part out of the discussion by assuming
that any one document would be modifiable and/or deletable only by its
original author/poster.

Re: Peer-to-Peer Replication

Posted by Owen Marshall <om...@facilityone.com>.
On 04/06/2011 04:24 PM, Zdravko Gligic wrote:

> *snip*

I've got the gut feeling based on your questions that you want us to be
telling you about clustering :) Do note that clustering is orthogonal to
replication.

What exactly are you wanting to accomplish here?

>> CouchDB doesn't ensure that the latest revision wins -- you are expected
>> to resolve conflicts in a way that makes sense for your application.
> 
> I understand this in context of revisions to a single document.
> However, I was more curious about how it internally determined which
> of 2 arbitrary peers had a more recent and up to date copy.

Whichever revision has the largest _rev count is treated as the winner.

Thus, "2-2a3c(...)" will win over "1-a41b(...)" because 2 > 1.

If however a node encounters two different revisions of a document with
the same _rev count, as such:

> D._rev = "2-23863ae70f0e2477e9ea3664b38eab4b"
> D'._rev = "2-b967c531a44dd2887fb4d51c35003476"

CouchDB will pick the revision with the highest hash component (D').

This is merely to ensure consistency of documents. It says nothing about
whether or not D' was better than D, nor does it try to merge D into D',
etc. Put simply, your application *must not* rely on _rev. It's only for
CouchDB replication.
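
To make the ordering concrete, here is a simplified Python model of the rule just described: compare the numeric prefix first, then the part after the dash as a string. CouchDB does this internally on its revision tree and also takes deletions into account, so treat this as an illustration only and not as something an application should rely on. The function names are invented for the example.

```python
def rev_key(rev):
    """Split a _rev such as '2-b967...' into (2, 'b967...') for comparison."""
    generation, _, suffix = rev.partition("-")
    return (int(generation), suffix)


def pick_winner(revs):
    """Return the revision that sorts highest under the simplified rule."""
    return max(revs, key=rev_key)


d = "2-23863ae70f0e2477e9ea3664b38eab4b"
d_prime = "2-b967c531a44dd2887fb4d51c35003476"

print(pick_winner([d, d_prime]))         # D' wins: same count, higher suffix
print(pick_winner(["1-a41b", "2-2a3c"]))  # 2-2a3c wins because 2 > 1
```

Every node applies the same deterministic rule, which is why all replicas end up showing the same winner without talking to each other about it.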

Instead, your application should watch for conflicts (request documents
with conflicts=true and check the "_conflicts" field) and handle them in
whatever way makes sense.

As an example: one part of an application may walk through every
conflict and merge it back into a new revision, running everything
through some business logic and possibly creating intermediate states --
and for other parts, you might just keep the winning message and discard
the others. It all depends on what _you_ need.

-- 
Owen Marshall
FacilityONE
omarshall@facilityone.com | (502) 805-2126


Re: Peer-to-Peer Replication

Posted by Christian Polzer <ch...@hai-fai.de>.
Regards,
Chris



On 06.04.2011, at 22:24, Zdravko Gligic wrote:

>> *You* make the graph -- not CouchDB!
> 
> Given a large number of peers, could this not be a daunting task - to
> ensure that everyone gets eventually updated in a relatively efficient
> and timely manner?
> 
>> CouchDB will follow your instructions. No more or no less.
> 
> Am I correct in my interpretation of documentation that regardless of
> the overall design, replication is always between 2 nodes - a source
> and a target? In other words there is no way to throw at CouchDB
> multile nodes as sources and/or destinations and it would magically
> keep them all updated.
> 

I am now quoting from my soon to be finished thesis (yay! :-) ):
<quote>
Replication in CouchDB can be configured in multiple ways:
* Replication can be pushed or pulled. This is very handy for replication of mobile databases, where no fixed IP can be provided and the telecommunication providers may prohibit the use of dynamic DNS. Replication can just happen by pushing it towards other nodes.
* Replication can be configured for Master-Slave or Peer-to-Peer.
* Replication can be configured for continuous replication or just be one-time triggered by the application if needed. Until version 1.2 of CouchDB is released, replication settings are lost upon restarting CouchDB.
* Databases within one CouchDB can be replicated. This might be useful to keep a copy of a database on another hard-drive within one CouchDB node.
* Replication can be filtered so that not all information is replicated.
* Individual documents can be named (by ID) for replication.
</quote>
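
To make the options in that list a bit more concrete, here is a rough Python sketch that builds the JSON bodies one would POST to a node's `/_replicate` endpoint. It does no network I/O. The field names (`source`, `target`, `continuous`, `filter`, `doc_ids`) are the ones the replication API accepts as far as I know; the database name `notes`, the peer URL and the filter name `app/by_owner` are placeholders invented for the example.

```python
import json


def replication_request(source, target, continuous=False,
                        filter_name=None, doc_ids=None):
    """Return the JSON body for one replication job (one source, one target)."""
    body = {"source": source, "target": target}
    if continuous:
        body["continuous"] = True
    if filter_name is not None:
        # Filter functions are referenced as "designdoc/filtername".
        body["filter"] = filter_name
    if doc_ids is not None:
        body["doc_ids"] = list(doc_ids)
    return body


PEER = "http://peer.example.com:5984/notes"  # hypothetical remote database

# Push: local source, remote target (useful when this node has no fixed IP).
push = replication_request("notes", PEER, continuous=True)
# Pull: remote source, local target.
pull = replication_request(PEER, "notes")
# Copy between two databases on the same node.
local_copy = replication_request("notes", "notes_backup")
# Replicate only what a (hypothetical) filter function lets through.
filtered = replication_request("notes", PEER, filter_name="app/by_owner")
# Replicate only the named documents.
named = replication_request("notes", PEER, doc_ids=["doc-a", "doc-b"])

print(json.dumps(push, indent=2))
```

Each body still describes exactly one source and one target, so peer-to-peer setups come down to issuing several such requests.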


I agree with the thought
 "to throw at CouchDB multiple nodes as sources and/or destinations and it would magically keep them all updated."

There is a feature coming for CouchDB with 1.2 that includes a dedicated document (or database?) about replication settings. It would be nice to be able to replicate that one as well...


>> CouchDB doesn't ensure that the latest revision wins -- you are expected
>> to resolve conflicts in a way that makes sense for your application.
> 
> I understand this in context of revisions to a single document.
> However, I was more curious about how it internally determined which
> of 2 arbitrary peers had a more recent and up to date copy.
> 



> Thanks again.


Re: Peer-to-Peer Replication

Posted by Zdravko Gligic <zg...@gmail.com>.
> *You* make the graph -- not CouchDB!

Given a large number of peers, could this not be a daunting task - to
ensure that everyone gets eventually updated in a relatively efficient
and timely manner?

> CouchDB will follow your instructions. No more or no less.

Am I correct in my interpretation of documentation that regardless of
the overall design, replication is always between 2 nodes - a source
and a target? In other words there is no way to throw at CouchDB
multiple nodes as sources and/or destinations and it would magically
keep them all updated.

> CouchDB doesn't ensure that the latest revision wins -- you are expected
> to resolve conflicts in a way that makes sense for your application.

I understand this in context of revisions to a single document.
However, I was more curious about how it internally determined which
of 2 arbitrary peers had a more recent and up to date copy.

Thanks again.

Re: Peer-to-Peer Replication

Posted by Owen Marshall <om...@facilityone.com>.
On 04/06/2011 03:00 PM, Zdravko Gligic wrote:
>> The replication model is such that for every connected graph of peers, all peers in that graph
>> will update to the same up-to-date state. This is what they call
>> "eventual consistency".
> 
> Replication documentation seems to talk about replication between 2
> nodes (a source and a target) with 2 specific URLs and not any magical
> "graph of peers".  However, if there is such magic then where do I
> find some more info?

*You* make the graph -- not CouchDB!

Say you would like to place the same database on five CouchDB nodes. You
are free to decide how you want those nodes connected (or not).

For example, you could:
* set up a star network and make every node replicate from N1.
* set up a mesh, partially *or* fully connected
* continuously replicate on some nodes and replicate on demand on others

CouchDB will follow your instructions. No more or no less.
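
As a rough illustration of "you make the graph", the Python sketch below lists the (source, target) pairs for a star and for a full mesh, and checks whether a given set of pairs lets changes from any node reach every other node. The node names and helper functions are made up for the example; none of this is a CouchDB feature.

```python
from itertools import permutations


def star(hub, nodes):
    """Every other node replicates from the hub: (source, target) pairs."""
    return [(hub, n) for n in nodes if n != hub]


def full_mesh(nodes):
    """Every node replicates to every other node."""
    return list(permutations(nodes, 2))


def reachable(start, jobs):
    """Nodes that changes written on `start` can eventually arrive at."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for source, target in jobs:
            if source == current and target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


def everyone_syncs(nodes, jobs):
    """True if a write on any node can reach all the others."""
    return all(reachable(n, jobs) == set(nodes) for n in nodes)


nodes = ["N1", "N2", "N3", "N4", "N5"]
star_jobs = star("N1", nodes)      # 4 one-way jobs out of N1
mesh_jobs = full_mesh(nodes)       # 20 jobs, one per ordered pair
two_way_star = star_jobs + [(t, s) for s, t in star_jobs]
```

A one-way star only spreads writes made on N1. If the other nodes accept writes too, the reverse pairs are needed as well, which is what `two_way_star` adds.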

>> It's like in bittorrent, you don't have to worry about the clients with full copies of the
>> file somehow losing data; the replication data is versioned, so your
>> peers will only replicate
>> "forward" in time, not backwards.
> 
> I did find and read some info on how CouchDB keeps 2 indices, one by
> doc ID and a second being a numeric sequence that is specifically for
> replication purposes.  Since these are local sequences, I am a bit
> curious how it determines which one is later (as higher numbered one
> could be older?) but since I am painting by numbers here, I can let go
> of that curiosity and simply assume that it magically happens.

CouchDB doesn't ensure that the latest revision wins -- you are expected
to resolve conflicts in a way that makes sense for your application.

It will ensure that the *same* revision is *always* displayed as the
winner across all peers.
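
A simplified Python sketch of why every peer can agree on the winner without talking to the others: the choice depends only on the set of conflicting revisions, not on arrival order or clocks. The ordering used here (non-deleted before deleted, then higher revision number, then the hash part compared as a string) is my understanding of the rule, so treat it as an assumption rather than CouchDB's actual code. The revision IDs are made up.

```python
def pick_winner(leaves):
    """leaves: iterable of (rev, deleted) pairs, rev formatted like '3-a1b2'."""
    def sort_key(item):
        rev, deleted = item
        generation, _, digest = rev.partition("-")
        # Live revisions outrank deleted ones, then the longer history wins,
        # then the hash part breaks ties.
        return (not deleted, int(generation), digest)
    return max(leaves, key=sort_key)[0]


# Made-up conflicting revisions of one document.
leaves = [("2-aaa", False), ("2-bbb", False), ("3-ccc", True)]
winner_here = pick_winner(leaves)
winner_elsewhere = pick_winner(reversed(leaves))  # same set, different order
```

Both calls return the same revision, which is the point: the rule is arbitrary but deterministic, and your application still decides what the "right" content is.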

-- 
Owen Marshall
FacilityONE
omarshall@facilityone.com | (502) 805-2126


Re: Peer-to-Peer Replication

Posted by Zdravko Gligic <zg...@gmail.com>.
>> The replication model is such that for every connected graph of peers, all peers in that graph
>> will update to the same up-to-date state. This is what they call
>> "eventual consistency".

Replication documentation seems to talk about replication between 2
nodes (a source and a target) with 2 specific URLs and not any magical
"graph of peers".  However, if there is such magic then where do I
find some more info?

>> It's like in bittorrent, you don't have to worry about the clients with full copies of the
>> file somehow losing data; the replication data is versioned, so your
>> peers will only replicate
>> "forward" in time, not backwards.

I did find and read some info on how CouchDB keeps 2 indices, one by
doc ID and a second being a numeric sequence that is specifically for
replication purposes.  Since these are local sequences, I am a bit
curious how it determines which one is later (as higher numbered one
could be older?) but since I am painting by numbers here, I can let go
of that curiosity and simply assume that it magically happens.

Thanks Nebu.