You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@couchdb.apache.org by Aaron Boxer <bo...@gmail.com> on 2011/02/15 16:01:08 UTC

interested in learning about replication algorithm

Hello,
I am new to couchdb, and I would like to learn as much as I can about
the replication algorithm, short of reading the code. Can anyone suggest
some resources? The more technical, the better.

In particular, I would like to understand how conflicts are resolved.

Thanks!

Aaron

Re: interested in learning about replication algorithm

Posted by Aaron Boxer <bo...@gmail.com>.
ahhhhh, I see. So the client resolves a conflict, and stores a new
version of the document
with conflicts resolved.


On Tue, Feb 15, 2011 at 10:18 AM, Robert Newson <ro...@gmail.com> wrote:
> None, couchdb performs no conflict resolution, merely detection and
> preservation of conflicts.
>
> :)
>
> B.
>
> On 15 February 2011 15:15, Aaron Boxer <bo...@gmail.com> wrote:
>> Thanks, Robert.
>> Of the 64 source files, which one(s) manage conflict resolution?
>>
>>
>> On Tue, Feb 15, 2011 at 10:02 AM, Robert Newson <ro...@gmail.com> wrote:
>>> Read the code, it's the most technical and therefore best way. ;)
>>>
>>> On 15 February 2011 15:01, Aaron Boxer <bo...@gmail.com> wrote:
>>>> Hello,
>>>> I am new to couchdb, and I would like to learn as much as I can about
>>>> the replication algorithm, short of reading the code. Can anyone suggest
>>>> some resources? The more technical, the better.
>>>>
>>>> In particular, I would like to understand how conflicts are resolved.
>>>>
>>>> Thanks!
>>>>
>>>> Aaron
>>>>
>>>
>>
>

Re: interested in learning about replication algorithm

Posted by Robert Newson <ro...@gmail.com>.
None, couchdb performs no conflict resolution, merely detection and
preservation of conflicts.

:)

B.

On 15 February 2011 15:15, Aaron Boxer <bo...@gmail.com> wrote:
> Thanks, Robert.
> Of the 64 source files, which one(s) manage conflict resolution?
>
>
> On Tue, Feb 15, 2011 at 10:02 AM, Robert Newson <ro...@gmail.com> wrote:
>> Read the code, it's the most technical and therefore best way. ;)
>>
>> On 15 February 2011 15:01, Aaron Boxer <bo...@gmail.com> wrote:
>>> Hello,
>>> I am new to couchdb, and I would like to learn as much as I can about
>>> the replication algorithm, short of reading the code. Can anyone suggest
>>> some resources? The more technical, the better.
>>>
>>> In particular, I would like to understand how conflicts are resolved.
>>>
>>> Thanks!
>>>
>>> Aaron
>>>
>>
>

Re: interested in learning about replication algorithm

Posted by Aaron Boxer <bo...@gmail.com>.
Thanks, Robert.
Of the 64 source files, which one(s) manage conflict resolution?


On Tue, Feb 15, 2011 at 10:02 AM, Robert Newson <ro...@gmail.com> wrote:
> Read the code, it's the most technical and therefore best way. ;)
>
> On 15 February 2011 15:01, Aaron Boxer <bo...@gmail.com> wrote:
>> Hello,
>> I am new to couchdb, and I would like to learn as much as I can about
>> the replication algorithm, short of reading the code. Can anyone suggest
>> some resources? The more technical, the better.
>>
>> In particular, I would like to understand how conflicts are resolved.
>>
>> Thanks!
>>
>> Aaron
>>
>

Re: interested in learning about replication algorithm

Posted by Robert Newson <ro...@gmail.com>.
Read the code, it's the most technical and therefore best way. ;)

On 15 February 2011 15:01, Aaron Boxer <bo...@gmail.com> wrote:
> Hello,
> I am new to couchdb, and I would like to learn as much as I can about
> the replication algorithm, short of reading the code. Can anyone suggest
> some resources? The more technical, the better.
>
> In particular, I would like to understand how conflicts are resolved.
>
> Thanks!
>
> Aaron
>

Re: interested in learning about replication algorithm

Posted by Randall Leeds <ra...@gmail.com>.
The algorithm at a high level goes like this:

Get some changes
Check them for revisions that are missing from the target
Pull/Push the missing revisions
Repeat

These functions map to the public API in an obvious way:
/_changes
/_missing_revs
/_all_docs or /_bulk_docs

For any production-ready replicator you'll probably want to checkpoint
your progress.
CouchDB stores checkpoints in documents prefixed with "_local/".
Docs named this way don't replicate, don't show up in views, etc.
Good for internal metadata stuff like this.

Stable checkpointing requires that, up to a certain sequence, all
updates must be flushed to disk on both sides.
Currently this is accomplished with a separate POST to /_ensure_full_commit.
Couch also honors a header on document update requests called
X-Couch-Full-Commit.

Almost all the replicator code is contained in couch_rep* or couch_replicator*.
The latter is the new replicator code by Filipe, which may have some
dependency on the old code (I'm not sure).

That should be enough to get you started.

-Randall

On Tue, Feb 15, 2011 at 18:51, Aaron Boxer <bo...@gmail.com> wrote:
> Thanks, guys! I guess I need to dig into the actual code.
>
> I would like to implement a similar algorithm in C, for another project
> I am working on.
>
>
>
> On Tue, Feb 15, 2011 at 5:48 PM, Robert Newson <ro...@gmail.com> wrote:
>> It's worth mentioning that, like git, the hash also includes the
>> previous contents (and, hence, is dependent on all previous updates),
>>
>> Only identical sequences of updates will yield the same _rev.
>>
>> B.
>>
>> On 15 February 2011 22:37, Randall Leeds <ra...@gmail.com> wrote:
>>> On Tue, Feb 15, 2011 at 07:30, Aaron Boxer <bo...@gmail.com> wrote:
>>>> Interesting. Thanks!
>>>>
>>>> How do version ids get generated?  How do the different nodes
>>>> avoid version id collision; i.e. two nodes updating a document with the
>>>> same version id?
>>>
>>> The revision id contains both a monotonically increasing number
>>> revision number and a hash of the document contents. The hash breaks
>>> ties (storing the conflict, not resolving it, but deterministically
>>> choosing a privileged version to report as the "newest").
>>>
>>> In this manner, should two nodes perform the same update the revision
>>> is said to exist in both places already and replication will note this
>>> and not copy the document again.
>>>
>>> -Randall
>>>
>>
>

Re: interested in learning about replication algorithm

Posted by Aaron Boxer <bo...@gmail.com>.
Thanks, guys! I guess I need to dig into the actual code.

I would like to implement a similar algorithm in C, for another project
I am working on.



On Tue, Feb 15, 2011 at 5:48 PM, Robert Newson <ro...@gmail.com> wrote:
> It's worth mentioning that, like git, the hash also includes the
> previous contents (and, hence, is dependent on all previous updates),
>
> Only identical sequences of updates will yield the same _rev.
>
> B.
>
> On 15 February 2011 22:37, Randall Leeds <ra...@gmail.com> wrote:
>> On Tue, Feb 15, 2011 at 07:30, Aaron Boxer <bo...@gmail.com> wrote:
>>> Interesting. Thanks!
>>>
>>> How do version ids get generated?  How do the different nodes
>>> avoid version id collision; i.e. two nodes updating a document with the
>>> same version id?
>>
>> The revision id contains both a monotonically increasing number
>> revision number and a hash of the document contents. The hash breaks
>> ties (storing the conflict, not resolving it, but deterministically
>> choosing a privileged version to report as the "newest").
>>
>> In this manner, should two nodes perform the same update the revision
>> is said to exist in both places already and replication will note this
>> and not copy the document again.
>>
>> -Randall
>>
>

Re: interested in learning about replication algorithm

Posted by Robert Newson <ro...@gmail.com>.
It's worth mentioning that, like git, the hash also includes the
previous contents (and, hence, is dependent on all previous updates),

Only identical sequences of updates will yield the same _rev.

B.

On 15 February 2011 22:37, Randall Leeds <ra...@gmail.com> wrote:
> On Tue, Feb 15, 2011 at 07:30, Aaron Boxer <bo...@gmail.com> wrote:
>> Interesting. Thanks!
>>
>> How do version ids get generated?  How do the different nodes
>> avoid version id collision; i.e. two nodes updating a document with the
>> same version id?
>
> The revision id contains both a monotonically increasing number
> revision number and a hash of the document contents. The hash breaks
> ties (storing the conflict, not resolving it, but deterministically
> choosing a privileged version to report as the "newest").
>
> In this manner, should two nodes perform the same update the revision
> is said to exist in both places already and replication will note this
> and not copy the document again.
>
> -Randall
>

Re: interested in learning about replication algorithm

Posted by Randall Leeds <ra...@gmail.com>.
On Tue, Feb 15, 2011 at 07:30, Aaron Boxer <bo...@gmail.com> wrote:
> Interesting. Thanks!
>
> How do version ids get generated?  How do the different nodes
> avoid version id collision; i.e. two nodes updating a document with the
> same version id?

The revision id contains both a monotonically increasing number
revision number and a hash of the document contents. The hash breaks
ties (storing the conflict, not resolving it, but deterministically
choosing a privileged version to report as the "newest").

In this manner, should two nodes perform the same update the revision
is said to exist in both places already and replication will note this
and not copy the document again.

-Randall

Re: interested in learning about replication algorithm

Posted by Aaron Boxer <bo...@gmail.com>.
Interesting. Thanks!

How do version ids get generated?  How do the different nodes
avoid version id collision; i.e. two nodes updating a document with the
same version id?

  --Aaron




On Tue, Feb 15, 2011 at 10:17 AM, Robert Dionne
<di...@dionne-associates.com> wrote:
> This WIKI page[1] is largely accurate and a good place to start in terms of understanding the model. Currently we're close to having a new replicator in trunk, so if you want to read the code I'd start with that one. Neither the new nore old replicators are well documented in terms of the relationships with the rest of the system, so depending on your erlang knowledge you may also need to ask lots of questions in #couchdb
>
> Cheers,
>
> Bob
>
>
> [1] http://wiki.apache.org/couchdb/Replication_and_conflicts
>
>
>
> On Feb 15, 2011, at 10:01 AM, Aaron Boxer wrote:
>
>> Hello,
>> I am new to couchdb, and I would like to learn as much as I can about
>> the replication algorithm, short of reading the code. Can anyone suggest
>> some resources? The more technical, the better.
>>
>> In particular, I would like to understand how conflicts are resolved.
>>
>> Thanks!
>>
>> Aaron
>
>

Re: interested in learning about replication algorithm

Posted by Robert Dionne <di...@dionne-associates.com>.
This WIKI page[1] is largely accurate and a good place to start in terms of understanding the model. Currently we're close to having a new replicator in trunk, so if you want to read the code I'd start with that one. Neither the new nore old replicators are well documented in terms of the relationships with the rest of the system, so depending on your erlang knowledge you may also need to ask lots of questions in #couchdb

Cheers,

Bob


[1] http://wiki.apache.org/couchdb/Replication_and_conflicts



On Feb 15, 2011, at 10:01 AM, Aaron Boxer wrote:

> Hello,
> I am new to couchdb, and I would like to learn as much as I can about
> the replication algorithm, short of reading the code. Can anyone suggest
> some resources? The more technical, the better.
> 
> In particular, I would like to understand how conflicts are resolved.
> 
> Thanks!
> 
> Aaron