Posted to user@couchdb.apache.org by Ceriel Jacobs <ce...@gmail.com> on 2009/12/11 21:34:39 UTC

Documentation: 1. Order in which replication is executed 2. Single invocation multiple ports 3. Replication bound to a

Dear list,

1. Quote from the documentation: [1]
> "... for each updated document, only fields and blobs that have changed are replicated... If replication fails... the next replication restarts at the same document where it left off"

My question is: "What is the order of replication?"
In other words: "In what order are things in CouchDB replicated?"

For instance:
* 1st most recent changes, 2nd older changes, 3rd oldest changes
and/or
* 1st inserts, 2nd updates, 3rd deletes
and/or
* 1st data, 2nd BLOBs
and/or
* 1st documents at the top of the tree, 2nd documents in the second level of the tree, etc.
and/or
* 1st walking down a tree to its leaf, then the next leaf of the same parent, until the last, and then one level up

Any information that clarifies the inner workings is welcome,
especially for situations with mixed data and BLOBs.

If BLOBs are replicated with the same priority as stored data, that could lead to a design decision to store BLOBs on the file system rather than in CouchDB, to ensure that textual content is replicated with the highest priority.

2. A related question: can a single invocation of CouchDB be bound to multiple ports?

3. If this is possible, can replication then be bound to run on a specific port?
This would make it possible to limit the bandwidth of replication traffic with a firewall rule, without reducing the speed of other (CouchDB) network traffic.



Thanks in advance,
~Ceriel


[1] http://couchdb.apache.org/docs/overview.html



Re: Documentation: 1. Order in which replication is executed 2. Single invocation multiple ports 3. Replication bound to a

Posted by Jens Alfke <je...@mooseyard.com>.
On Dec 11, 2009, at 12:34 PM, Ceriel Jacobs wrote:

> * 1st most recent changes, 2nd older changes, 3rd oldest changes

I _think_ it's forward chronological order (the opposite of what you say here), but don't take my word for it.

> * 1st data, 2nd BLOBs

That wouldn't work, because documents are atomic; you can't update part of a document without updating all of it.

> * 1st documents at the top of the tree, 2nd documents in the second level of the tree, etc.
> and/or
> * 1st walking down a tree to its leaf, then the next leaf of the same parent, until the last, and then one level up

To my knowledge there is no notion of a tree. Documents are keyed by an ID, in a flat namespace.

Each replication is an atomic transaction, so what you're asking about are implementation details — no client would ever see a database that was halfway through replication.

—Jens

Re: Documentation: 1. Order in which replication is executed 2. Single invocation multiple ports 3. Replication bound to a

Posted by Paul Davis <pa...@gmail.com>.
On Fri, Dec 11, 2009 at 3:34 PM, Ceriel Jacobs <ce...@gmail.com> wrote:
> Dear list,
>
> 1. Quote from the documentation: [1]
>> "... for each updated document, only fields and blobs that have changed are replicated... If replication fails... the next replication restarts at the same document where it left off"
>
> My question is: "What is the order of replication?"
> In other words: "In what order are things in CouchDB replicated?"
>
> For instance:
> * 1st most recent changes, 2nd older changes, 3rd oldest changes
> and/or
> * 1st inserts, 2nd updates, 3rd deletes
> and/or
> * 1st data, 2nd BLOBs
> and/or
> * 1st documents at the top of the tree, 2nd documents in the second level of the tree, etc.
> and/or
> * 1st walking down a tree to its leaf, then the next leaf of the same parent, until the last, and then one level up
>
> Any information that clarifies the inner workings is welcome,
> especially for situations with mixed data and BLOBs.
>
> If BLOBs are replicated with the same priority as stored data, that could lead to a design decision to store BLOBs on the file system rather than in CouchDB, to ensure that textual content is replicated with the highest priority.
>

I should've also mentioned that update_seq is the log of the most
recent edit for each document. I.e., if you PUT doc1, PUT doc2, DELETE
doc1, the update sequence would look like:

2: doc2
3: doc1

In which case, doc2 would be replicated before doc1.
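
The bookkeeping described above can be modelled in a few lines of Python. This is only a toy sketch of the idea (one entry per document, keyed by the sequence number of its latest edit), not CouchDB's actual implementation; the function name update_sequence and the tuple format are made up for illustration.

```python
def update_sequence(operations):
    """Toy model: return (seq, doc_id) pairs, one per document,
    keeping only the sequence number of each document's latest edit."""
    latest = {}  # doc_id -> sequence number of its most recent edit
    for seq, (_verb, doc_id) in enumerate(operations, start=1):
        # A later edit (including a DELETE) replaces the earlier entry.
        latest[doc_id] = seq
    # Replication would walk the entries in ascending sequence order.
    return sorted((seq, doc_id) for doc_id, seq in latest.items())


ops = [("PUT", "doc1"), ("PUT", "doc2"), ("DELETE", "doc1")]
for seq, doc_id in update_sequence(ops):
    print(f"{seq}: {doc_id}")
```

The entry for doc1 at sequence 1 disappears because the DELETE at sequence 3 supersedes it, which is why doc2 comes first in the listing.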

Re: Documentation: 1. Order in which replication is executed 2. Single invocation multiple ports 3. Replication bound to a

Posted by Paul Davis <pa...@gmail.com>.
On Fri, Dec 11, 2009 at 3:34 PM, Ceriel Jacobs <ce...@gmail.com> wrote:
> Dear list,
>
> 1. Quote from the documentation: [1]
>> "... for each updated document, only fields and blobs that have changed are replicated... If replication fails... the next replication restarts at the same document where it left off"
>
> My question is: "What is the order of replication?"
> In other words: "In what order are things in CouchDB replicated?"
>
> For instance:
> * 1st most recent changes, 2nd older changes, 3rd oldest changes
> and/or
> * 1st inserts, 2nd updates, 3rd deletes
> and/or
> * 1st data, 2nd BLOBs
> and/or
> * 1st documents at the top of the tree, 2nd documents in the second level of the tree, etc.
> and/or
> * 1st walking down a tree to its leaf, then the next leaf of the same parent, until the last, and then one level up
>
> Any information that clarifies the inner workings is welcome,
> especially for situations with mixed data and BLOBs.
>
> If BLOBs are replicated with the same priority as stored data, that could lead to a design decision to store BLOBs on the file system rather than in CouchDB, to ensure that textual content is replicated with the highest priority.

The general order is by update_seq:

$ curl http://127.0.0.1:5984/db_name/_all_docs_by_seq

I'm not sure what happens right at the point of sending docs, but
replication should be initiated in roughly that order. There may be
details that affect two documents right next to each other (due to
simultaneous transfers).

Depending on replication direction, the attachments are also
replicated differently. I can't ever keep it straight, but in one
direction they get replicated inline (for the time being) and in the
other direction they're replicated out-of-band (OOB). But the general
order in which that happens is based on the update_seq as well.

> 2. A related question: can a single invocation of CouchDB be bound to multiple ports?

No.

> 3. If this is possible, can replication then be bound to run on a specific port?
> This would make it possible to limit the bandwidth of replication traffic with a firewall rule, without reducing the speed of other (CouchDB) network traffic.

If it were possible, then you'd just specify the port in the URL
passed to the replicator.
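
As a rough illustration of "specify the port in the URL", here is a sketch that builds (but does not send) a POST to the _replicate endpoint with a target URL carrying an explicit port. The host replica.example.com and port 5985 are invented for the example, and something would have to be listening there (for instance a second CouchDB instance or a proxy), which is an assumption and not something the thread confirms. The helper name build_replication_request is also made up.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request


def build_replication_request(couch_url, source, target):
    """Build a POST to <couch_url>/_replicate without sending it."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return Request(
        couch_url.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_replication_request(
    "http://127.0.0.1:5984",                    # local node from the thread
    "db_name",                                  # local source database
    "http://replica.example.com:5985/db_name",  # hypothetical host and port
)
print(req.get_method(), req.full_url)
# The port a firewall rule could match on is part of the target URL.
print(urlsplit(json.loads(req.data)["target"]).port)
```

Passing req to urllib.request.urlopen would actually start the replication; it is left out here so the sketch runs without a server.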

HTH,
Paul Davis