Posted to user@couchdb.apache.org by Ethan <et...@gmail.com> on 2013/03/27 03:00:13 UTC

Saving changes

Hi! I'd like to build an "offline replication" system for CouchDB.
Basically, I'd like to build a system that can synchronize over an
intermediary that isn't trusted (like a typical web host). Ideally, I'd
grab whatever CouchDB sends over the wire to do replication, compress,
encrypt, and sign it, and then upload it (say via SSH). I'm looking for
more information on how to build something that does that. So far all I've
found is http://comments.gmane.org/gmane.comp.db.couchdb.user/164.

First, although _replicate can take "source" or "target" as URLs, anything
that isn't an HTTP URL gets "invalid database". So much for replicating
file:// :)

Secondly, I brought up a simple HTTP server and tried to replicate against
it so I could sniff the wire traffic. It seems like the first thing CouchDB
does when replicating is send a HEAD request for the database.
I'm guessing it's trying to get the database's sequence number. I thought I
would try to do the same thing against CouchDB. When I do, the database
hangs.

r$ curl -vX HEAD http://127.0.0.1:5984/mydb/
* About to connect() to 127.0.0.1 port 5984 (#0)
*   Trying 127.0.0.1...
* connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
> HEAD /mydb/ HTTP/1.1
> User-Agent: curl/7.27.0
> Host: 127.0.0.1:5984
> Accept: */*
>
* additional stuff not fine transfer.c:1037: 0 0
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
< Server: CouchDB/1.2.0 (Erlang OTP/R15B01)
< Date: Wed, 27 Mar 2013 01:59:35 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 223
< Cache-Control: must-revalidate
<
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0

(The last line repeats until I kill it with Control-C.)

Is there anything like a comprehensive guide to how replication works, or
how to interoperate with it?

Thanks,

Ethan

Re: Saving changes

Posted by Ethan <et...@gmail.com>.

On Tue, Mar 26, 2013 at 10:27 PM, Jens Alfke <je...@couchbase.com> wrote:

> The replication protocol is interactive, so you can’t do things the same
> way. For instance, a push replication first sends a _revs_diff request to
> tell the remote server which new revisions it has that the remote _might_
> be interested in; then the remote responds by listing the subset of those
> revisions that it doesn’t have yet, and what prior revisions of those
> documents it does have. Then the source PUTs the revisions one by one.
>

This is really, really helpful. Thanks!

> You could write something that would do this in a less-optimal
> non-interactive way. It would remember the last time it synced to the
> target, gather up all the revisions that have happened since that sequence,
> and bundle them up into a file. It would work fairly well if the target
> server only replicates with the source server, i.e. it has no way to get
> these replications from somewhere else. If you have a more complex set of
> replications, then the source can end up sending way too much stuff because
> the target may already have gotten those same revisions from somewhere else.
>

I think that's just a fact of life for my application. In the worst case, two
different sources could want to replicate to a target at the same time,
before the target checks for new messages. In this case, there's no way to
avoid both sources uploading messages.

> > I'm looking for
> > more information on how to build something that does that. So far all
> > I've found is http://comments.gmane.org/gmane.comp.db.couchdb.user/164.
>
> I’ve documented the replication protocol here:
>
> https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm
> The APIs that the replicator calls are all documented in the CouchDB wiki
> (partly because I made sure to add documentation for the ones that weren’t
> documented as I ran into them):
>         http://wiki.apache.org/couchdb/Complete_HTTP_API_Reference
>

Thank you so much for your work documenting this.

Ethan

Re: Saving changes

Posted by Jens Alfke <je...@couchbase.com>.

On Mar 26, 2013, at 7:00 PM, Ethan <et...@gmail.com> wrote:

> Hi! I'd like to build an "offline replication" system for CouchDB.
> Basically, I'd like to build a system that can synchronize over an
> intermediary that isn't trusted (like a typical web host). Ideally, I'd
> grab whatever CouchDB sends over the wire to do replication, compress,
> encrypt, and sign it, and then upload it (say via SSH).

The replication protocol is interactive, so you can’t do things the same way. For instance, a push replication first sends a _revs_diff request to tell the remote server which new revisions it has that the remote _might_ be interested in; then the remote responds by listing the subset of those revisions that it doesn’t have yet, and what prior revisions of those documents it does have. Then the source PUTs the revisions one by one.
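
To make the exchange concrete, here is a minimal Python sketch of what the passive side computes when it answers a _revs_diff request. It is a simplified local model, not CouchDB's implementation: the function name revs_diff and the in-memory target_revs dictionary are assumptions for illustration, and "possible ancestors" is approximated as any stored revision with a lower generation number than the newest missing one. The request and reply shapes (document ID mapped to a list of revisions, and to "missing" / "possible_ancestors") follow the documented _revs_diff API.

```python
def revs_diff(target_revs, offered):
    """Simplified model of the passive side's answer to _revs_diff.

    target_revs: {doc_id: set of revision IDs the target already stores}
    offered:     {doc_id: list of revision IDs the source is offering}
    Returns      {doc_id: {"missing": [...], "possible_ancestors": [...]}}
    """
    def generation(rev):
        # Revision IDs look like "3-abc"; the numeric prefix is the generation.
        return int(rev.split("-", 1)[0])

    reply = {}
    for doc_id, revs in offered.items():
        known = target_revs.get(doc_id, set())
        missing = [rev for rev in revs if rev not in known]
        if not missing:
            continue  # nothing to send for this document, so leave it out
        entry = {"missing": missing}
        newest_missing = max(generation(rev) for rev in missing)
        # Assumption: any stored revision older than the newest missing one
        # is reported as a possible ancestor.
        ancestors = sorted(r for r in known if generation(r) < newest_missing)
        if ancestors:
            entry["possible_ancestors"] = ancestors
        reply[doc_id] = entry
    return reply


# Hypothetical data: the target has two revisions of doc1 and one of doc2.
target_revs = {"doc1": {"1-a", "2-b"}, "doc2": {"1-x"}}
offered = {"doc1": ["2-b", "3-c"], "doc2": ["1-x"], "doc3": ["1-q"]}
reply = revs_diff(target_revs, offered)
```

With this data the source would only need to send revision 3-c of doc1 and revision 1-q of doc3; doc2 is already up to date on the target.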

You could write something that would do this in a less-optimal non-interactive way. It would remember the last time it synced to the target, gather up all the revisions that have happened since that sequence, and bundle them up into a file. It would work fairly well if the target server only replicates with the source server, i.e. it has no way to get these replications from somewhere else. If you have a more complex set of replications, then the source can end up sending way too much stuff because the target may already have gotten those same revisions from somewhere else.
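
A rough Python sketch of that non-interactive approach, under several assumptions: sequence numbers are plain integers (as in CouchDB 1.x), the rows have the shape of _changes feed results, and fetch_doc is a hypothetical callback that returns the JSON body of a given revision. The names build_bundle and read_bundle are made up for illustration. Only compression is shown; encryption and signing are left out. A loader on the target side would probably POST the documents to _bulk_docs with "new_edits": false so that the original revision IDs are kept, but that step is not shown here.

```python
import json
import zlib


def build_bundle(changes_rows, fetch_doc, since):
    """Collect every revision changed after `since` into a compressed blob.

    changes_rows: rows shaped like _changes results, e.g.
                  {"seq": 3, "id": "b", "changes": [{"rev": "2-y"}]}
    fetch_doc:    hypothetical callback (doc_id, rev) -> document dict
    since:        last sequence number already shipped to the target
    Returns (compressed bytes, new checkpoint sequence).
    """
    rows = [row for row in changes_rows if row["seq"] > since]
    docs = [fetch_doc(row["id"], change["rev"])
            for row in rows
            for change in row["changes"]]
    # Remember how far this bundle reaches so the next one starts after it.
    checkpoint = max((row["seq"] for row in rows), default=since)
    payload = json.dumps(
        {"since": since, "last_seq": checkpoint, "docs": docs},
        sort_keys=True,
    )
    return zlib.compress(payload.encode("utf-8")), checkpoint


def read_bundle(blob):
    """Inverse of build_bundle: decompress and parse the bundle."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical source database with three changed documents.
changes = [
    {"seq": 1, "id": "c", "changes": [{"rev": "1-x"}]},
    {"seq": 3, "id": "b", "changes": [{"rev": "2-y"}]},
    {"seq": 4, "id": "a", "changes": [{"rev": "2-z"}]},
]
store = {
    ("c", "1-x"): {"_id": "c", "_rev": "1-x"},
    ("b", "2-y"): {"_id": "b", "_rev": "2-y", "n": 1},
    ("a", "2-z"): {"_id": "a", "_rev": "2-z", "n": 2},
}
blob, checkpoint = build_bundle(changes, lambda d, r: store[(d, r)], since=1)
bundle = read_bundle(blob)
```

Because the bundle is built without asking the target what it already has, any revision the target received from another source is simply sent again, which is the over-sending described above.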

> I'm looking for
> more information on how to build something that does that. So far all I've
> found is http://comments.gmane.org/gmane.comp.db.couchdb.user/164.

I’ve documented the replication protocol here:
	https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm
The APIs that the replicator calls are all documented in the CouchDB wiki (partly because I made sure to add documentation for the ones that weren’t documented as I ran into them):
	http://wiki.apache.org/couchdb/Complete_HTTP_API_Reference

> First, although _replicate can take "source" or "target" as URLs, anything
> that isn't an HTTP URL gets "invalid database". So much for replicating
> file:// :)

Well yeah; the destination has to be an HTTP server that handles at least the passive side of the replication protocol.

> I thought I
> would try to do the same thing against CouchDB. When I do, the database
> hangs.
> 
> r$ curl -vX HEAD http://127.0.0.1:5984/mydb/

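curl doesn’t handle -X HEAD well: the option only overrides the method name, so curl still waits for the response body announced by Content-Length, which a HEAD response never sends. Use -I or --head to send a HEAD request instead.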
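
A client that issues HEAD has to know not to read a body, even though the response carries a Content-Length header. The following Python sketch shows this with the standard library only. The stand-in server, its Handler class, the BODY value and the head_request helper are hypothetical names for illustration and have nothing to do with CouchDB's own code.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Stand-in server: answers HEAD with headers only, like a real one."""
    BODY = b'{"db_name":"mydb"}'  # what a GET would have returned

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        # Content-Length describes the GET body, but no body is written.
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def head_request(host, port, path):
    """Send a HEAD request; return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # http.client knows the method was HEAD and does not wait for a body.
        return resp.status, resp.getheader("Content-Length"), resp.read()
    finally:
        conn.close()


# Bind to port 0 so the OS picks a free port for the stand-in server.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, length, body = head_request("127.0.0.1", server.server_port, "/mydb/")
server.shutdown()
server.server_close()
```

The call returns immediately with a non-zero Content-Length and an empty body, which is the behaviour curl -I gives and curl -X HEAD does not.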

—Jens