Posted to user@couchdb.apache.org by Damien Katz <da...@apache.org> on 2009/05/14 23:52:53 UTC

Re: Incremental replication over unreliable link -- how granular is replication restart

On May 14, 2009, at 10:36 AM, Matt Goodall wrote:

> 2009/5/14 Adam Kocoloski <ko...@apache.org>:
>> Hi Ben, welcome!  At the moment, CouchDB does not have any capacity
>> for intra-document replication checkpointing.  And you're right, in
>> the specific situation you describe Couch would have a difficult time
>> making any replication progress.
>>
>> Given that replication over slow, unreliable links is absolutely a
>> CouchDB design goal, I think we might eventually conjure up some more
>> magic to make some sort of intra-document (or at least
>> intra-attachment) checkpointing possible.  I think it will be
>> post-1.0, though.  Best,
>>
>> Adam
>>
>> On May 14, 2009, at 7:12 AM, Ben Cohen wrote:
>>
>>> Hi all --
>>>
>>> This is my first message to the list.  I've been watching it for a
>>> little while now and so far everything I read about the design of
>>> couchdb I like a lot!  Thanks so much for all the cool work!
>>>
>>> One of the uses I'm planning for couchdb involves replicating a
>>> database across a slow, unreliable link which will never become
>>> anything other than slow and unreliable.  I understand the
>>> replication is incremental and designed to 'pick up where it left
>>> off' in the case of replication interruption.  From the technical
>>> overview on the website:
>>>
>>>> The replication process is incremental. At the database level,
>>>> replication only examines documents updated since the last
>>>> replication. Then for each updated document, only fields and blobs
>>>> that have changed are replicated across the network. If replication
>>>> fails at any step, due to network problems or crash for example,
>>>> the next replication restarts at the same document where it left
>>>> off.
>
> Is this actually accurate? It suggests that documents are replicated
> one-by-one and that replication can be interrupted at any point and
> will continue from wherever it got to before the interruption.
>
> Firstly, I believe the whole replication has to complete before any
> updates are visible in the target database.

No, each update is seen on the target as it's written by the replicator.

> If I restart the server in
> charge of replication and then restart the replication it always seems
> to start from the beginning. i.e. the Futon's "Processed source update
> #xxx" status starts from 0 (when replicating an empty database).

It can start scanning from the beginning, but it will not re-copy
documents it has already replicated.

The checkpointing work prevents it from scanning back from 0, but
there are failure scenarios where it might start from 0 anyway. Adam
has some ideas for a simple fix that would make this far less likely
to happen.
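
To illustrate why a rescan from update-seq 0 is cheap: conceptually the
replicator asks the target which revisions it is still missing before
transferring anything, so already-replicated revisions are skipped. The
sketch below is a made-up Python model of that idea, not CouchDB's
actual Erlang code; all names here are invented.

```python
# Toy model: only revisions the target does not yet hold cross the wire,
# so a restarted scan from seq 0 re-examines but does not re-copy.

def replicate(source_changes, target_revs, copy):
    """source_changes: list of (seq, doc_id, rev) from the source.
    target_revs: dict doc_id -> set of revs already on the target.
    copy: callable that actually transfers one revision."""
    copied = 0
    for seq, doc_id, rev in source_changes:
        have = target_revs.setdefault(doc_id, set())
        if rev in have:
            continue              # target already has it: skip cheaply
        copy(doc_id, rev)         # only missing revisions are sent
        have.add(rev)
        copied += 1
    return copied

# First pass copies everything; a restarted pass rescans from 0 but
# transfers nothing, because the target already holds every revision.
changes = [(1, "a", "1-x"), (2, "b", "1-y"), (3, "c", "1-z")]
target = {}
sent = []
assert replicate(changes, target, lambda d, r: sent.append((d, r))) == 3
assert replicate(changes, target, lambda d, r: sent.append((d, r))) == 0
assert len(sent) == 3
```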


>
> Secondly, if the network connection fails in the middle of replication
> (closing an ssh tunnel is a good way to test this ;-)) then it seems
> to retry a few (10) times before the replicator process terminates. If
> the network connection becomes available again (restart the ssh
> tunnel) the replicator doesn't seem to notice. Also, I just noticed
> that Futon still lists the replication on its status page.

There is a lot of work we can do here; right now replication is
strictly a batch operation. Eventually we will have permanent
replications, where replicators are always working in near real time
and indefinitely retry when network connections fail.
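
Until such permanent replications exist, a client can approximate them
by re-triggering the batch replication whenever it dies; because of the
checkpointing discussed above, each retry resumes near where the last
attempt stopped, so progress is monotonic. This is a hypothetical
sketch of that pattern: `run_replication` stands in for POSTing to the
replicator, and the flaky link is simulated.

```python
# Retry wrapper: re-run a batch replication after each link failure,
# relying on checkpoints so every attempt makes forward progress.

def replicate_with_retries(run_replication, max_retries=10):
    for attempt in range(max_retries):
        try:
            run_replication()
            return attempt + 1    # number of attempts it took
        except ConnectionError:
            continue              # link dropped: retry from checkpoint
    raise ConnectionError("link never stayed up long enough")

class FlakySource:
    """Simulated source whose link drops after every 2 documents."""
    def __init__(self, docs):
        self.docs, self.done = docs, set()
    def run(self):
        moved = 0
        for doc in self.docs:
            if doc in self.done:
                continue          # checkpoint: already transferred
            self.done.add(doc)
            moved += 1
            if moved == 2 and len(self.done) < len(self.docs):
                raise ConnectionError("tunnel closed")

src = FlakySource(["d1", "d2", "d3", "d4", "d5"])
attempts = replicate_with_retries(src.run)
assert attempts == 3                       # 2 + 2 + 1 documents
assert src.done == {"d1", "d2", "d3", "d4", "d5"}
```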

-Damien

>
> If I'm correct, and I really hope I'm missing something, then
> couchdb's replication is probably not currently suitable for
> replicating anything but very small database differences over an
> unstable connection. Does anyone have any real experience in this sort
> of scenario?
>
> - Matt
>
>>>>
>>> I've got a question about this process.  Say you have a document to
>>> be replicated with a 1 megabyte attachment.  A replication process
>>> starts, half the doc is transferred successfully and then the
>>> connection dies.  Assuming no changes to the source doc, when the
>>> replication restarts will the transfer start from the beginning of
>>> the document or will it pick up somewhere within the doc?
>>>
>>> For my use case I have a slow link that will periodically come
>>> online for a certain fixed amount of time and initiate a
>>> replication.  If the replication isn't incremental 'within' a single
>>> document, then a document in the database above a certain size will,
>>> for me, never make it across and would, I imagine, cause the
>>> replication to never make forward progress ...
>>>
>>> Does couchdb's replication magic avoid the issue for me and
>>> eventually transfer the document across my link?
>>>
>>> Thanks much,
>>> Ben Cohen
>>
>>
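
For what it's worth, the intra-attachment checkpointing Ben is asking
about amounts to remembering a byte offset per attachment and resuming
there on reconnect (the way HTTP Range requests allow). CouchDB does
not do this today, per Adam's and Damien's replies; the sketch below is
purely illustrative, with every name invented.

```python
# Toy model of intra-attachment resume: the receiver remembers how many
# bytes it already holds, so each short-lived connection extends the
# transfer instead of restarting it from byte 0.

def resume_transfer(attachment, received, window):
    """Transfer at most `window` bytes per connection, resuming at
    len(received); returns (received, done)."""
    start = len(received)
    received += attachment[start:start + window]
    return received, len(received) == len(attachment)

blob = b"x" * 10
got = b""
connections = 0
done = False
while not done:
    got, done = resume_transfer(blob, got, 3)  # link survives 3 bytes
    connections += 1
assert got == blob and connections == 4        # 3 + 3 + 3 + 1 bytes
```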