Posted to user@couchdb.apache.org by Ben Cohen <co...@gmail.com> on 2009/05/14 13:12:47 UTC

Incremental replication over unreliable link -- how granular is replication restart

Hi all --

This is my first message to the list.  I've been watching it for a  
little while now and so far everything I read about the design of  
couchdb I like a lot!  Thanks so much for all the cool work!

One of the uses I'm planning for couchdb involves replicating a  
database across a slow, unreliable link which will never become  
anything other than slow and unreliable.  I understand the replication  
is incremental and designed to 'pick up where it left off' in the case  
of replication interruption.  From the technical overview on the  
website:

> The replication process is incremental. At the database level,  
> replication only examines documents updated since the last  
> replication. Then for each updated document, only fields and blobs  
> that have changed are replicated across the network. If replication  
> fails at any step, due to network problems or crash for example, the  
> next replication restarts at the same document where it left off.
>
I've got a question about this process.  Say you have a document to be  
replicated with a 1 megabyte attachment.  A replication process  
starts, half the doc is transferred successfully and then the  
connection dies.  Assuming no changes to the source doc, when the  
replication restarts will the transfer start from the beginning of the  
document or will it pick up somewhere within the doc?

For my use case I have a slow link that will periodically come online  
for a certain fixed amount of time and initiate a replication.  If the  
replication isn't incremental 'within' a single document, then a  
document in the database above a certain size will never make it  
across for me, and I'd imagine that would cause the replication to  
never make forward progress ...

Does couchdb's replication magic avoid the issue for me and eventually  
transfer the document across my link?

Thanks much,
Ben Cohen

Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Damien Katz <da...@apache.org>.
On May 14, 2009, at 10:36 AM, Matt Goodall wrote:

> 2009/5/14 Adam Kocoloski <ko...@apache.org>:
>> Hi Ben, welcome!  At the moment, CouchDB does not have any capacity  
>> for
>> intra-document replication checkpointing.  And you're right, in the  
>> specific
>> situation you describe Couch would have a difficult time making any
>> replication progress.
>>
>> Given that replication over slow, unreliable links is absolutely a  
>> CouchDB
>> design goal, I think we might eventually conjure up some more magic  
>> to make
>> some sort of intra-document (or at least intra-attachment)  
>> checkpointing
>> possible.  I think it will be post-1.0, though.  Best,
>>
>> Adam
>>
>> On May 14, 2009, at 7:12 AM, Ben Cohen wrote:
>>
>>> Hi all --
>>>
>>> This is my first message to the list.  I've been watching it for a  
>>> little
>>> while now and so far everything I read about the design of couchdb  
>>> I like a
>>> lot!  Thanks so much for all the cool work!
>>>
>>> One of the uses I'm planning for couchdb involves replicating a  
>>> database
>>> across a slow, unreliable link which will never become anything  
>>> other than
>>> slow and unreliable.  I understand the replication is incremental  
>>> and
>>> designed to 'pick up where it left off' in the case of replication
>>> interruption.  From the technical overview on the website:
>>>
>>>> The replication process is incremental. At the database level,
>>>> replication only examines documents updated since the last  
>>>> replication. Then
>>>> for each updated document, only fields and blobs that have  
>>>> changed are
>>>> replicated across the network. If replication fails at any step,  
>>>> due to
>>>> network problems or crash for example, the next replication  
>>>> restarts at the
>>>> same document where it left off.
>
> Is this actually accurate? It suggests that documents are replicated
> one-by-one and that replication can be interrupted at any point and
> will continue from wherever it got to before the interruption.
>
> Firstly, I believe the whole replication has to complete before any
> updates are visible in the target database.

No, each update is seen on the target as it's written by the replicator.

> If I restart the server in
> charge of replication and then restart the replication it always seems
> to start from the beginning. i.e. the Futon's "Processed source update
> #xxx" status starts from 0 (when replicating an empty database).

It can start scanning from the beginning, but it will not re-copy  
documents it's already replicated.
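
(Roughly speaking, for each batch of candidate documents the target  
is first asked which revisions it is actually missing -- something  
like the exchange below, though I'm recalling the shape from memory  
so the exact details may differ by version:

    POST /targetdb/_missing_revs
    {"somedocid": ["1-1039057390", "2-2739352689"]}

    {"missing_revs": {"somedocid": ["2-2739352689"]}}

Only the revisions the target reports as missing are then fetched  
and written.)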

The checkpointing work keeps it from having to scan from 0 again,  
but there are failure scenarios where it might start from 0 anyway.  
Adam has some ideas for a simple fix that would make this far less  
likely to happen.


>
> Secondly, if the network connection fails in the middle of replication
> (closing an ssh tunnel is a good way to test this ;-)) then it seems
> to retry a few (10) times before the replicator process terminates. If
> the network connection becomes available again (restart the ssh
> tunnel) the replicator doesn't seem to notice. Also, I just noticed
> that Futon still lists the replication on its status page.

There is a lot of work we can do here; right now replication is  
strictly a batch operation. Eventually we will have permanent  
replications, where replicators are always working in near realtime  
and retry indefinitely when network connections fail.
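
Today a replication is a single one-shot request, e.g.

    POST /_replicate
    {"source": "http://remotehost:5984/sourcedb", "target": "targetdb"}

and it runs until it finishes or gives up; nothing restarts it  
automatically.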

-Damien

>
> If I'm correct, and I really hope I'm missing something, then
> couchdb's replication is probably not currently suitable for
> replicating anything but very small database differences over an
> unstable connection. Does anyone have any real experience in this sort
> of scenario?
>
> - Matt
>
>>>>
>>> I've got a question about this process.  Say you have a document  
>>> to be
>>> replicated with a 1 megabyte attachment.  A replication process  
>>> starts, half
>>> the doc is transferred successfully and then the connection dies.   
>>> Assuming
>>> no changes to the source doc, when the replication restarts will the
>>> transfer start from the beginning of the document or will it pick up
>>> somewhere within the doc?
>>>
>>> For my use case I have a slow link that will periodically come  
>>> online for
>>> a certain fixed amount of time and initiate a replication.  If the
>>> replication isn't incremental 'within' a single document, then a  
>>> document in
>>> the database above a certain size will for me, never make it  
>>> across and I
>>> would imagine cause the replication to never make forward  
>>> progress ...
>>>
>>> Does couchdb's replication magic avoid the issue for me and  
>>> eventually
>>> transfer the document across my link?
>>>
>>> Thanks much,
>>> Ben Cohen
>>
>>


Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Adam Kocoloski <ko...@apache.org>.
>> Secondly, if the network connection fails in the middle of  
>> replication
>> (closing an ssh tunnel is a good way to test this ;-)) then it seems
>> to retry a few (10) times before the replicator process terminates.  
>> If
>> the network connection becomes available again (restart the ssh
>> tunnel) the replicator doesn't seem to notice. Also, I just noticed
>> that Futon still lists the replication on its status page.
>
> That's correct, the replicator does try to ignore transient failures.

Wait, are you saying that the replicator hangs if you do this?  That's  
a bug.

Adam

Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Adam Kocoloski <ko...@apache.org>.
On May 14, 2009, at 6:26 PM, Matt Goodall wrote:

>>>>> Secondly, if the network connection fails in the middle of  
>>>>> replication
>>>>> (closing an ssh tunnel is a good way to test this ;-)) then it  
>>>>> seems
>>>>> to retry a few (10) times before the replicator process  
>>>>> terminates. If
>>>>> the network connection becomes available again (restart the ssh
>>>>> tunnel) the replicator doesn't seem to notice. Also, I just  
>>>>> noticed
>>>>> that Futon still lists the replication on its status page.
>>>>
>>>> That's correct, the replicator does try to ignore transient  
>>>> failures.
>>>
>>> Hmm, it seemed to fail on transient failures here. After killing and
>>> restarting my ssh tunnel I left the replication a while and it never
>>> seemed to continue, and the only way to clear it from the status  
>>> list
>>> was to restart the couchdb server. I'll check again though.
>>
>> Ok, I misread you earlier.  It's possible that CouchDB or ibrowse  
>> is trying
>> to reuse a socket when it really should be opening a new one.  That  
>> would be
>> a bug.
>
> This one definitely seems like a bug. Killing and restarting my SSH
> tunnel basically kills the replication, I can see no sign of it
> resuming.
>
> You get this in the log ...

<snip>

> and then nothing.
>
> Worst of all is that couch still thinks the replication is running and
> refuses to start another one. Currently, the only solution is to
> restart the couch server :-/.

Thanks again for catching this bug, Matt.  The example you showed  
occurs when we write a checkpoint record, but there was also a  
similar problem with writing attachments to disk.  I've committed a  
very simplistic fix for the problem; the replicator should now realize  
that these requests are never going to complete and commit seppuku.   
Not the most elegant solution, perhaps, but it's certainly better than  
restarting the server.  The error message should take one of the  
following forms (still working on standardizing these error messages,  
of course):

{"error":"replication_link_failure", "reason":"{gen_server, call ...}"}
{"error":"internal_server_error", "reason":"replication_link_failure"}
{"error":"attachment_request_failed", "reason":"failed to replicate  
http://..."}
{"error":"attachment_request_failed", "reason":"ibrowse error on  
http://... : Reason"}

We'll work on a more fine-grained failure mode in the future.  Best,  
Adam

Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Matt Goodall <ma...@gmail.com>.
2009/5/14 Adam Kocoloski <ad...@gmail.com>:
> Hi Matt, going to snip a bit here to keep the discussion manageable ...
>
> On May 14, 2009, at 12:00 PM, Matt Goodall wrote:
>
>> When I tried things before writing my mail I was using two couchdb
>> servers running from relatively recent versions of trunk. So 0.9 and a
>> bit ;-).
>>
>> I didn't know about the ~10MB. I don't know if I reached that
>> threshold which may be why it seemed to be started over each time.
>> I'll try to retest with a lower threshold and more debugging to see
>> what's really happening. Any help on where that hard-coded 10MB value
>> is would be very helpful!
>
> In line 205 of couch_rep.erl you should see

:) Thanks, saved me a bit of time hunting for it.

>
>>    {NewBuffer, NewContext} = case couch_util:should_flush() of
>
> should_flush() takes an argument which is a number of bytes.  So changing
> that to
>
>>    {NewBuffer, NewContext} = case couch_util:should_flush(1000) of
>
> would cause the replicator to checkpoint after each kilobyte (on document
> boundaries, of course).  You should see a line in the logfile on the machine
> initiating the replication like

Yep, reducing the value allowed the checkpoints to happen frequently
enough to prove that you are absolutely correct - replication does
happen in batches and resumes from the last checkpointed batch.
Hurray!

>
> "recording a checkpoint at source update_seq N"
>
>>> Others have commented that the 10MB threshold really needs to be
>>> configurable.  E.g., set it to zero and you get per-document checkpoints,
>>> but your throughput will drop and the final DB size on the target will
>>> grow.
>>>  Super easy to do, but no one's gotten around to it.
>>
>> Presumably the threshold all depends on the quality of the network
>> connection between the two endpoints, although having the default
>> configurable is probably a good thing anyway.
>
> I think a configurable default is an OK option, but what I'd really like to
> see is the checkpoint threshold added as an optional field to the JSON body
> sent in an individual POST to _replicate.

Yep, exactly. I seem to have deleted the bit I had about a
per-replication option although the implication was still there.

Perhaps the default should be configurable and based on data size  
*and* time, whichever is reached first? That way fast connections  
will checkpoint fewer times, keeping the target DB size down, while  
slow connections will checkpoint more often and be more likely to be  
resumable.
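
Just to make the idea concrete, something like this (pure sketch  
with invented names, not actual CouchDB code):

    %% Flush when either the buffered byte count or the time since
    %% the last checkpoint crosses its limit, whichever comes first.
    should_flush(BufferedBytes, LastCheckpoint, MaxBytes, MaxSeconds) ->
        Elapsed = timer:now_diff(now(), LastCheckpoint) / 1000000,
        (BufferedBytes >= MaxBytes) orelse (Elapsed >= MaxSeconds).

A fast link would normally hit the byte limit first; a slow link  
would hit the time limit and still checkpoint often enough to be  
resumable.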


>
>>>> Secondly, if the network connection fails in the middle of replication
>>>> (closing an ssh tunnel is a good way to test this ;-)) then it seems
>>>> to retry a few (10) times before the replicator process terminates. If
>>>> the network connection becomes available again (restart the ssh
>>>> tunnel) the replicator doesn't seem to notice. Also, I just noticed
>>>> that Futon still lists the replication on its status page.
>>>
>>> That's correct, the replicator does try to ignore transient failures.
>>
>> Hmm, it seemed to fail on transient failures here. After killing and
>> restarting my ssh tunnel I left the replication a while and it never
>> seemed to continue, and the only way to clear it from the status list
>> was to restart the couchdb server. I'll check again though.
>
> Ok, I misread you earlier.  It's possible that CouchDB or ibrowse is trying
> to reuse a socket when it really should be opening a new one.  That would be
> a bug.

This one definitely seems like a bug. Killing and restarting my SSH
tunnel basically kills the replication, I can see no sign of it
resuming.

You get this in the log ...

[error] [<0.62.0>] replicator terminating with reason {http_request_failed,
                                       [104,116,116,112,58,47,47,108,111,99,
                                        97,108,104,111,115,116,58,54,57,56,52,
                                        47,99,117,114,114,101,110,116,97,103,
                                        114,101,101,109,101,110,116,115,47,54,
                                        102,56,49,54,49,48,49,100,49,52,97,52,
                                        49,50,101,98,101,52,48,56,57,52,100,
                                        50,101,100,51,50,50,101,100,63,114,
                                        101,118,115,61,116,114,117,101,38,108,
                                        97,116,101,115,116,61,116,114,117,101,
                                        38,111,112,101,110,95,114,101,118,115,
                                        61,91,34,<<"1-1039057390">>,34,93]}
[info] [<0.62.0>] recording a checkpoint at source update_seq 213
[error] [<0.53.0>] Uncaught error in HTTP request: {exit,normal}
[info] [<0.61.0>] 127.0.0.1 - - 'GET' /_active_tasks 200
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[info] [<0.362.0>] retrying couch_rep HTTP post request due to {error,
conn_failed}: http://localhost:6984/currentagreements/_ensure_full_commit
[error] [<0.362.0>] couch_rep HTTP post request failed after 10
retries: http://localhost:6984/currentagreements/_ensure_full_commit

and then nothing.

Worst of all is that couch still thinks the replication is running and
refuses to start another one. Currently, the only solution is to
restart the couch server :-/.

>
>>>> If I'm correct, and I really hope I'm missing something, then
>>>> couchdb's replication is probably not currently suitable for
>>>> replicating anything but very small database differences over an
>>>> unstable connection. Does anyone have any real experience in this sort
>>>> of scenario?
>>>
>>> Personally, I do not.  I think the conclusion is a bit pessimistic,
>>> though.
>>
>> Sorry, wasn't meaning to be pessimistic. Just trying to report
>> honestly what I was seeing so it could be improved where possible.
>
> Absolutely, that statement probably came off too confrontational.  The more
> high-quality feedback like this we get the better off we'll be!  Cheers,
>
> Adam
>

Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Adam Kocoloski <ad...@gmail.com>.
Hi Matt, going to snip a bit here to keep the discussion manageable ...

On May 14, 2009, at 12:00 PM, Matt Goodall wrote:

> When I tried things before writing my mail I was using two couchdb
> servers running from relatively recent versions of trunk. So 0.9 and a
> bit ;-).
>
> I didn't know about the ~10MB. I don't know if I reached that
> threshold which may be why it seemed to be started over each time.
> I'll try to retest with a lower threshold and more debugging to see
> what's really happening. Any help on where that hard-coded 10MB value
> is would be very helpful!

In line 205 of couch_rep.erl you should see

>     {NewBuffer, NewContext} = case couch_util:should_flush() of

should_flush() takes an argument which is a number of bytes.  So  
changing that to

>     {NewBuffer, NewContext} = case couch_util:should_flush(1000) of

would cause the replicator to checkpoint after each kilobyte (on  
document boundaries, of course).  You should see a line in the logfile  
on the machine initiating the replication like

"recording a checkpoint at source update_seq N"

>> Others have commented that the 10MB threshold really needs to be
>> configurable.  E.g., set it to zero and you get per-document  
>> checkpoints,
>> but your throughput will drop and the final DB size on the target  
>> will grow.
>>  Super easy to do, but no one's gotten around to it.
>
> Presumably the threshold all depends on the quality of the network
> connection between the two endpoints, although having the default
> configurable is probably a good thing anyway.

I think a configurable default is an OK option, but what I'd really  
like to see is the checkpoint threshold added as an optional field to  
the JSON body sent in an individual POST to _replicate.
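Something like this, say (nothing is implemented yet, and the field  
name here is just a straw man):

    POST /_replicate
    {"source": "http://remotehost:5984/sourcedb", "target": "targetdb",
     "checkpoint_size": 100000}

That way each replication could pick a threshold that suits its  
link.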

>>> Secondly, if the network connection fails in the middle of  
>>> replication
>>> (closing an ssh tunnel is a good way to test this ;-)) then it seems
>>> to retry a few (10) times before the replicator process  
>>> terminates. If
>>> the network connection becomes available again (restart the ssh
>>> tunnel) the replicator doesn't seem to notice. Also, I just noticed
>>> that Futon still lists the replication on its status page.
>>
>> That's correct, the replicator does try to ignore transient failures.
>
> Hmm, it seemed to fail on transient failures here. After killing and
> restarting my ssh tunnel I left the replication a while and it never
> seemed to continue, and the only way to clear it from the status list
> was to restart the couchdb server. I'll check again though.

Ok, I misread you earlier.  It's possible that CouchDB or ibrowse is  
trying to reuse a socket when it really should be opening a new one.   
That would be a bug.

>>> If I'm correct, and I really hope I'm missing something, then
>>> couchdb's replication is probably not currently suitable for
>>> replicating anything but very small database differences over an
>>> unstable connection. Does anyone have any real experience in this  
>>> sort
>>> of scenario?
>>
>> Personally, I do not.  I think the conclusion is a bit pessimistic,  
>> though.
>
> Sorry, wasn't meaning to be pessimistic. Just trying to report
> honestly what I was seeing so it could be improved where possible.

Absolutely, that statement probably came off too confrontational.  The  
more high-quality feedback like this we get the better off we'll be!   
Cheers,

Adam

Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Matt Goodall <ma...@gmail.com>.
2009/5/14 Adam Kocoloski <ko...@apache.org>:
> Hi Matt,
>
> On May 14, 2009, at 10:36 AM, Matt Goodall wrote:
>
>> 2009/5/14 Adam Kocoloski <ko...@apache.org>:
>>>
>>> Hi Ben, welcome!  At the moment, CouchDB does not have any capacity for
>>> intra-document replication checkpointing.  And you're right, in the
>>> specific
>>> situation you describe Couch would have a difficult time making any
>>> replication progress.
>>>
>>> Given that replication over slow, unreliable links is absolutely a
>>> CouchDB
>>> design goal, I think we might eventually conjure up some more magic to
>>> make
>>> some sort of intra-document (or at least intra-attachment) checkpointing
>>> possible.  I think it will be post-1.0, though.  Best,
>>>
>>> Adam
>>>
>>> On May 14, 2009, at 7:12 AM, Ben Cohen wrote:
>>>
>>>> Hi all --
>>>>
>>>> This is my first message to the list.  I've been watching it for a
>>>> little
>>>> while now and so far everything I read about the design of couchdb I
>>>> like a
>>>> lot!  Thanks so much for all the cool work!
>>>>
>>>> One of the uses I'm planning for couchdb involves replicating a database
>>>> across a slow, unreliable link which will never become anything other
>>>> than
>>>> slow and unreliable.  I understand the replication is incremental and
>>>> designed to 'pick up where it left off' in the case of replication
>>>> interruption.  From the technical overview on the website:
>>>>
>>>>> The replication process is incremental. At the database level,
>>>>> replication only examines documents updated since the last replication.
>>>>> Then
>>>>> for each updated document, only fields and blobs that have changed are
>>>>> replicated across the network. If replication fails at any step, due to
>>>>> network problems or crash for example, the next replication restarts at
>>>>> the
>>>>> same document where it left off.
>>
>> Is this actually accurate? It suggests that documents are replicated
>> one-by-one and that replication can be interrupted at any point and
>> will continue from wherever it got to before the interruption.
>
> Yes, there are some inaccuracies in that paragraph.  We do save checkpoints,
> but not per-document.  We also transfer the whole document, not just changed
> fields.  In some respects the Overview is really part Roadmap.  We've taken
> some flak for this before, perhaps it's time to revisit that page.
>
>> Firstly, I believe the whole replication has to complete before any
>> updates are visible in the target database. If I restart the server in
>> charge of replication and then restart the replication it always seems
>> to start from the beginning. i.e. the Futon's "Processed source update
>> #xxx" status starts from 0 (when replicating an empty database).
>
> The exact behavior has changed as CouchDB has evolved.  Are you running 0.9
> or higher?  In that case Couch should be saving checkpoints for every ~10MB
> of document data that comes across the wire.  If it fails after a
> checkpoint, the next replication should not be starting from 0.  If it is
> restarting, I'd consider that a bug.

When I tried things before writing my mail I was using two couchdb
servers running from relatively recent versions of trunk. So 0.9 and a
bit ;-).

I didn't know about the ~10MB. I don't know if I reached that
threshold which may be why it seemed to be started over each time.
I'll try to retest with a lower threshold and more debugging to see
what's really happening. Any help on where that hard-coded 10MB value
is would be very helpful!

>
> Others have commented that the 10MB threshold really needs to be
> configurable.  E.g., set it to zero and you get per-document checkpoints,
> but your throughput will drop and the final DB size on the target will grow.
>  Super easy to do, but no one's gotten around to it.

Presumably the threshold all depends on the quality of the network
connection between the two endpoints, although having the default
configurable is probably a good thing anyway.

>
>> Secondly, if the network connection fails in the middle of replication
>> (closing an ssh tunnel is a good way to test this ;-)) then it seems
>> to retry a few (10) times before the replicator process terminates. If
>> the network connection becomes available again (restart the ssh
>> tunnel) the replicator doesn't seem to notice. Also, I just noticed
>> that Futon still lists the replication on its status page.
>
> That's correct, the replicator does try to ignore transient failures.

Hmm, it seemed to fail on transient failures here. After killing and
restarting my ssh tunnel I left the replication a while and it never
seemed to continue, and the only way to clear it from the status list
was to restart the couchdb server. I'll check again though.

>
>> If I'm correct, and I really hope I'm missing something, then
>> couchdb's replication is probably not currently suitable for
>> replicating anything but very small database differences over an
>> unstable connection. Does anyone have any real experience in this sort
>> of scenario?
>
> Personally, I do not.  I think the conclusion is a bit pessimistic, though.

Sorry, wasn't meaning to be pessimistic. Just trying to report
honestly what I was seeing so it could be improved where possible.

>  Adding a configurable checkpoint threshold should make it possible to
> (slowly) replicate very large DB differences.  Ben's original point about
> the inability to replicate very large documents still stands, though.  I've
> opened a ticket to remind us about adding that feature in the future.
>  Cheers,
>
> Adam
>
>
>

Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Matt,

On May 14, 2009, at 10:36 AM, Matt Goodall wrote:

> 2009/5/14 Adam Kocoloski <ko...@apache.org>:
>> Hi Ben, welcome!  At the moment, CouchDB does not have any capacity  
>> for
>> intra-document replication checkpointing.  And you're right, in the  
>> specific
>> situation you describe Couch would have a difficult time making any
>> replication progress.
>>
>> Given that replication over slow, unreliable links is absolutely a  
>> CouchDB
>> design goal, I think we might eventually conjure up some more magic  
>> to make
>> some sort of intra-document (or at least intra-attachment)  
>> checkpointing
>> possible.  I think it will be post-1.0, though.  Best,
>>
>> Adam
>>
>> On May 14, 2009, at 7:12 AM, Ben Cohen wrote:
>>
>>> Hi all --
>>>
>>> This is my first message to the list.  I've been watching it for a  
>>> little
>>> while now and so far everything I read about the design of couchdb  
>>> I like a
>>> lot!  Thanks so much for all the cool work!
>>>
>>> One of the uses I'm planning for couchdb involves replicating a  
>>> database
>>> across a slow, unreliable link which will never become anything  
>>> other than
>>> slow and unreliable.  I understand the replication is incremental  
>>> and
>>> designed to 'pick up where it left off' in the case of replication
>>> interruption.  From the technical overview on the website:
>>>
>>>> The replication process is incremental. At the database level,
>>>> replication only examines documents updated since the last  
>>>> replication. Then
>>>> for each updated document, only fields and blobs that have  
>>>> changed are
>>>> replicated across the network. If replication fails at any step,  
>>>> due to
>>>> network problems or crash for example, the next replication  
>>>> restarts at the
>>>> same document where it left off.
>
> Is this actually accurate? It suggests that documents are replicated
> one-by-one and that replication can be interrupted at any point and
> will continue from wherever it got to before the interruption.

Yes, there are some inaccuracies in that paragraph.  We do save  
checkpoints, but not per-document.  We also transfer the whole  
document, not just changed fields.  In some respects the Overview is  
really part Roadmap.  We've taken some flak for this before, perhaps  
it's time to revisit that page.

> Firstly, I believe the whole replication has to complete before any
> updates are visible in the target database. If I restart the server in
> charge of replication and then restart the replication it always seems
> to start from the beginning. i.e. the Futon's "Processed source update
> #xxx" status starts from 0 (when replicating an empty database).

The exact behavior has changed as CouchDB has evolved.  Are you  
running 0.9 or higher?  In that case Couch should be saving  
checkpoints for every ~10MB of document data that comes across the  
wire.  If it fails after a checkpoint, the next replication should not  
be starting from 0.  If it is restarting, I'd consider that a bug.

Others have commented that the 10MB threshold really needs to be  
configurable.  E.g., set it to zero and you get per-document  
checkpoints, but your throughput will drop and the final DB size on  
the target will grow.  Super easy to do, but no one's gotten around to  
it.
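By "super easy" I mean roughly this kind of change in couch_rep.erl  
(untested sketch, and the config section/key names are invented):

    %% Read the checkpoint threshold from the .ini config instead of
    %% relying on the built-in ~10MB default.
    checkpoint_threshold() ->
        list_to_integer(
            couch_config:get("replicator", "checkpoint_size", "10000000")).

and then call couch_util:should_flush(checkpoint_threshold()) in  
place of the bare should_flush().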

> Secondly, if the network connection fails in the middle of replication
> (closing an ssh tunnel is a good way to test this ;-)) then it seems
> to retry a few (10) times before the replicator process terminates. If
> the network connection becomes available again (restart the ssh
> tunnel) the replicator doesn't seem to notice. Also, I just noticed
> that Futon still lists the replication on its status page.

That's correct, the replicator does try to ignore transient failures.

> If I'm correct, and I really hope I'm missing something, then
> couchdb's replication is probably not currently suitable for
> replicating anything but very small database differences over an
> unstable connection. Does anyone have any real experience in this sort
> of scenario?

Personally, I do not.  I think the conclusion is a bit pessimistic,  
though.  Adding a configurable checkpoint threshold should make it  
possible to (slowly) replicate very large DB differences.  Ben's  
original point about the inability to replicate very large documents  
still stands, though.  I've opened a ticket to remind us about adding  
that feature in the future.  Cheers,

Adam



Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Matt Goodall <ma...@gmail.com>.
2009/5/14 Adam Kocoloski <ko...@apache.org>:
> Hi Ben, welcome!  At the moment, CouchDB does not have any capacity for
> intra-document replication checkpointing.  And you're right, in the specific
> situation you describe Couch would have a difficult time making any
> replication progress.
>
> Given that replication over slow, unreliable links is absolutely a CouchDB
> design goal, I think we might eventually conjure up some more magic to make
> some sort of intra-document (or at least intra-attachment) checkpointing
> possible.  I think it will be post-1.0, though.  Best,
>
> Adam
>
> On May 14, 2009, at 7:12 AM, Ben Cohen wrote:
>
>> Hi all --
>>
>> This is my first message to the list.  I've been watching it for a little
>> while now and so far everything I read about the design of couchdb I like a
>> lot!  Thanks so much for all the cool work!
>>
>> One of the uses I'm planning for couchdb involves replicating a database
>> across a slow, unreliable link which will never become anything other than
>> slow and unreliable.  I understand the replication is incremental and
>> designed to 'pick up where it left off' in the case of replication
>> interruption.  From the technical overview on the website:
>>
>>> The replication process is incremental. At the database level,
>>> replication only examines documents updated since the last replication. Then
>>> for each updated document, only fields and blobs that have changed are
>>> replicated across the network. If replication fails at any step, due to
>>> network problems or crash for example, the next replication restarts at the
>>> same document where it left off.

Is this actually accurate? It suggests that documents are replicated
one-by-one and that replication can be interrupted at any point and
will continue from wherever it got to before the interruption.

Firstly, I believe the whole replication has to complete before any
updates are visible in the target database. If I restart the server in
charge of replication and then restart the replication it always seems
to start from the beginning. i.e. the Futon's "Processed source update
#xxx" status starts from 0 (when replicating an empty database).

Secondly, if the network connection fails in the middle of replication
(closing an ssh tunnel is a good way to test this ;-)) then it seems
to retry a few (10) times before the replicator process terminates. If
the network connection becomes available again (restart the ssh
tunnel) the replicator doesn't seem to notice. Also, I just noticed
that Futon still lists the replication on its status page.

If I'm correct, and I really hope I'm missing something, then
couchdb's replication is probably not currently suitable for
replicating anything but very small database differences over an
unstable connection. Does anyone have any real experience in this sort
of scenario?

- Matt

>>>
>> I've got a question about this process.  Say you have a document to be
>> replicated with a 1 megabyte attachment.  A replication process starts, half
>> the doc is transferred successfully and then the connection dies.  Assuming
>> no changes to the source doc, when the replication restarts will the
>> transfer start from the beginning of the document or will it pick up
>> somewhere within the doc?
>>
>> For my use case I have a slow link that will periodically come online for
>> a certain fixed amount of time and initiate a replication.  If the
>> replication isn't incremental 'within' a single document, then a document in
>> the database above a certain size will for me, never make it across and I
>> would imagine cause the replication to never make forward progress ...
>>
>> Does couchdb's replication magic avoid the issue for me and eventually
>> transfer the document across my link?
>>
>> Thanks much,
>> Ben Cohen
>
>

Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Ben, welcome!  At the moment, CouchDB does not have any capacity  
for intra-document replication checkpointing.  And you're right, in  
the specific situation you describe Couch would have a difficult time  
making any replication progress.

Given that replication over slow, unreliable links is absolutely a  
CouchDB design goal, I think we might eventually conjure up some more  
magic to make some sort of intra-document (or at least intra- 
attachment) checkpointing possible.  I think it will be post-1.0,  
though.  Best,

Adam

On May 14, 2009, at 7:12 AM, Ben Cohen wrote:

> Hi all --
>
> This is my first message to the list.  I've been watching it for a  
> little while now and so far everything I read about the design of  
> couchdb I like a lot!  Thanks so much for all the cool work!
>
> One of the uses I'm planning for couchdb involves replicating a  
> database across a slow, unreliable link which will never become  
> anything other than slow and unreliable.  I understand the  
> replication is incremental and designed to 'pick up where it left  
> off' in the case of replication interruption.  From the technical  
> overview on the website:
>
>> The replication process is incremental. At the database level,  
>> replication only examines documents updated since the last  
>> replication. Then for each updated document, only fields and blobs  
>> that have changed are replicated across the network. If replication  
>> fails at any step, due to network problems or crash for example,  
>> the next replication restarts at the same document where it left off.
>>
> I've got a question about this process.  Say you have a document to  
> be replicated with a 1 megabyte attachment.  A replication process  
> starts, half the doc is transferred successfully and then the  
> connection dies.  Assuming no changes to the source doc, when the  
> replication restarts will the transfer start from the beginning of  
> the document or will it pick up somewhere within the doc?
>
> For my use case I have a slow link that will periodically come  
> online for a certain fixed amount of time and initiate a  
> replication.  If the replication isn't incremental 'within' a single  
> document, then a document in the database above a certain size will  
> for me, never make it across and I would imagine cause the  
> replication to never make forward progress ...
>
> Does couchdb's replication magic avoid the issue for me and  
> eventually transfer the document across my link?
>
> Thanks much,
> Ben Cohen


Re: Incremental replication over unreliable link -- how granular is replication restart

Posted by Elf <el...@gmail.com>.
Use rsync, Luke :)
It works over unreliable links and can resume aborted and partial  
transfers.

2009/5/14 Ben Cohen <co...@gmail.com>:
> Hi all --
>
> This is my first message to the list.  I've been watching it for a little
> while now and so far everything I read about the design of couchdb I like a
> lot!  Thanks so much for all the cool work!
>
> One of the uses I'm planning for couchdb involves replicating a database
> across a slow, unreliable link which will never become anything other than
> slow and unreliable.  I understand the replication is incremental and
> designed to 'pick up where it left off' in the case of replication
> interruption.  From the technical overview on the website:
>
>> The replication process is incremental. At the database level, replication
>> only examines documents updated since the last replication. Then for each
>> updated document, only fields and blobs that have changed are replicated
>> across the network. If replication fails at any step, due to network
>> problems or crash for example, the next replication restarts at the same
>> document where it left off.
>>
> I've got a question about this process.  Say you have a document to be
> replicated with a 1 megabyte attachment.  A replication process starts, half
> the doc is transferred successfully and then the connection dies.  Assuming
> no changes to the source doc, when the replication restarts will the
> transfer start from the beginning of the document or will it pick up
> somewhere within the doc?
>
> For my use case I have a slow link that will periodically come online for a
> certain fixed amount of time and initiate a replication.  If the replication
> isn't incremental 'within' a single document, then a document in the
> database above a certain size will for me, never make it across and I would
> imagine cause the replication to never make forward progress ...
>
> Does couchdb's replication magic avoid the issue for me and eventually
> transfer the document across my link?
>
> Thanks much,
> Ben Cohen



-- 
----------------
Best regards
Elf
mailto:elf2001@gmail.com