Posted to user@couchdb.apache.org by Chris Anderson <jc...@grabb.it> on 2008/07/16 19:51:54 UTC

debugging replication

I've started digging into replication issues a little deeper. This is
what I've found so far, replicating a 2GB database into a fresh
(empty) target database.

I'm triggering the replication via curl, so there are no browser/proxy
timeout issues at play, which I've found can muddy the waters a bit.
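
For readers following along: the exact curl command is not included in this message. A replication of this kind is normally started by POSTing a JSON body naming the source and target to the server's `_replicate` resource. The Python sketch below builds such a request. The server URL and database names are placeholders, not the values used in this thread.

```python
import json
import urllib.request


def build_replicate_request(server_url, source, target):
    """Build (but do not send) a POST to the server's _replicate resource."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder server and database names (assumptions, not from the thread).
request = build_replicate_request("http://localhost:5984", "source_db", "target_db")
```

Passing `request` to `urllib.request.urlopen` would send it. With no timeout argument the call waits for the server's response, so no client-side timeout is added on top of the replication itself.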

The replication starts fine, and as I query the target database, I can
see the doc count going up over time. Occasionally the doc count stays
the same for a few minutes. After that it sometimes starts back up,
or, just as often, I get a nasty crash with output that looks
like: http://friendpaste.com/g7zRzvPc
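
The doc-count observation above can be made repeatable with a small script. This is a rough sketch, assuming the target's database info document (a GET on the database URL) exposes a `doc_count` field. The helper names and any URL passed in are hypothetical.

```python
import json
import urllib.request


def fetch_doc_count(db_url):
    """Return the doc_count field from the database info document at db_url."""
    with urllib.request.urlopen(db_url) as resp:
        return json.load(resp)["doc_count"]


def longest_stall(samples):
    """Given (timestamp_seconds, doc_count) samples in time order, return the
    longest span in seconds during which doc_count did not change."""
    longest = 0.0
    stall_start = None
    previous_count = None
    for timestamp, count in samples:
        if count != previous_count:
            stall_start = timestamp  # count moved; a new plateau starts here
            previous_count = count
        else:
            longest = max(longest, timestamp - stall_start)
    return longest
```

Collecting a `(time.time(), fetch_doc_count(url))` pair every few seconds and feeding the list to `longest_stall` gives a number to compare between runs, instead of eyeballing the count.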

If I restart replication by rerunning the curl command, it seems to
pick up where it left off just fine, with the doc count moving up
smoothly for a while before I get another error. Just now, the one I
got wasn't a crasher, just a replication-stopping failure:
http://friendpaste.com/BBfSPEZm
However, I've seen this error as the first in a fresh replication,
and the crasher coming after a restarted replication, so I don't
think the order is significant.

Next I'll see if I can trigger the problem on a smaller dataset.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: debugging replication

Posted by Chris Anderson <jc...@grabb.it>.
The fix totally works. No doc-count hiccups, proceeding smoothly
through the 2GB...

On Wed, Jul 16, 2008 at 3:08 PM, Damien Katz <da...@apache.org> wrote:
> Looks like the problem is that the replication write queue was growing
> wildly because the writes couldn't keep up with the reads. I've checked
> in a fix that should prevent this problem, at the possible expense of
> replication throughput.
>
> -Damien

-- 
Chris Anderson
http://jchris.mfdz.com

Re: debugging replication

Posted by Damien Katz <da...@apache.org>.
Looks like the problem is that the replication write queue was growing
wildly because the writes couldn't keep up with the reads. I've checked
in a fix that should prevent this problem, at the possible expense of
replication throughput.

-Damien
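
The thread does not say how the fix was implemented, and CouchDB itself is written in Erlang, so the following is only an illustration of the general idea in Python. One common way to stop a write queue from growing without limit is to bound it, so that the fast reader blocks whenever the slower writer falls behind. That is where the throughput cost comes from. All names and parameters here are hypothetical.

```python
import queue
import threading
import time


def replicate(docs, max_pending=10, write_delay=0.001):
    """Copy docs through a bounded queue; return (written, peak_queue_size)."""
    pending = queue.Queue(maxsize=max_pending)  # put() blocks when full
    written = []
    peak = 0
    done = object()  # sentinel marking the end of the stream

    def writer():
        while True:
            doc = pending.get()
            if doc is done:
                break
            time.sleep(write_delay)  # simulate a slow write to the target
            written.append(doc)

    thread = threading.Thread(target=writer)
    thread.start()
    for doc in docs:  # the fast "reader" side
        pending.put(doc)  # waits here whenever the writer falls behind
        peak = max(peak, pending.qsize())
    pending.put(done)
    thread.join()
    return written, peak
```

With an unbounded queue (`maxsize=0`) the reader would never wait and the queue could grow to the size of the whole database. With a bound, memory use stays flat and the reader is throttled to the writer's pace.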
