Posted to user@couchdb.apache.org by Daniel Gonzalez <go...@gonvaled.com> on 2012/04/27 12:03:28 UTC

Replications stopping unexpectedly

Hello,

I will describe my problem in general terms. If more details are needed, I
will try to gather them from my production environments.
We have several CouchDB instances, each with a number of databases. Some of
these databases are connected via replication.
Some of the replications run through an SSH tunnel, others over a direct
internet connection. The latency between CouchDB instances ranges from
a few milliseconds up to several hundred milliseconds.

My problem is that the replications stop very often. This could be
due to lost connectivity (sometimes the SSH tunnels fail and must be
recreated), but that is not the only reason.

And worse: the replications are not restarted automatically. They stay in
the error state. The problem is so frequent that I run a replication monitor
process every 5 minutes that looks for failed replications and deletes and
recreates their replication documents. This is the only method I have found
to reliably restart the replications.
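
For readers who want to picture such a monitor, here is a minimal sketch of
the delete-and-recreate approach described above. It is not the poster's
actual script. It assumes the replication documents live in the `_replicator`
database and that a failed one carries `"_replication_state": "error"`. The
function names and the base URL are hypothetical, and authentication is left
out.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a failed replication document is marked by CouchDB with
# "_replication_state": "error" in the _replicator database.
FAILED_STATE = "error"


def find_failed(docs):
    """Return the replication documents whose state is 'error'."""
    return [d for d in docs if d.get("_replication_state") == FAILED_STATE]


def clean_copy(doc):
    """Copy a document without _rev and the server-set _replication_* fields."""
    return {
        key: value
        for key, value in doc.items()
        if key != "_rev" and not key.startswith("_replication_")
    }


def request(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def restart_failed(base_url):
    """Delete and recreate every replication document that is in error."""
    listing = request(
        "GET", base_url + "/_replicator/_all_docs?include_docs=true")
    failed = find_failed([row["doc"] for row in listing["rows"]])
    for doc in failed:
        doc_url = (base_url + "/_replicator/"
                   + urllib.parse.quote(doc["_id"], safe=""))
        rev = urllib.parse.quote(doc["_rev"], safe="")
        request("DELETE", doc_url + "?rev=" + rev)
        request("PUT", doc_url, clean_copy(doc))
    return [doc["_id"] for doc in failed]


# Hypothetical usage, e.g. from cron every 5 minutes:
#   restart_failed("http://localhost:5984")
```

Keeping the selection and copy steps as pure functions makes them easy to
check without a running server; only `restart_failed` touches the network.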

Is anybody else experiencing similar problems? Do you have any suggestions
on how to make replications more robust against connectivity issues?
Are there other ways to restart failed replications, apart from
redefining them?

Thanks,
Daniel Gonzalez

Re: Replications stopping unexpectedly

Posted by Robert Newson <rn...@apache.org>.

Have you seen this?

http://wiki.apache.org/couchdb/Replication#Replicator_database
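
As an illustration of what that page covers: with the replicator database, a
replication is defined by a document stored in `_replicator`. The sketch
below only builds such a document. The document ID and the source and target
URLs are made up for the example, and `replication_doc` is a hypothetical
helper, not part of CouchDB.

```python
import json


def replication_doc(doc_id, source, target, continuous=True):
    """Build the body of a replication document for the _replicator database."""
    return {
        "_id": doc_id,
        "source": source,
        "target": target,
        "continuous": continuous,
    }


# Hypothetical names and URLs, for illustration only.
doc = replication_doc(
    "orders_to_backup",
    "http://localhost:5984/orders",
    "http://backup.example.com:5984/orders",
)
print(json.dumps(doc, indent=2))
```

Saving this body into the `_replicator` database (for example with a PUT to
`/_replicator/orders_to_backup`) is what starts the replication.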

B.

Re: Replications stopping unexpectedly

Posted by Johan Wester <jo...@gmail.com>.

Hi Daniel,

I am experiencing the same problem, using plain HTTP rather than SSH tunnels.
For now, I have also written a replication monitoring process that checks the
replication and re-creates it if needed.
But of course that shouldn't be necessary.

Johan Wester
