Posted to user@couchdb.apache.org by Steve Koppelman <st...@gmail.com> on 2012/10/04 16:39:38 UTC

Can I trust _replicator state in 1.2.0

Running CouchDB 0.10, 1.0.1 and 1.1.1 over the last couple of years, I
got very familiar with the shortcomings of continuous replication.
Replication would simply stop without warning at some point, but the
replication document's status would remain "triggered". Eventually, I
just set up a cron job that periodically wiped out the entire
_replicator database and repopulated a new one. It was horrible, but
was less of a hassle than writing code to iterate through all the
replication relationships and compare the nodes' last-updated-document
timestamps for applications that could deal with a few minutes'
replication lag.
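
The cron job boiled down to roughly the following (a Python sketch
using the requests library; the node address and database pairs are
made up, and it assumes an admin-party setup, i.e. no auth):

    import requests

    COUCH = "http://localhost:5984"  # made-up address of the local node

    # Made-up (source, target) replication pairs.
    PAIRS = [
        ("dbA", "http://nodeB:5984/dbA"),
        ("dbB", "http://nodeB:5984/dbB"),
    ]

    # Wipe the entire _replicator database and recreate it...
    requests.delete(COUCH + "/_replicator")
    requests.put(COUCH + "/_replicator")

    # ...then repopulate it with continuous replication documents.
    for i, (source, target) in enumerate(PAIRS):
        doc = {"source": source, "target": target, "continuous": True}
        requests.put("%s/_replicator/repl-%d" % (COUCH, i), json=doc)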

As I build out a new cluster, I'm curious to know whether it's safe to
trust a replication job's reported state under 1.2.0, or if there's
another recommended way to go. This isn't a banking or airline booking
system, so some replication lag is fine as long as there's eventual
consistency.
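
Put differently, the check I'd rather not keep running by hand looks
something like this (Python sketch; the doc id and addresses are
placeholders, and the _active_tasks fields are my reading of the 1.2
release notes rather than anything I've verified):

    import requests

    COUCH = "http://localhost:5984"  # placeholder

    # The state the replication manager last wrote into the doc.
    doc = requests.get(COUCH + "/_replicator/repl-0").json()
    print("claimed state:", doc.get("_replication_state"))

    # Cross-check against _active_tasks: a doc stuck at "triggered" with
    # no matching task, or a task whose checkpointed_source_seq never
    # advances, would mean the reported state can't be trusted.
    for task in requests.get(COUCH + "/_active_tasks").json():
        if task.get("type") == "replication":
            print(task.get("replication_id"),
                  task.get("checkpointed_source_seq"),
                  task.get("source_seq"))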

Rgds., etc.

-sk

Re: Can I trust _replicator state in 1.2.0

Posted by Bob Dionne <di...@dionne-associates.com>.
I think you can, though you'll need to evaluate this. 

We've been working with the latest[1] from 1.2 with reasonable results. There
are still issues with occasional hangs on long-running, filtered continuous
replications, and I think also with how attachments are streamed. Performance
could and will get better. The replicator touches a lot of the API, so I think
it's a place where lots of problems surface. For example, using BIFs from the
binary module[2] helps ibrowse and couch_http:find_in_binary run a bit faster.
When time permits these changes will find their way into CouchDB.
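
To be concrete, the hangs we see are on jobs of this shape, i.e.
filtered continuous replications (Python sketch; the hosts, databases,
and filter name are all invented):

    import requests

    # A filtered continuous replication document. "app/important" names
    # a filter function in a design doc on the source; every name here
    # is invented.
    doc = {
        "source": "http://nodeA:5984/dbA",
        "target": "http://nodeB:5984/dbA",
        "continuous": True,
        "filter": "app/important",
    }
    requests.put("http://localhost:5984/_replicator/filtered-job", json=doc)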

So I'd say give it a shot.

YMMV,

Bob

[1] https://github.com/cloudant/couch_replicator
[2] https://github.com/cloudant/ibrowse/commit/cc1f8e84a669

On Oct 4, 2012, at 10:39 AM, Steve Koppelman <st...@gmail.com> wrote:

> [...]

Re: Can I trust _replicator state in 1.2.0

Posted by Benoit Chesneau <bc...@gmail.com>.
On Fri, Oct 5, 2012 at 2:57 AM, Dave Cottlehuber <dc...@jsonified.com> wrote:

>
> Are you able to open a jira ticket with what's not been working, and
> ideally some logs or errors or other information?
>
> I'm not a huge replication user myself, but I have heard reports of
> leakage in our usage of mochiweb with continuous replication, and
> possibly with SSL. But my recollection might be wrong.
>
> Benoit, Paul - ring any bells?

I would be interested in more detailed feedback.
>
> A+
> Dave

There are some fixes for status updates and other things like HTTP
streaming in rcouch. couch_replicator needs some love. Making the code
simpler would help, I think.

- benoît

Re: Can I trust _replicator state in 1.2.0

Posted by Dave Cottlehuber <dc...@jsonified.com>.
On 4 October 2012 21:56, stephen bartell <sn...@gmail.com> wrote:

> [...]

Are you able to open a jira ticket with what's not been working, and
ideally some logs or errors or other information?

I'm not a huge replication user myself, but I have heard reports of
leakage in our usage of mochiweb with continuous replication, and
possibly with SSL. But my recollection might be wrong.

Benoit, Paul - ring any bells?

A+
Dave

Re: Can I trust _replicator state in 1.2.0

Posted by stephen bartell <sn...@gmail.com>.
+1 I'm curious about this one too.

Currently we have implemented our own replicator. It'd be nice to move to couch's own replicator.
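
Ours is essentially a changes-feed pump, roughly the following sketch
(hosts and database names are placeholders; conflict and error
handling omitted):

    import requests

    SOURCE = "http://nodeA:5984/dbA"  # placeholder
    TARGET = "http://nodeB:5984/dbA"  # placeholder

    def pump(since=0):
        """One pass of a naive external replicator: read the source's
        _changes feed and copy the changed docs to the target."""
        feed = requests.get(
            SOURCE + "/_changes",
            params={"since": since, "include_docs": "true"},
        ).json()
        docs = [row["doc"] for row in feed["results"] if "doc" in row]
        if docs:
            # new_edits=false preserves revision ids, like the real
            # replicator does.
            requests.post(TARGET + "/_bulk_docs",
                          json={"docs": docs, "new_edits": False})
        return feed["last_seq"]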

Stephen Bartell

"The significant problems we face cannot be solved at the same level of thinking we were at when we created them." -Einstein

On Oct 4, 2012, at 7:39 AM, Steve Koppelman wrote:

> [...]