Posted to user@couchdb.apache.org by Foucauld Degeorges <fo...@gmail.com> on 2015/07/29 16:40:01 UTC

Using _active_tasks to track continuous replication status

Hi all

I'm trying to figure out a way to know whether a document has been
effectively replicated (in continuous mode) to a remote CouchDB, with a
minimum of network requests.
My first idea would be to compare the date of the document creation /
modification against the date of the last successful replication. How do I
get the latter?

The _active_tasks API gives me an updated_on value which seems to just be the
date the _active_tasks document was generated, but the rest of the doc
might indicate other things, like "that last replication didn't go so
well". There are also sequence numbers which look like they tell me the
last sequence that was successfully replicated. I find the documentation
unclear about _active_tasks.

Has anyone done this before?

Cheers
Foucauld Degeorges

Re: Using _active_tasks to track continuous replication status

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Foucauld, unfortunately it’s not a safe assumption. The replicator will by default parallelize the requests between a given source and target, and in the course of that parallelization it may update documents on a target in a different order than they originally appeared on the source.

The problem you’re describing is one I’ve given some thought to in the context of internal replications in a CouchDB 2.0 cluster, where it’s important to be able to compute what fraction of documents in a database have been durably stored on a minimum number of replicas. One thing we know is important to track is the minimum sequence in a target database that wholly contains a peer sequence in the source database; e.g. what’s the sequence in B at the moment that a given checkpointed_source_seq is recorded for an A -> B replication? My recollection is that the replicator does _not_ track that bit of information today - but it could.

Regards, Adam

> On Jul 30, 2015, at 4:19 AM, Foucauld Degeorges <fo...@gmail.com> wrote:
> 
> Awesome, thanks.
> Now, to push it a bit further, let's suppose I have three databases in
> replication: A -> B -> C, and I need to know which document created in A
> is the last one that got to C, so I can display a "synchronized / not
> synchronized" symbol next to the various documents in my view.
> 
> To know which document got to C, here is my idea: using B's _active_tasks I
> know which update sequence of B got to C. I query B's _changes feed to know
> what document/revision that update sequence concerns. I query that same
> document/revision on A using the local_seq option. Now I know which update
> sequence of A is the last one that got to C, assuming that *sequences are
> kept in the same order* through two replications.
> 
> Is that a safe assumption? Or can anyone think of a simpler way to do this,
> without having to HEAD each document individually?
> 
> Foucauld

Re: Using _active_tasks to track continuous replication status

Posted by Foucauld Degeorges <fo...@gmail.com>.
Awesome, thanks.
Now, to push it a bit further, let's suppose I have three databases in
replication: A -> B -> C, and I need to know which document created in A
is the last one that got to C, so I can display a "synchronized / not
synchronized" symbol next to the various documents in my view.

To know which document got to C, here is my idea: using B's _active_tasks I
know which update sequence of B got to C. I query B's _changes feed to know
what document/revision that update sequence concerns. I query that same
document/revision on A using the local_seq option. Now I know which update
sequence of A is the last one that got to C, assuming that *sequences are
kept in the same order* through two replications.

Is that a safe assumption? Or can anyone think of a simpler way to do this,
without having to HEAD each document individually?

Foucauld


2015-07-29 17:18 GMT+02:00 Alexander Shorin <kx...@gmail.com>:

> Yes. This is the update sequence of the source that is recorded in the
> checkpoint, which means that all the data up to that value has already
> been replicated.
>
> --
> ,,,^..^,,,
>

Re: Using _active_tasks to track continuous replication status

Posted by Alexander Shorin <kx...@gmail.com>.
On Wed, Jul 29, 2015 at 6:13 PM, Foucauld Degeorges <fo...@gmail.com> wrote:
> Just to clarify, by "completed sequence" you mean checkpointed_source_seq,
> is that right?

Yes. This is the update sequence of the source that is recorded in the
checkpoint, which means that all the data up to that value has already
been replicated.

--
,,,^..^,,,
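
For illustration, the check described above (compare a document's local_seq
with checkpointed_source_seq) could be sketched in Python as follows. This is
only a minimal sketch under stated assumptions, not an official client: the
function names and the sample task list are made up, update sequences are
assumed to be integers as in CouchDB 1.x, and the source and target strings
are assumed to match exactly what _active_tasks reports. The task list would
come from GET /_active_tasks, and the document's sequence from the _local_seq
field returned by GET /db/docid?local_seq=true.

```python
def checkpointed_seq(active_tasks, source, target):
    """Return checkpointed_source_seq for the source -> target replication.

    active_tasks is the decoded JSON list from GET /_active_tasks.
    Returns None when no matching replication task is running.
    """
    for task in active_tasks:
        if (task.get("type") == "replication"
                and task.get("source") == source
                and task.get("target") == target):
            return task.get("checkpointed_source_seq")
    return None


def is_replicated(doc_local_seq, active_tasks, source, target):
    """True if the checkpoint has reached the document's local_seq.

    Assumption: sequences are plain integers (CouchDB 1.x). Opaque
    string sequences cannot be compared this way.
    """
    seq = checkpointed_seq(active_tasks, source, target)
    return isinstance(seq, int) and seq >= doc_local_seq


# Made-up sample data shaped like an _active_tasks response.
SAMPLE_TASKS = [
    {
        "type": "replication",
        "source": "a",
        "target": "http://remote.example:5984/b/",
        "continuous": True,
        "checkpointed_source_seq": 42,
        "source_seq": 45,
    },
]
```

With the sample above, a document whose _local_seq is 40 counts as replicated,
while one at 43 does not yet. If no matching task is listed, the function
reports False rather than guessing.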

Re: Using _active_tasks to track continuous replication status

Posted by Foucauld Degeorges <fo...@gmail.com>.
Thank you, option 2 is perfect!
Just to clarify, by "completed sequence" you mean checkpointed_source_seq,
is that right?

Foucauld

2015-07-29 16:46 GMT+02:00 Alexander Shorin <kx...@gmail.com>:

> Hi,
>
> Two ways:
>
> 1) Get the document's revision on the source side. Repeatedly send a HEAD
> request for that document with the same revision to the target until you
> get a 200 OK response.
> 2) Get the document's local_seq on the source side. Watch the replication
> status in _active_tasks until the completed sequence is equal to or greater
> than the document's.
>
> --
> ,,,^..^,,,
>

Re: Using _active_tasks to track continuous replication status

Posted by Alexander Shorin <kx...@gmail.com>.
Hi,

Two ways:

1) Get the document's revision on the source side. Repeatedly send a HEAD
request for that document with the same revision to the target until you
get a 200 OK response.
2) Get the document's local_seq on the source side. Watch the replication
status in _active_tasks until the completed sequence is equal to or greater
than the document's.

--
,,,^..^,,,
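
As a rough illustration of option 1, the polling loop might look like the
Python sketch below. It is a sketch under assumptions rather than a reference
implementation: the function names, the attempts/delay parameters and the
example URL are hypothetical, and the HEAD call is passed in as a parameter
so the loop can be exercised without a running CouchDB. It relies on the
target answering 200 to HEAD /db/docid?rev=... once that revision exists
there.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def head_status(url):
    """Send a HEAD request and return the HTTP status code."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (e.g. 404 while the revision is missing).
        return err.code


def wait_for_revision(target_db_url, doc_id, rev,
                      head=head_status, attempts=10, delay=1.0):
    """Poll the target until it has doc_id at revision rev.

    Returns True as soon as a HEAD request answers 200, or False
    after `attempts` unsuccessful tries.
    """
    url = "{}/{}?{}".format(
        target_db_url.rstrip("/"),
        urllib.parse.quote(doc_id, safe=""),
        urllib.parse.urlencode({"rev": rev}),
    )
    for attempt in range(attempts):
        if head(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    return False
```

A call such as wait_for_revision("http://target.example:5984/db", "doc1",
"2-abc") costs one request per attempt and per document, which is why option
2 is cheaper when many documents have to be tracked.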

