Posted to dev@couchdb.apache.org by Nick Vatamaniuc <va...@apache.org> on 2019/10/31 19:45:50 UTC

[VOTE] Couch replicator _scheduler/jobs and _scheduler/docs API change

Hi everyone,

For CouchDB 3.0 I am proposing to add all the replication status
details from _active_tasks, like "docs_written", "docs_read", etc., to
the _scheduler/jobs and _scheduler/docs endpoints.

The Couch replicator scheduler has been out for a few years already. It
added flexibility, performance and other features. One of the goals of
the scheduler was to provide a better set of APIs for monitoring
replication job progress. Those APIs are _scheduler/jobs and
_scheduler/docs, which complement the already existing _active_tasks
endpoint and plain polling of _replicator db documents.

_scheduler/jobs is focused on replication jobs: their history, how
many times they started and stopped, and their state. It returns
replication jobs indexed by their job ids, and includes jobs started
from _replicator dbs as well as from the _replicate HTTP endpoint.

_scheduler/docs, in turn, is focused on showing the state of
replication documents. It returns replication jobs indexed by the
_replicator db and doc id which started the jobs.
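
To illustrate the two indexing schemes described above, here is a
minimal Python sketch. It is not part of the PR. The base URL, the
helper names and the sample payloads are hypothetical, and the sketch
assumes the responses carry their entries under "jobs" and "docs" keys
with "id", "database" and "doc_id" fields.

```python
import json
from urllib.request import urlopen

# Assumption: a local CouchDB node; authentication is omitted for brevity.
BASE_URL = "http://localhost:5984"


def fetch(endpoint, base_url=BASE_URL):
    """GET an endpoint such as "_scheduler/jobs" and decode the JSON body."""
    with urlopen(f"{base_url}/{endpoint}") as resp:
        return json.load(resp)


def index_jobs(payload):
    """Map job id -> entry, for a _scheduler/jobs style response."""
    return {job["id"]: job for job in payload.get("jobs", [])}


def index_docs(payload):
    """Map (_replicator db, doc id) -> entry, for a _scheduler/docs style response."""
    return {(d["database"], d["doc_id"]): d for d in payload.get("docs", [])}


# Hypothetical, heavily trimmed payloads used instead of a live server.
sample_jobs = {"jobs": [
    {"id": "job-a", "database": "_replicator", "doc_id": "rep1"},
    # Assumption: a job started via _replicate has no backing document.
    {"id": "job-b", "database": None, "doc_id": None},
]}
sample_docs = {"docs": [
    {"database": "_replicator", "doc_id": "rep1", "id": "job-a", "state": "running"},
]}

jobs_by_id = index_jobs(sample_jobs)
docs_by_key = index_docs(sample_docs)
print(sorted(jobs_by_id))
print(docs_by_key[("_replicator", "rep1")]["state"])
```

Against a real node, `index_jobs(fetch("_scheduler/jobs"))` would be the
equivalent call; the samples are only there so the sketch runs offline.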

_scheduler/docs API was returning an "info" field for each job. That
info field was designed to contain details pertaining to the current
state of the job. So if there is an error it will have the error
message. For a completed job, it was showing the replication stats
persisted to the replication document, so that users do not have to
poll their status.

However, when the jobs were running or were in a pending state that
field was "null". The idea was to eventually replace "null" with stats
from _active_tasks such that users don't need to ever look at
_active_tasks to monitor their replication progress. So the proposal
here is to do exactly that.

To be specific:

1) In the "pending" or "running" state _scheduler/docs will return
active_tasks status values in the "info" field. ("completed" is
already doing that btw).

2) _scheduler/jobs will have a new "info" field which also shows the
same status values.
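
As a rough sketch of what this means for a client (again, not taken
from the PR), the Python below summarizes one _scheduler/docs entry
from its "state" and "info" fields. The function name and the sample
entries are hypothetical; "docs_read" and "docs_written" are the stat
names mentioned above, and the shape of the error value (an object with
an "error" key, or a plain string) is an assumption handled
defensively.

```python
def describe(entry):
    """Return a one-line summary of a _scheduler/docs style entry."""
    state = entry.get("state")
    info = entry.get("info")
    if info is None:
        # The old behaviour for "pending" and "running" jobs.
        return f"{state}: no details yet"
    if isinstance(info, dict) and "error" in info:
        # Assumption: errors arrive as an object with an "error" key.
        return f"{state}: error: {info['error']}"
    if isinstance(info, dict):
        read = info.get("docs_read", 0)
        written = info.get("docs_written", 0)
        return f"{state}: docs_read={read} docs_written={written}"
    # Fallback for any other shape, such as a plain error string.
    return f"{state}: {info}"


# Hypothetical entries, trimmed to the fields the function reads.
running = {"state": "running", "info": {"docs_read": 10, "docs_written": 8}}
pending_old = {"state": "pending", "info": None}
failed = {"state": "failed", "info": {"error": "unauthorized"}}

for entry in (running, pending_old, failed):
    print(describe(entry))
```

With the proposed change, a client like this would no longer need a
second request to _active_tasks to show progress for running jobs.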

There is a draft PR for it:

https://github.com/apache/couchdb/pull/2292#issuecomment-548526613

Some examples of what it looks like:

https://github.com/apache/couchdb/pull/2292#issuecomment-548526346
https://github.com/apache/couchdb/pull/2292#issuecomment-548526613

What does everyone think?

Let's do a lazy consensus since we are replacing a placeholder null
value and adding a new field to _scheduler/jobs. Let me know if this
is not appropriate and we need a different voting / discussion
procedure.

Cheers,
-Nick

Re: [VOTE] Couch replicator _scheduler/jobs and _scheduler/docs API change

Posted by Jan Lehnardt <ja...@apache.org>.
Solid, +1!

Best
Jan
—


-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/


Re: [VOTE] Couch replicator _scheduler/jobs and _scheduler/docs API change

Posted by Nick Vatamaniuc <va...@gmail.com>.
The vote passes after 72+ hours with

2 : +1s,
0 : -1s.

Thank you!
-Nick

On Fri, Nov 1, 2019 at 2:04 PM Jay Doane <ja...@apache.org> wrote:
>
> Makes sense to me. +1

Re: [VOTE] Couch replicator _scheduler/jobs and _scheduler/docs API change

Posted by Jay Doane <ja...@apache.org>.
Makes sense to me. +1
