Posted to notifications@couchdb.apache.org by gi...@git.apache.org on 2017/04/04 23:07:09 UTC

[GitHub] nickva opened a new pull request #470: 63012 scheduler

URL: https://github.com/apache/couchdb/pull/470
 
 
   Introduce Scheduling CouchDB Replicator
   
   Jira: https://issues.apache.org/jira/browse/COUCHDB-3324
   
   The core of the new replicator is a scheduler. It allows running a large number of replication jobs by switching between them, periodically stopping some and starting others. Jobs which fail are backed off exponentially. There is also an improved inspection and querying API: `_scheduler/jobs` and `_scheduler/docs`.
   
   The replication protocol hasn't changed, so it is still possible to replicate between CouchDB 1.x, 2.x, PouchDB, and other implementations of the CouchDB replication protocol.
   
   ## Scheduler
   
   The scheduler allows running a large number of replication jobs; it has been tested with up to 100k replication jobs in a 3-node cluster. Replication jobs are run in a fair, round-robin fashion. Scheduler behavior can be tuned with these options in the `[replicator]` config section:
   
   * `max_jobs` : Number of actively running replications. Setting this too high
     could cause performance issues; setting it too low could mean replication jobs
     might not have enough time to make progress before getting unscheduled again.
     This parameter can be adjusted at runtime and takes effect during the next
     rescheduling cycle.
   
   * `interval` : Scheduling interval in milliseconds. During each rescheduling
     cycle the scheduler may start or stop up to `max_churn` jobs.
   
   * `max_churn` : Maximum number of replications to start and stop during
     rescheduling. This parameter, along with `interval`, defines the rate of job
     replacement. During startup, however, a much larger number of jobs could be
     started (up to `max_jobs`) in a short period of time.
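   As a sketch, a cluster expected to juggle many concurrent replications might set the section as below. The specific values are illustrative assumptions for this example, not recommended defaults:

   ```
   [replicator]
   ; Run at most 500 replication jobs at once (illustrative value).
   max_jobs = 500
   ; Reschedule every 60 seconds.
   interval = 60000
   ; Start or stop at most 20 jobs per rescheduling cycle.
   max_churn = 20
   ```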
   
   ## _scheduler/{jobs,docs} API
   
   There is an improved replication state querying API, with a focus on ease of use and performance. The new API avoids updating the replication document with transient state, which in production can lead to conflicts and performance issues. The two new endpoints are:
   
   * `_scheduler/jobs` : This endpoint shows active replication jobs, that is, jobs managed by the scheduler. Some of them might be running, some might be waiting to run, and some might be backed off (penalized) because they crashed too many times. Semantically this is somewhat equivalent to `_active_tasks`, but it focuses only on replications. Jobs which have completed, or which were never created because of a malformed replication document, will not be shown here, as they are not managed by the scheduler. Replications started from the `_replicate` endpoint, rather than from a document in a `_replicator` db, will also show up here.
   
   * `_scheduler/docs` : This endpoint is an improvement over having to go back and re-read replication documents to query their state. It represents the state of all the replications started from documents in `_replicator` dbs. Unlike `_scheduler/jobs`, it will also show jobs which have failed or completed (that is, which are no longer managed by the scheduler).
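   A minimal sketch of querying the two endpoints from Python. The endpoint paths come from this PR; the host, port, and lack of authentication are assumptions for the example — adjust for your cluster:

   ```python
   # Sketch: fetching scheduler state over HTTP. Endpoint names
   # (_scheduler/jobs, _scheduler/docs) are from the PR; the base URL
   # below is an assumption for illustration.
   import json
   from urllib.request import urlopen

   def scheduler_url(base, endpoint):
       """Build a _scheduler API URL; endpoint is 'jobs' or 'docs'."""
       return "%s/_scheduler/%s" % (base.rstrip("/"), endpoint)

   def fetch_scheduler(base, endpoint):
       """Return the decoded JSON body from the scheduler endpoint."""
       with urlopen(scheduler_url(base, endpoint)) as resp:
           return json.load(resp)

   # Example usage (requires a running CouchDB node):
   #   jobs = fetch_scheduler("http://127.0.0.1:5984", "jobs")
   ```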
   
   ## Compatibility Mode
   
   Understandably, some customers are using the document-based API to query replication states (`triggered`, `error`, `completed`, etc.). To ease the upgrade path, there is a compatibility configuration setting:
   
   ```
   [replicator]
   update_docs = false | true
   ```
   It defaults to `false`, but when set to `true` the replicator will continue updating replication documents with the state of their replication jobs.
   
   
   ## Other Improvements
   
 * Network resource usage and performance were improved by implementing a common connection pool. This should help in cases with a large number of connections to the same source or target. Previously, connection pools were shared only within a single replication job.
   
 * Improved rate limiting handling. Replicator requests will auto-discover the rate limit capacity on targets and sources based on a proven Additive Increase / Multiplicative Decrease (AIMD) feedback control algorithm.
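 The general AIMD idea can be sketched as follows. The class, constants, and method names here are illustrative assumptions, not the replicator's actual Erlang implementation or parameters:

 ```python
 # Illustrative AIMD rate limiter: probe capacity gently on success,
 # back off sharply when the endpoint signals overload (e.g. HTTP 429).
 class AIMDLimiter:
     def __init__(self, initial=10.0, increase=1.0, decrease=0.5,
                  minimum=1.0, maximum=1000.0):
         self.rate = initial          # allowed requests per interval
         self.increase = increase     # additive step on success
         self.decrease = decrease     # multiplicative factor on overload
         self.minimum = minimum
         self.maximum = maximum

     def on_success(self):
         # Additive increase: slowly probe for more capacity.
         self.rate = min(self.maximum, self.rate + self.increase)

     def on_rate_limited(self):
         # Multiplicative decrease: cut the rate quickly to relieve
         # pressure on the rate-limited source or target.
         self.rate = max(self.minimum, self.rate * self.decrease)
 ```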
   
 * Improved performance by avoiding repeatedly retrying failing replication jobs; instead, exponential backoff is used. In a large multi-user cluster, quite a few replication jobs are invalid, crashing, or failing (for various reasons, such as an inability to checkpoint to the source, mismatched credentials, or missing databases). Penalizing failing replications frees up system resources for more useful work.
   
 * Improved recovery from long but temporary network failures. Previously, if replication jobs failed to start 10 times in a row, they would not be retried anymore. This is sometimes desirable, but in some scenarios (for example, after a sustained DNS failure which eventually recovers), replications reach their retry limit and cease to work, requiring operator intervention to resume. The scheduling replicator will never give up retrying a valid scheduled replication job, so it recovers automatically.
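 A capped exponential backoff with jitter, as described above, can be sketched like this. The base delay, cap, and jitter fraction are illustrative assumptions, not the replicator's actual settings:

 ```python
 # Sketch: capped exponential backoff for penalized jobs. Doubling per
 # consecutive failure spaces out retries; the cap bounds the penalty so
 # jobs eventually retry even after long outages; jitter keeps many
 # penalized jobs from retrying in lockstep.
 import random

 def backoff_interval(consecutive_failures, base=5, cap=28800):
     """Seconds to wait before the next retry attempt."""
     delay = min(cap, base * (2 ** consecutive_failures))
     return delay + random.uniform(0, delay / 10.0)
 ```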
   
 * Better handling of filtered replications. Failing to fetch filters could block the couch replicator manager and lead to message queue backups and memory exhaustion. Also, when replication filter code changes on the source, the replication should be updated accordingly (the replication job ID should change in that case). This PR fixes both of those issues: filters and replication ID calculation are handled by separate workers (in the couch replicator doc processor), and when filter code changes on the source, replication jobs restart and re-calculate their IDs automatically.
   
   ## Related PR:
   
   Documentation PR:  WIP
   
   ## Tests
   
* EUnit test coverage. Some modules, such as multidb_changes, have close to 100% coverage; some have less. All previous replication tests have been updated to work with the new scheduling replicator.
   
* Except for one modification, existing JavaScript tests pass.
   
* Additional integration tests. To validate, test, and benchmark some edge cases, an additional toolkit was created: https://github.com/cloudant-labs/couchdyno/blob/master/README_tests.md It allows testing scenarios where nodes fail, creating a large number of replication jobs, manipulating cluster configurations, and setting up long-running (soak) tests.
   
   
 