Posted to commits@couchdb.apache.org by va...@apache.org on 2021/01/26 23:19:52 UTC

[couchdb-documentation] branch rfc-017-fair-share-scheduler-3.x created (now 01a9088)

This is an automated email from the ASF dual-hosted git repository.

vatamane pushed a change to branch rfc-017-fair-share-scheduler-3.x
in repository https://gitbox.apache.org/repos/asf/couchdb-documentation.git.


      at 01a9088  fixing links after branch rename (master -> main) (#616)

This branch includes the following new commits:

     new 01a9088  fixing links after branch rename (master -> main) (#616)

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

[couchdb-documentation] 01/01: fixing links after branch rename (master -> main) (#616)

Posted by va...@apache.org.

vatamane pushed a commit to branch rfc-017-fair-share-scheduler-3.x
in repository https://gitbox.apache.org/repos/asf/couchdb-documentation.git

commit 01a90888105b29bfbd09a718c715029794e035c6
Author: Ronny <ro...@kioskkinder.com>
AuthorDate: Mon Jan 25 11:33:49 2021 +0100

    fixing links after branch rename (master -> main) (#616)
---
 .github/PULL_REQUEST_TEMPLATE.md       |   2 +-
 CONTRIBUTING.md                        |   2 +-
 README.md                              |   2 +-
 rfcs/008-map-indexes.md                |   2 +-
 rfcs/011-opentracing.md                |  10 +-
 rfcs/013-node-types.md                 |   2 +-
 rfcs/016-fdb-replicator.md             |  12 +-
 rfcs/017-fair-share-scheduling.md      | 216 +++++++++++++++++++++++++++++++++
 src/best-practices/reverse-proxies.rst |   2 +-
 src/ddocs/views/collation.rst          |   2 +-
 src/query-server/javascript.rst        |   2 +-
 src/replication/protocol.rst           |   2 +-
 src/setup/cluster.rst                  |   2 +-
 13 files changed, 237 insertions(+), 21 deletions(-)

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 7a14b07..c538e44 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -36,5 +36,5 @@
 
 ## Checklist
 
-- [ ] Update [rebar.config.script](https://github.com/apache/couchdb/blob/master/rebar.config.script) with the commit hash once this PR is rebased and merged
+- [ ] Update [rebar.config.script](https://github.com/apache/couchdb/blob/main/rebar.config.script) with the commit hash once this PR is rebased and merged
 <!-- Before opening the PR, consider running `make check` locally for a faster turnaround time -->
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 55c41d6..db5aa08 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,4 +1,4 @@
 This repository follows the same contribution guidelines as the
 main Apache CouchDB contribution guidelines:
 
-https://github.com/apache/couchdb/blob/master/CONTRIBUTING.md
+https://github.com/apache/couchdb/blob/main/CONTRIBUTING.md
diff --git a/README.md b/README.md
index fd4efc4..0fd09e6 100644
--- a/README.md
+++ b/README.md
@@ -26,6 +26,6 @@ with issue reporting or contributing to the upkeep of this project.
 
 [1]: http://mail-archives.apache.org/mod_mbox/couchdb-user/
 [2]: http://mail-archives.apache.org/mod_mbox/couchdb-dev/
-[3]: https://github.com/apache/couchdb/blob/master/CONTRIBUTING.md
+[3]: https://github.com/apache/couchdb/blob/main/CONTRIBUTING.md
 
 
diff --git a/rfcs/008-map-indexes.md b/rfcs/008-map-indexes.md
index 991fa2b..723e76d 100644
--- a/rfcs/008-map-indexes.md
+++ b/rfcs/008-map-indexes.md
@@ -165,7 +165,7 @@ In CouchDB 2.x, strings are compared via ICU. The way to do this with Foundation
 
 ### Index building
 
-An index will be built and updated via a [background job worker](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md). When a request for a view is received, the request process will add a job item onto the background queue for the index to be updated. A worker will take the item off the queue and update the index. Once the index has been built, the background job server will notify the request that the index is up to date. The request process will the [...]
+An index will be built and updated via a [background job worker](https://github.com/apache/couchdb-documentation/blob/main/rfcs/007-background-jobs.md). When a request for a view is received, the request process will add a job item onto the background queue for the index to be updated. A worker will take the item off the queue and update the index. Once the index has been built, the background job server will notify the request that the index is up to date. The request process will then  [...]
 
 Initially, the building of an index will be a single worker running through the changes feed and creating the index. In the future, we plan to parallelise that work so that multiple workers could build the index at the same time. This will reduce build times.
 
diff --git a/rfcs/011-opentracing.md b/rfcs/011-opentracing.md
index bf4a059..4c3c300 100644
--- a/rfcs/011-opentracing.md
+++ b/rfcs/011-opentracing.md
@@ -39,8 +39,8 @@ The following HTTP headers would be used to link tracing span with application s
 - b3
 
 More information about the use of these headers can be found [here](https://github.com/openzipkin/b3-propagation).
-Open tracing [specification](https://github.com/opentracing/specification/blob/master/specification.md) 
-has a number of [conventions](https://github.com/opentracing/specification/blob/master/semantic_conventions.md) 
+Open tracing [specification](https://github.com/opentracing/specification/blob/main/specification.md) 
+has a number of [conventions](https://github.com/opentracing/specification/blob/main/semantic_conventions.md) 
 which would be good to follow.
 
 In a nutshell the idea is:
@@ -133,7 +133,7 @@ Following headers on the response would be supported
 
 ## Conventions
 
-The conventions bellow are based on [conventions from opentracing](https://github.com/opentracing/specification/blob/master/semantic_conventions.md#standard-span-tags-and-log-fields).
+The conventions bellow are based on [conventions from opentracing](https://github.com/opentracing/specification/blob/main/semantic_conventions.md#standard-span-tags-and-log-fields).
 All tags are optional since it is just a recomendation from open tracing to hint visualization and filtering tools.
 
 ### Span tags
@@ -224,11 +224,11 @@ The security risk of injecting malicious payload into ini config is mitigated vi
 
 # References
 
-- [opentracing specification](https://github.com/opentracing/specification/blob/master/specification.md)
+- [opentracing specification](https://github.com/opentracing/specification/blob/main/specification.md)
 - https://opentracing.io/
 - https://www.jaegertracing.io/docs/1.14/
 - https://zipkin.io
-- [opentracing conventions](https://github.com/opentracing/specification/blob/master/semantic_conventions.md) 
+- [opentracing conventions](https://github.com/opentracing/specification/blob/main/semantic_conventions.md) 
 
 
 # Acknowledgements
diff --git a/rfcs/013-node-types.md b/rfcs/013-node-types.md
index 9c811e2..5cd3adc 100644
--- a/rfcs/013-node-types.md
+++ b/rfcs/013-node-types.md
@@ -135,7 +135,7 @@ N/A
 
 [1] https://github.com/apache/couchdb/issues/1338
 
-[2] https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md
+[2] https://github.com/apache/couchdb-documentation/blob/main/rfcs/007-background-jobs.md
 
 # Acknowledgments
 
diff --git a/rfcs/016-fdb-replicator.md b/rfcs/016-fdb-replicator.md
index a10ea6e..e53ad7b 100644
--- a/rfcs/016-fdb-replicator.md
+++ b/rfcs/016-fdb-replicator.md
@@ -21,11 +21,11 @@ CouchDB <= 3.x, replication jobs were mapped to individual cluster nodes and a
 scheduler component would run up to `max_jobs` number of jobs at a time on each
 node. The new design proposes using `couch_jobs`, as described in the
 [Background Jobs
-RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md),
+RFC](https://github.com/apache/couchdb-documentation/blob/main/rfcs/007-background-jobs.md),
 to have a central, FDB-based queue of replication jobs. `couch_jobs`
 application will manage job scheduling and coordination. The new design also
 proposes using heterogeneous node types as defined in the [Node Types
-RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md)
+RFC](https://github.com/apache/couchdb-documentation/blob/main/rfcs/013-node-types.md)
 such that replication jobs will be created only on `api_frontend` nodes and run
 only on `replication` nodes.
 
@@ -58,12 +58,12 @@ changes feed, then stop.
 
 `api_frontend node` : Database node which has the `api_frontend` type set to
 `true` as described in
-[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+[RFC](https://github.com/apache/couchdb-documentation/blob/main/rfcs/013-node-types.md).
 Replication jobs can be only be created on these nodes.
 
 `replication node` : Database node which has the `replication` type set to
 `true` as described in
-[RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md).
+[RFC](https://github.com/apache/couchdb-documentation/blob/main/rfcs/013-node-types.md).
 Replication jobs can only be run on these nodes.
 
 `filtered` replications: Replications with a user-defined filter on the source
@@ -369,9 +369,9 @@ traffic sent out only from those nodes.
 
 # References
 
-* [Background Jobs RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/007-background-jobs.md)
+* [Background Jobs RFC](https://github.com/apache/couchdb-documentation/blob/main/rfcs/007-background-jobs.md)
 
-* [Node Types RFC](https://github.com/apache/couchdb-documentation/blob/master/rfcs/013-node-types.md)
+* [Node Types RFC](https://github.com/apache/couchdb-documentation/blob/main/rfcs/013-node-types.md)
 
 * [CouchDB 3.x replicator implementation](https://github.com/apache/couchdb/blob/3.x/src/couch_replicator/README.md)
 
diff --git a/rfcs/017-fair-share-scheduling.md b/rfcs/017-fair-share-scheduling.md
new file mode 100644
index 0000000..d4e2f46
--- /dev/null
+++ b/rfcs/017-fair-share-scheduling.md
@@ -0,0 +1,216 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Fair Share Job Scheduling for CouchDB 3.x Replicator'
+labels: rfc, discussion
+assignees: 'vatamane@apache.org'
+
+---
+
+# Introduction
+
+This document describes an improvement to the CouchDB 3.x replicator to
+introduce fair resource sharing between replication jobs in different
+_replicator databases.
+
+## Abstract
+
+Currently the CouchDB 3.x replicator schedules jobs without any regard to what
+database they originated from. If there are multiple `_replicator` dbs then
+replication jobs from dbs with the most jobs will consume most of the scheduler's
+resources. The proposal is to implement a fair sharing scheme as described in
+the [A Fair Share Scheduler][2] paper by Judy Kay and Piers Lauder. It would allow
+sharing replication scheduler resources fairly amongst `_replicator` dbs.
+
+The idea was originally discussed on the [couchdb-dev][1] mailing list and the
+use of the Fair Share algorithm was suggested by Joan Touzet.
+
+## Requirements Language
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in [RFC
+2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+`_replicator` databases : A database that is either named `_replicator` or ends
+with the `/_replicator` suffix.
+
+`shares` : An abstract representation of entitlement to run on the replication
+scheduler.
+
+`usage` : A measure of resource usage by jobs from a particular `_replicator`
+db. For the scheduling replicator this will be the total time spent running.
+
+`continuous` replications : Replication jobs created with the `"continuous":
+true` parameter. These jobs will try to run continuously until the user removes
+them. They may be temporarily paused to allow other jobs to make progress.
+
+`one-shot` replications : Replication jobs which are not `continuous`. If the
+`"continuous":true` parameter is not specified, by default, replication jobs
+will be `one-shot`. These jobs will try to run until they reach the end of the
+changes feed, then stop.
+
+`job priority` : A job attribute which indicates the likelihood of the job
+being executed before other jobs. Following the convention in the "Fair Share"
+paper, jobs with a lower priority value are at the front of the pending queue,
+and get executed first.
+
+`max_jobs` : Configuration parameter which specifies up to how many replication
+jobs to run on each `replication` node.
+
+`max_churn` : Configuration parameter which specifies a limit of how many new
+jobs to spawn during each rescheduling interval.
+
+---
+
+# Detailed Description
+
+The general idea behind the algorithm is to continuously monitor
+per-`_replicator` job statistics and update each job's priority in
+proportion to the usage from all the jobs in the same `_replicator` db. To make
+sure all jobs eventually get a chance to run and do not starve, all the
+priorities are continuously boosted, such that jobs which haven't run for a
+while, and may be starved, will eventually get a chance to run.
+
+The algorithm has 3 basic components that can run mostly independently from
+each other:
+
+1) Keep track of `usage` for each `_replicator` db. In the paper this part is
+called "user-level scheduling". As jobs run, they send reports to this
+component. Those reports are accumulated for one period, then rolled up when
+the period ends. There is also a decay coefficient applied to account for
+recent historical usage (this is called `K1` in the paper). This ensures that,
+in the absence of running jobs from a particular `_replicator` db, the usage
+drops to 0 and the whole entry is removed from the table altogether.
+
+ Every `UsageUpdateInterval` seconds (called `t1` in the paper):
+   For each `Db`:
+     ```erlang
+     DecayCoeff = get_usage_decay_coefficient(0.5),
+     AccumulatedUsage = get_accumulated_usage(Db),
+     update_usage(Db, usage(Db) * DecayCoeff + AccumulatedUsage),
+     reset_accumulated_usage(Db)
+     ```
+
+2) Uniformly decay all process priorities. Periodically lower the priority
+values, and thus boost the priority, of all the pending and running jobs in the
+system. The paper in this step applies a per-process "nice" value, which is
+skipped in the initial proposal. It could be added later if needed.
+
+ Every `UniformPriorityBoostInterval` seconds (called `t2` in the paper):
+   For each `Job`:
+     ```erlang
+     DecayCoeff = get_uniform_decay_coefficient(0.75),
+     Job#job.priority = Job#job.priority * DecayCoeff
+     ```
+
+[note]: If jobs were scheduled to run at an absolute future time (a deadline) this step could be avoided. Then, the effect of all the jobs needing to periodically move to the front of the queue would be accomplished instead by the current time (i.e. `now()`) moving ahead along the timeline.
+
+3) Adjust running process priority in proportion to the shares used by all the
+jobs in the same db:
+
+ Every `RunningPriorityReduceInterval` seconds (called `t3` in the paper):
+   For each `Job`:
+     ```erlang
+     Db = Job#job.db,
+     SharesSq = shares(Db) * shares(Db),
+     Job#job.priority = Job#job.priority + (usage(Db) * pending(Db)) / SharesSq
+     ```
+
+### How Jobs Start and Stop
+
+During each rescheduling cycle, `max_churn` running jobs from the back of the
+queue are stopped and `max_churn` jobs from the front of the pending queue are
+started. This part is not modified from the existing scheduling algorithm,
+except now, the jobs would be ordered by their `priority` value before being
+ordered by their last start time.
+
+In addition, `one-shot` replication jobs would still be skipped when stopping
+and we'd let them run in order to maintain traditional replication semantics
+just like before.
+
+When picking the jobs to run, exclude jobs which have been exponentially backed
+off due to repeated errors. This part is unmodified from the original
+scheduler.
+
+### Configuration
+
+The decay coefficients and interval times for each of the 3 parts of the algorithm would be configurable in the `[replicator]` config section.
+
+Per-`_replicator` db shares would be configurable in the `[replicator.shares]` section as:
+
+```ini
+[replicator.shares]
+$prefix/_replicator = $numshares
+```
+
+By default each db is assigned 100 shares. A higher number of shares
+indicates a larger proportion of scheduler resources allocated to that db.
+A lower number gets a proportionally smaller allocation.
+
+For example:
+
+```ini
+[replicator.shares]
+
+; This is the default
+; _replicator = 100
+
+high/_replicator = 200
+low/_replicator = 50
+```
+
+# Advantages and Disadvantages
+
+Advantages:
+
+  * Allow a fair share of resources between multiple `_replicator` db instances
+
+  * Can boost or lower the priority of some replication jobs by adjusting the
+    shares assigned to that database instance.
+
+Disadvantages:
+
+  * Adds more complexity to the scheduler
+
+# Key Changes
+
+ * Modifies replication scheduler
+
+   each `_replicator` db in the system.
+
+ * A delay in `running` state as reflected in monitoring API responses
+
+ * `[replicator] update_docs = false` configuration option becomes hard-coded
+
+## Applications and Modules affected
+
+ * `couch_replicator` application
+
+## HTTP API additions
+
+N/A
+
+## HTTP API deprecations
+
+N/A
+
+# Security Considerations
+
+None
+
+# References
+
+* [1]: https://lists.apache.org/thread.html/rebba9a43bfdf9696f2ce974b0fc7550a631c7b835e4c14e51cd27a87%40%3Cdev.couchdb.apache.org%3E "couchdb-dev"
+
+* [2]: https://proteusmaster.urcf.drexel.edu/urcfwiki/images/KayLauderFairShare.pdf "Fair Share Scheduler"
+
+# Co-authors
+
+ * Joan Touzet (@wohali)
+
+# Acknowledgments
+
+ * Joan Touzet (@wohali)
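
Illustrative note on rfcs/017-fair-share-scheduling.md (not part of the commit): the three periodic steps in the RFC's "Detailed Description" can be sketched roughly as below. This is a minimal Python model, not the replicator's Erlang implementation. The names `Job`, `FairShare`, `report`, `boost_all`, `penalize_running` and the `min_usage` cutoff are hypothetical, and `pending(Db)` is assumed to mean the number of non-running jobs from that db. The decay defaults (0.5, 0.75) and the 100-share default are taken from the RFC text.

```python
from dataclasses import dataclass, field

DEFAULT_SHARES = 100  # default share count stated in the RFC


@dataclass
class Job:
    db: str
    priority: float = 0.0  # lower value = closer to the front of the queue
    running: bool = False


@dataclass
class FairShare:
    shares: dict = field(default_factory=dict)       # db -> configured shares
    usage: dict = field(default_factory=dict)        # db -> decayed usage
    accumulated: dict = field(default_factory=dict)  # db -> usage this period

    def report(self, db, seconds):
        """Record run time reported by a job from `db` in the current period."""
        self.accumulated[db] = self.accumulated.get(db, 0.0) + seconds

    def update_usage(self, decay=0.5, min_usage=1e-6):
        """Step 1 (t1): decay old usage and roll in this period's reports."""
        for db in set(self.usage) | set(self.accumulated):
            new = self.usage.get(db, 0.0) * decay + self.accumulated.pop(db, 0.0)
            if new > min_usage:
                self.usage[db] = new
            else:
                # Assumption: values below min_usage count as 0; drop the entry.
                self.usage.pop(db, None)

    def boost_all(self, jobs, decay=0.75):
        """Step 2 (t2): lower every priority value, which boosts every job."""
        for job in jobs:
            job.priority *= decay

    def penalize_running(self, jobs):
        """Step 3 (t3): raise the priority value of running jobs."""
        pending = {}  # assumption: pending(Db) = count of non-running jobs
        for job in jobs:
            if not job.running:
                pending[job.db] = pending.get(job.db, 0) + 1
        for job in jobs:
            if job.running:
                s = self.shares.get(job.db, DEFAULT_SHARES)
                job.priority += (
                    self.usage.get(job.db, 0.0) * pending.get(job.db, 0) / (s * s)
                )


# Two dbs with identical usage; one has twice the default shares.
fs = FairShare(shares={"high/_replicator": 200})
jobs = [
    Job("high/_replicator", running=True), Job("high/_replicator"),
    Job("_replicator", running=True), Job("_replicator"),
]
fs.report("high/_replicator", 60.0)
fs.report("_replicator", 60.0)
fs.update_usage()
fs.penalize_running(jobs)
```

With equal usage, the running job from the 200-share db ends up with a priority value of 0.0015 against 0.006 for the default db, so it sorts closer to the front of the queue. This matches the RFC's intent that more shares yield a larger proportion of scheduler resources.
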
diff --git a/src/best-practices/reverse-proxies.rst b/src/best-practices/reverse-proxies.rst
index 4ef2fc3..46102ac 100644
--- a/src/best-practices/reverse-proxies.rst
+++ b/src/best-practices/reverse-proxies.rst
@@ -66,7 +66,7 @@ is for a 3 node CouchDB cluster:
         server couchdb2 x.x.x.x:5984 check inter 5s
 
 .. _HAProxy: http://haproxy.org/
-.. _code repository: https://github.com/apache/couchdb/blob/master/rel/haproxy.cfg
+.. _code repository: https://github.com/apache/couchdb/blob/main/rel/haproxy.cfg
 
 Reverse proxying with nginx
 ===========================
diff --git a/src/ddocs/views/collation.rst b/src/ddocs/views/collation.rst
index 4ce5182..37fae9f 100644
--- a/src/ddocs/views/collation.rst
+++ b/src/ddocs/views/collation.rst
@@ -108,7 +108,7 @@ Collation Specification
 
 This section is based on the view_collation function in `view_collation.js`_:
 
-.. _view_collation.js: https://github.com/apache/couchdb/blob/master/test/javascript/tests/view_collation.js
+.. _view_collation.js: https://github.com/apache/couchdb/blob/main/test/javascript/tests/view_collation.js
 
 .. code-block:: javascript
 
diff --git a/src/query-server/javascript.rst b/src/query-server/javascript.rst
index 2cd1acc..ce48c02 100644
--- a/src/query-server/javascript.rst
+++ b/src/query-server/javascript.rst
@@ -96,7 +96,7 @@ modules and functions:
 
 .. data:: JSON
 
-    `JSON2 <https://github.com/apache/couchdb/blob/master/share/server/json2.js>`_
+    `JSON2 <https://github.com/apache/couchdb/blob/main/share/server/json2.js>`_
     object.
 
 .. function:: isArray(obj)
diff --git a/src/replication/protocol.rst b/src/replication/protocol.rst
index 0bfbe79..f4e7b4f 100644
--- a/src/replication/protocol.rst
+++ b/src/replication/protocol.rst
@@ -533,7 +533,7 @@ ID:
     See `couch_replicator_ids.erl`_ for an example of a Replication ID generation
     implementation.
 
-    .. _couch_replicator_ids.erl: https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_ids.erl
+    .. _couch_replicator_ids.erl: https://github.com/apache/couchdb/blob/main/src/couch_replicator/src/couch_replicator_ids.erl
 
 Retrieve Replication Logs from Source and Target
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/src/setup/cluster.rst b/src/setup/cluster.rst
index 603d031..3660541 100644
--- a/src/setup/cluster.rst
+++ b/src/setup/cluster.rst
@@ -365,4 +365,4 @@ Ensure the ``all_nodes`` and ``cluster_nodes`` lists match.
 You CouchDB cluster is now set up.
 
 .. _HAProxy: http://haproxy.org/
-.. _example configuration for HAProxy: https://github.com/apache/couchdb/blob/master/rel/haproxy.cfg
+.. _example configuration for HAProxy: https://github.com/apache/couchdb/blob/main/rel/haproxy.cfg